google / turbinia

Automation and Scaling of Digital Forensics Tools
Apache License 2.0

Create new GCS -> Persistent Disk copy task #63

Open aarontp opened 7 years ago

aarontp commented 7 years ago

This task will create a new Persistent Disk with a filesystem slightly larger than the image file on GCS, then copy the raw image from GCS directly as a file into the Persistent Disk filesystem. This will require a new Evidence type called something like PersistentDiskLocalImage.
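As a rough sketch only, assuming Turbinia's Evidence base class, a new evidence type along these lines could look roughly like the following (the class name follows the suggestion above; the attributes are hypothetical):

from turbinia.evidence import Evidence


class PersistentDiskLocalImage(Evidence):
  """Hypothetical evidence type: a raw image stored as a file on a GCE Persistent Disk."""

  def __init__(self, disk_name=None, image_path=None, *args, **kwargs):
    super(PersistentDiskLocalImage, self).__init__(*args, **kwargs)
    self.disk_name = disk_name    # Name of the Persistent Disk holding the image file.
    self.image_path = image_path  # Path to the raw image within the disk's filesystem.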

alimez commented 4 years ago

Adding @sa3eed3ed. He's working on libcloudforensics, which can help us with this issue.

sa3eed3ed commented 4 years ago

Following up on our earlier discussion: https://github.com/google/cloud-forensics-utils/pull/169 is now merged. Raw disk images (and other formats) can be imported from GCS to GCE. The created disk might be bigger due to size restrictions in GCE, but the hash of the original disk matches the hash of the created GCE disk from the first byte up to the byte count of the original disk. The byte count and the hash of the original disk are returned. Something similar to this can be done for verifying the evidence integrity, possibly from the analysis VM: result['md5Hash'] = hash(created_gce_disk, start_byte=0, end_byte=result['bytes_count'])

More information: https://github.com/google/cloud-forensics-utils/blob/50396979a6e3e330fedb186a4b5942ce9dc0cff3/libcloudforensics/providers/gcp/forensics.py#L145

Example code:

import libcloudforensics.providers.gcp.forensics as forensics

result = forensics.CreateDiskFromGCSImage(
    'my-test-project-id', 'gs://evidense_images/folder/raw.dd',
    'europe-west2-a', 'new-gce-disk')

result = {'project_id': 'my-test-project-id',
          'disk_name': 'new-gce-disk',
          'zone': 'europe-west2-a',
          'bytes_count': '4294967296',
          'md5Hash': 'f14c653659dcc646c720072fe0b682a9'}
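For illustration, a minimal sketch of the hash verification idea mentioned above, run from an analysis VM with the created disk attached; only the standard-library hashlib is assumed, and the device path is hypothetical:

import hashlib


def verify_gce_disk_hash(device_path, bytes_count, expected_md5):
  """Hash the first bytes_count bytes of the attached disk and compare to expected_md5."""
  md5 = hashlib.md5()
  remaining = int(bytes_count)
  with open(device_path, 'rb') as disk:
    while remaining > 0:
      chunk = disk.read(min(1024 * 1024, remaining))
      if not chunk:
        break
      md5.update(chunk)
      remaining -= len(chunk)
  return md5.hexdigest() == expected_md5


# Example usage on the analysis VM ('/dev/sdb' is hypothetical):
# verify_gce_disk_hash('/dev/sdb', result['bytes_count'], result['md5Hash'])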

aarontp commented 4 years ago

@sa3eed3ed That's awesome, thanks! I wonder if we should record the original size as metadata in the new disk somewhere. Should we create a method within libcloudforensics for verifying the hash given the size information? Mostly I want to make sure we have a documented way of verifying the hash somewhere. Thanks!

sa3eed3ed commented 4 years ago

@aarontp regarding hash verification, there are several ways to do it, but first a few points because I think we might need to do it in a different way:

aarontp commented 4 years ago

@sa3eed3ed Yeah, the process you outlined in the third bullet point is actually the way I originally thought we would need to do this, but if we have a forensically verifiable way to do it through another mechanism directly to a persistent disk, like you have, I'm fine with that too. I don't think we necessarily need to do the hash verification every time we process a disk; the most important thing is that we have a documented way to do so when needed. That said, this is the first time we're directly changing the original evidence type before it starts to get processed, so it might be a nice-to-have.

I'm imagining that the best way to do this on the Turbinia side will be to have a Task that can take something like a GCSRawImage evidence type and process it via one of the two ways you mention above (either it runs the libcloudforensics code you linked above, or it does the dd method). I think in either case we could just get the original size and hash[1] at that point and record that info into the second evidence object that gets created (GoogleCloudDisk or GoogleCloudDiskRawEmbedded).
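A rough sketch of that Task, under stated assumptions rather than a working implementation: it assumes Turbinia's TurbiniaTask and GoogleCloudDisk classes, a hypothetical GCSRawImage evidence type carrying project/zone/gcs_path attributes, and the libcloudforensics call from the earlier example; the size/hash attributes recorded on the new evidence are also hypothetical.

from turbinia import evidence
from turbinia.workers import TurbiniaTask

from libcloudforensics.providers.gcp import forensics


class GCSToPersistentDiskTask(TurbiniaTask):
  """Hypothetical task: import a raw image from GCS into a new GCE Persistent Disk."""

  def run(self, evidence_, result):
    # Import the GCS image into a new Persistent Disk via libcloudforensics.
    import_result = forensics.CreateDiskFromGCSImage(
        evidence_.project, evidence_.gcs_path, evidence_.zone)

    # Record the original image size and hash on the new evidence object so
    # the copy can be re-verified later.
    new_disk = evidence.GoogleCloudDisk(
        project=import_result['project_id'],
        zone=import_result['zone'],
        disk_name=import_result['disk_name'])
    new_disk.size_bytes = import_result['bytes_count']  # Hypothetical attribute.
    new_disk.original_md5 = import_result['md5Hash']    # Hypothetical attribute.
    result.add_evidence(new_disk, evidence_.config)

    result.close(self, success=True, status='Imported GCS image to disk {0:s}'.format(
        import_result['disk_name']))
    return result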

If this is just a call into libcloudforensics that fully uses the API, another option could be to do this in turbiniactl before the request is even made (similar to how we copy disks from another project), but in general I'd like to have the actual processing code in the Tasks unless we need to act with the permissions of the end user (as we do for disk copies).

In summary, I think we could probably use either method here, but let me know if I'm missing some reason why you're suggesting the dd method instead.

[1] It looks like we can hash things on GCS easily enough with gsutil, though I'm not sure if this is done via an API or if it just pulls back the entire file and hashes it client side. https://cloud.google.com/storage/docs/gsutil/commands/hash
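For what it's worth, GCS stores an MD5 for non-composite objects as metadata, so it can be fetched via the API without downloading the object. A minimal sketch with the google-cloud-storage client (bucket and object names are placeholders; note the stored hash is base64-encoded rather than hex):

import base64
import binascii

from google.cloud import storage


def gcs_object_md5(bucket_name, object_name):
  """Return the hex MD5 that GCS stores as object metadata (no download required)."""
  client = storage.Client()
  blob = client.bucket(bucket_name).get_blob(object_name)
  # blob.md5_hash is base64-encoded; convert to the usual hex digest form.
  return binascii.hexlify(base64.b64decode(blob.md5_hash)).decode('ascii')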

sa3eed3ed commented 4 years ago

@aarontp