sepinf-inc / IPED

IPED Digital Forensic Tool. It is an open source software that can be used to process and analyze digital evidence, often seized at crime scenes by law enforcement or in a corporate investigation by private examiners.

Carver and file signature for Shareaza download control files (.sd) #1816

Closed wladimirleite closed 1 year ago

wladimirleite commented 1 year ago

While working on a CSAM case with very little concrete evidence, I used an external carver and a parser written by @felipecampanini, who is working on another similar case, and found very important Shareaza download control files (SDL). A really great "byte-level" analysis by @felipecampanini!

These SDL files hold information about ongoing Shareaza downloads. In cases where the "main" Shareaza library files (like "Library1.dat", already parsed by IPED) were lost, these SDLs can be very important.

Another observation is that these SDL files are "one per download file", so they are small, while "Library1.dat" contains all the library entries in a single file. In my case, all SDLs were recovered from file slacks. The suspect wiped the unallocated space, but the tool he used didn't seem to clean the slacks.

This issue is about the carver for these "Shareaza Downloads". I believe @felipecampanini is still working on the standalone parser we are using for now. When it is finalized, we can create another issue to implement an IPED parser.

lfcnassif commented 1 year ago

Awesome work @felipecampanini and @tc-wleite! I don't remember well if this is Shareaza behavior, but in the past some file download/sharing apps used to put download info at the end of the file being downloaded. When the download finished, the file was truncated to its final size, so the download info was left in the file slack.

wladimirleite commented 1 year ago

@lfcnassif, a question about the current carving implementation for types that have only the header definition and a max length (i.e. no footer and no length information inside the file). Suppose the header signature is xyz and maxLength is 12. If we have the following sequence of bytes,

MNOPxyz0123xyzABCDEFGHIJKLM
    |----------|
    xyz0123xyzAB  <-- First carved item
           |----------|
           xyzABCDEFGHI  <-- Second carved item

Two entries will be carved, as expected, but the first one will contain the beginning of the second one, which I don't think is the best guess. To avoid truncating a carved item, maxLength has to be set to a relatively large value (for the given file type). And the larger maxLength is, the more likely this situation becomes.
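To make the example above concrete, here is a minimal Java sketch of header-only carving with a fixed maxLength. It is an illustration under stated assumptions, not IPED's actual AbstractCarver code: the class name, the method names and the ASCII test data are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT IPED's AbstractCarver: header-only carving with a fixed maxLength.
class MaxLengthCarveSketch {

    // Every header hit starts a carved item that runs for maxLength bytes
    // (or until the input ends), regardless of later header hits.
    static List<String> carve(byte[] data, byte[] header, int maxLength) {
        List<String> items = new ArrayList<>();
        for (int pos = 0; pos + header.length <= data.length; pos++) {
            if (!matches(data, pos, header)) {
                continue;
            }
            int end = Math.min(pos + maxLength, data.length);
            items.add(new String(data, pos, end - pos, StandardCharsets.US_ASCII));
        }
        return items;
    }

    // True if the header bytes occur in data starting at pos.
    static boolean matches(byte[] data, int pos, byte[] header) {
        for (int i = 0; i < header.length; i++) {
            if (data[pos + i] != header[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "MNOPxyz0123xyzABCDEFGHIJKLM".getBytes(StandardCharsets.US_ASCII);
        byte[] header = "xyz".getBytes(StandardCharsets.US_ASCII);
        // prints [xyz0123xyzAB, xyzABCDEFGHI]
        System.out.println(carve(data, header, 12));
    }
}
```

The first item swallows the header of the second one, which is the overlap discussed above.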

I changed this behavior locally to test it (I didn't commit), in AbstractCarver, to limit the length of these carved items (those with maxLength only), so they end when a new hit is found (for the same carver type). In this example, the first carved item would be smaller, stopping where the second item starts.

MNOPxyz0123xyzABCDEFGHIJKLM
    |-----|
    xyz0123  <-- First carved item
           |----------|
           xyzABCDEFGHI  <-- Second carved item

If I didn't miss anything, currently only ARES MOV and FLV carvers use the default carver, with header and maximum length only. For containers (like a ZIP), the current behavior would make sense (a ZIP inside another). But for types where the end of the carved item is not clear, I think the second behavior would make more sense.

@lfcnassif, is there a reason to keep only the first behavior? What do you think about changing to the second? Another option would be to have an explicit parameter (in the carver configuration) to choose between these two behaviors.
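As a rough illustration of the two behaviors and of a switch between them, the sketch below adds a boolean flag to a toy header-only carver. Everything here is an assumption made for the example: the class, the helper method and the flag name stopOnNextHeader are hypothetical and are not taken from IPED's code or configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT IPED code: a flag chooses between the two carving behaviors.
class StopOnNextHeaderSketch {

    static List<String> carve(byte[] data, byte[] header, int maxLength, boolean stopOnNextHeader) {
        List<String> items = new ArrayList<>();
        for (int pos = indexOf(data, header, 0); pos >= 0; pos = indexOf(data, header, pos + 1)) {
            // Default end: maxLength bytes after the header, clamped to the input size.
            int end = Math.min(pos + maxLength, data.length);
            if (stopOnNextHeader) {
                // Second behavior: cut the item where the next hit of the same type starts.
                int next = indexOf(data, header, pos + 1);
                if (next >= 0 && next < end) {
                    end = next;
                }
            }
            items.add(new String(data, pos, end - pos, StandardCharsets.US_ASCII));
        }
        return items;
    }

    // Index of the first occurrence of header in data at or after 'from', or -1 if none.
    static int indexOf(byte[] data, byte[] header, int from) {
        outer:
        for (int i = Math.max(from, 0); i + header.length <= data.length; i++) {
            for (int j = 0; j < header.length; j++) {
                if (data[i + j] != header[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "MNOPxyz0123xyzABCDEFGHIJKLM".getBytes(StandardCharsets.US_ASCII);
        byte[] header = "xyz".getBytes(StandardCharsets.US_ASCII);
        System.out.println(carve(data, header, 12, false)); // first behavior
        System.out.println(carve(data, header, 12, true));  // second behavior
    }
}
```

With the flag off the first item is xyz0123xyzAB; with it on the first item shrinks to xyz0123, and the second item is xyzABCDEFGHI in both cases.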

lfcnassif commented 1 year ago

> @lfcnassif, is there a reason to keep only the first behavior? What do you think about changing to the second? Another option would be to have an explicit parameter (in the carver configuration) to choose between these two behaviors.

Hi @tc-wleite. Honestly, I don't remember whether there was a specific reason. I implemented the carving algorithm years ago, and I think the idea was to carve as much information as possible, but I'm not sure...

Not only ordinary containers like ZIP can have multiple headers of embedded items; less common "containers" can too. For instance, a JPEG usually has thumbnails with multiple resolutions inside its Exif data.

I think the above may also happen with files without a clear footer. Of course, this is file type specific, and the newly proposed approach seems useful. This would be a sensitive change, ideally tested against a reasonable set of images...

Since I think the current behavior can be useful depending on file type and the change may cause some regression, my vote is to have a configurable parameter.

A somewhat related old idea was to have a configurable parameter to recover files whose configuration defines a footer even if the footer wasn't found within the specified max range. This would result in more carved files, which can be good, but also in more garbage, which of course is bad...

lfcnassif commented 1 year ago

Just to be clear, my opinion can be wrong; only testing would tell us what the best default behavior is, but a configurable parameter seems interesting...

lfcnassif commented 1 year ago

> MOV

I think MOV is from the QuickTime family and is handled by the same carver that handles MP4, 3GP and QT movies.

wladimirleite commented 1 year ago

> I think MOV is from the QuickTime family and is handled by the same carver that handles MP4, 3GP and QT movies.

Sorry, MOV uses a custom carver indeed.

wladimirleite commented 1 year ago

I agree that the parameter would make things more flexible and clearer. I will slightly change the code I wrote so that it either keeps the current behavior (as the default) or not, depending on a parameter. What could its name be? breakOnNextHeader?

lfcnassif commented 1 year ago

> What could its name be? breakOnNextHeader?

Fine by me! Or maybe stopOnNextHeader.

lfcnassif commented 1 year ago

> Just to be clear, my opinion can be wrong; only testing would tell us what the best default behavior is, but a configurable parameter seems interesting...

On second thought, the current FLV (and Shareaza) carving is possibly merging 2 different files starting at the first header... For FLV this should not be hard to detect, since the video content, if playable, would change abruptly. Maybe we should run tests to evaluate this...

lfcnassif commented 1 year ago

> For FLV this should not be hard to detect

Actually it would be easy: the number of FLV files smaller than the max size would increase. Today we can already get some, if the parent item ends before the max size is reached.

wladimirleite commented 1 year ago

@lfcnassif, let me explain why I considered this other behavior (stopOnNextHeader). These SD files are relatively small, but their sizes vary a lot (I have samples here of active files ranging from 200 bytes to 60 KB). In the case where I found the carved files, with the current behavior, a single carved item sometimes contains 2-4 original SDs. The file names are stored in Unicode, so IPED extracted them even without a parser. The result mixed different file names/paths, which is odd. Obviously, once the parser is integrated, that issue would be solved.

Another inconvenience is that from the carved file with multiple SDs, other SDs (after the first one) are carved. Something like slack >> carved-123.sd >> carved-23.sd >> carved-3.sd, which can be confusing.

EDIT: Sorry, I repeated the test more carefully and in fact what happens is that 3 items were created, like slack >> carved-123.sd, slack >> carved-23.sd and slack >> carved-3.sd, i.e. the 3rd original SD appeared 3 times in the carved content.
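The duplication described in the EDIT can be reproduced with a toy model. The sketch below is only an illustration: the "SD:" signature, the record contents and the maxLength of 64 are made up for the example (they are not the real .sd signature or IPED settings), and strings stand in for raw slack bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT IPED code: three small records stored back to back in a file slack.
class SlackDuplicationSketch {

    // Carves one item per header hit; optionally stops at the next hit of the same header.
    static List<String> carve(String data, String header, int maxLength, boolean stopOnNextHeader) {
        List<String> items = new ArrayList<>();
        for (int pos = data.indexOf(header); pos >= 0; pos = data.indexOf(header, pos + 1)) {
            int end = Math.min(pos + maxLength, data.length());
            int next = data.indexOf(header, pos + 1);
            if (stopOnNextHeader && next >= 0 && next < end) {
                end = next;
            }
            items.add(data.substring(pos, end));
        }
        return items;
    }

    // Counts how many carved items contain the given record.
    static long copiesOf(String record, List<String> items) {
        return items.stream().filter(item -> item.contains(record)).count();
    }

    public static void main(String[] args) {
        String slack = "SD:aaaaSD:bbSD:cccccc"; // made-up records 1, 2 and 3
        List<String> merged = carve(slack, "SD:", 64, false);
        List<String> split = carve(slack, "SD:", 64, true);
        System.out.println(merged + " -> copies of record 3: " + copiesOf("SD:cccccc", merged));
        System.out.println(split + " -> copies of record 3: " + copiesOf("SD:cccccc", split));
    }
}
```

With the current behavior the three items correspond to carved-123, carved-23 and carved-3, so the third record is present in all of them; with the flag on, each record is carved exactly once.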

lfcnassif commented 1 year ago

Thanks @tc-wleite I got it. I realized different FLV files could be merged together (as you explained before).

> Another inconvenience is that from the carved file with multiple SDs, other SDs (after the first one) are carved. Something like slack >> carved-123.sd >> carved-23.sd >> carved-3.sd, which can be confusing.

This is odd, because we have a rule to skip carving on carved files: https://github.com/sepinf-inc/IPED/blob/eee281883ba31a90a2b58edd90b6e03ab9ff63a4/iped-engine/src/main/java/iped/engine/task/carver/BaseCarveTask.java#L139-L145

wladimirleite commented 1 year ago

> This is odd, because we have a rule to skip carving on carved files:

Hmmm, I thought we had that, but I didn't look for it in the code to check. Let me check what is going on.

wladimirleite commented 1 year ago

I am sorry @lfcnassif, I repeated the test again (without stopOnNextHeader) and carving from carved items didn't happen. I edited my previous message.

lfcnassif commented 1 year ago

Don't worry @tc-wleite, thanks for testing again. After you finish the stopOnNextHeader option, I think we can test FLV carving at least with it and compare with current results, I can do that.