Data run calculations are wrong?

hongyihu commented 9 years ago

Hi,

I contacted the Sleuthkit developers' list about this a while back but didn't get a response. Sometimes analyzeMFT and Sleuthkit calculate different data runs for the same file. In these cases, Sleuthkit appears to be correct and analyzeMFT's results are off (I can confirm this by checking those data runs on disk). However, analyzeMFT's method complies with all of the NTFS documentation on data runs that I can find, whereas Sleuthkit somehow gets different numbers.

I hope that you might be able to shed some light on this, as it appears to be a bug in analyzeMFT. Here's an example from the email I sent.

I've attached an odd example of a raw MFT entry (of a zip file) from my clean disk image. I also included the hex dump, annotated with my math and notes. GitHub won't let me upload non-image files, so I'll try to include them in a follow-up.

I'm perplexed as to how TSK is parsing the data runs.

The data run snippet is:

31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00 00 (End)

But TSK interprets the data runs as:

31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 (End)

TSK seems to be right, but I don't understand what it's doing.

My analysis by hand (which matches what analyzeMFT gives me and is consistent with all the NTFS documentation I could find) gives the following runs. The first three are normal; I get the same result as TSK. The last few diverge.

31 01 4c 6c 05 (normal): len 0x01, offset 0x056c4c == 355404, cluster address == 355404

21 03 71 01 (normal): len 0x03, offset 0x0171 == 369, cluster address == 355404 + 369 == 355773

31 16 be 31 fd (normal): len 0x16 (22), offset 0xfd31be == -183874 (signed), cluster address == 355773 - 183874 == 171899
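The hand decoding above follows the usual mapping-pairs layout: the low nibble of each header byte is the size of the length field, the high nibble is the size of the signed offset field, each offset is relative to the previous run's starting cluster, an offset size of 0 marks a sparse run, and a 0x00 header ends the list. A minimal Python sketch of that rule, for illustration only (`decode_runs` is a hypothetical helper, not analyzeMFT's or TSK's code, and it assumes the run list bytes have already been extracted from the attribute):

```python
def decode_runs(data: bytes):
    """Decode an NTFS mapping-pairs list into (length, lcn) tuples.

    Header byte: low nibble = size of the length field, high nibble = size of
    the signed, little-endian offset field. Offsets are relative to the
    previous run's starting cluster. Offset size 0 means a sparse run
    (lcn is None). A 0x00 header terminates the list.
    """
    runs = []
    pos = 0
    lcn = 0
    while pos < len(data) and data[pos] != 0:
        header = data[pos]
        len_size = header & 0x0F
        off_size = header >> 4
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((length, None))  # sparse: no clusters allocated on disk
        else:
            delta = int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            pos += off_size
            lcn += delta
            runs.append((length, lcn))
    return runs


first_three = bytes.fromhex("31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 00")
print(decode_runs(first_three))  # [(1, 355404), (3, 355773), (22, 171899)]
```

The `signed=True` flag is what turns 0xfd31be into -183874; reading the offset as unsigned is a common source of wildly wrong cluster addresses.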

Here's where I'm confused:

03 00 94 15 (sparse): The header byte 0x03 gives me a 0-byte offset field and a 3-byte length field. A 0-byte offset field means a sparse data run (it takes up no disk space and reads back as zeros). The 3-byte length field gives a length of 0x159400 == 1414144 clusters.

01 31 (sparse): 0-byte offset field, 1-byte length field, length 0x31 (49).

6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00: Something is clearly wrong here.
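One way to see exactly where the parse goes off the rails is to step through the run headers alone, using only their nibbles to jump to the next run. This is a small diagnostic sketch (`trace_headers` is a hypothetical name), run on the snippet exactly as quoted above:

```python
def trace_headers(data: bytes):
    """Return (position, header, len_size, off_size) for each run header.

    Only the header nibbles are used to step to the next run, so a bad
    header shows up as an implausible field size without decoding values.
    """
    trace, pos = [], 0
    while pos < len(data) and data[pos] != 0:
        header = data[pos]
        len_size, off_size = header & 0x0F, header >> 4
        trace.append((pos, header, len_size, off_size))
        pos += 1 + len_size + off_size
    return trace


# The run list bytes as quoted in the report (no fixup applied).
raw = bytes.fromhex(
    "31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 "
    "6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00 00"
)
for pos, header, len_size, off_size in trace_headers(raw):
    print(f"pos {pos:2d}: header 0x{header:02x} -> "
          f"length field {len_size} B, offset field {off_size} B")
```

The first three headers (0x31, 0x21, 0x31) look normal; the fourth, at position 14, is the 0x03 that produces the sparse runs, and two runs later the walk lands on 0x6f, which asks for a 15-byte length field. No real cluster count needs 15 bytes, so that is a strong hint the byte stream itself is wrong at position 14, not the decoding rule.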

TSK gives me something more reasonable:

[Len: 1, Addr: 355404], [Len: 3, Addr: 355773], [Len: 22, Addr: 171899], [Len: 39, Addr: 242959], [Len: 111, Addr: 209321], [Len: 39, Addr: 1109421], [Len: 79, Addr: 1192478]

The first three runs are the same, but the rest are different. TSK seems to interpret the runs like this:

31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 (End)

This only makes sense to me if the fourth run were 31 27 94 15 01 instead of 03 00 94 15 01; with that change, TSK's numbers and parsing check out against the raw run list. I believe TSK is correct, but I don't understand how it is parsing the data runs here.
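The 31 27 hypothesis is easy to check mechanically: patch those two bytes and decode the result with the standard mapping-pairs rule. A minimal sketch (`decode_runs` is a hypothetical helper written for this check, not analyzeMFT's or TSK's code):

```python
def decode_runs(data: bytes):
    """Return (length, lcn) pairs from a mapping-pairs list; lcn is None for sparse runs."""
    runs, pos, lcn = [], 0, 0
    while pos < len(data) and data[pos] != 0:
        len_size, off_size = data[pos] & 0x0F, data[pos] >> 4
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((length, None))
        else:
            # Offsets are signed and relative to the previous run's start cluster.
            lcn += int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            pos += off_size
            runs.append((length, lcn))
    return runs


raw = bytes.fromhex(
    "31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 "
    "6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00"
)
# Bytes 14-15 of the run list hold 03 00; the hypothesis is that they should be 31 27.
patched = raw[:14] + bytes([0x31, 0x27]) + raw[16:]
for length, lcn in decode_runs(patched):
    print(f"[Len: {length}, Addr: {lcn}]")
```

With the patch applied, the seven printed runs match TSK's list exactly, which supports the idea that TSK is reading different bytes at that position rather than applying a different decoding rule.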

hongyihu commented 9 years ago

Formatting is a pain. I can email you both the raw MFT entry and the annotated hex dump if you want to look at it.

dkovar commented 9 years ago

Formatting is indeed a pain. Thank you very much for your detailed analysis of the issue! I've not looked at this stuff in months so you're way ahead of me at the moment. I'll try to get to this over the weekend and follow up with you then.

Again, thank you very much for the effort. Much appreciated.

dkovar commented 9 years ago

Aye, email to dkovar at gmail would probably be helpful. Thanks!

Hexadite-Shlomi commented 8 years ago

Hello. Was this issue ever resolved?

hongyihu commented 8 years ago

Unfortunately not to my knowledge. I spoke with a TSK contributor a while back, and he thinks the issue is probably with compressed files.

msuhanov commented 3 years ago

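I found a raw MFT entry attached to the original discussion on the mailing list ("mft2_16793.raw"). The issue is that the update sequence array isn't applied before the mapping pairs are decoded. The run list crosses the end of the first 512-byte sector, and the last two bytes of that sector (offsets 0x1fe-0x1ff) hold the update sequence number 03 00 instead of the real data.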

Before the array is applied, the bytes in question are:

000001f0  31 01 4c 6c 05 21 03 71  01 31 16 be 31 fd 03 00  |1.Ll.!.q.1..1...|
00000200  94 15 01 31 6f 9a 7c ff  31 27 04 bc 0d 31 4f 71  |...1o.|.1'...1Oq|

After the array is applied (03 00 -> 31 27):

000001f0  31 01 4c 6c 05 21 03 71  01 31 16 be 31 fd 31 27  |1.Ll.!.q.1..1.1'|
00000200  94 15 01 31 6f 9a 7c ff  31 27 04 bc 0d 31 4f 71  |...1o.|.1'...1Oq|
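For anyone patching a parser: NTFS writes a copy of the update sequence number into the last two bytes of every sector of a FILE record and keeps the displaced bytes in the update sequence array, so those bytes must be restored before anything (including mapping pairs) is decoded. Below is a minimal Python sketch of that fixup, assuming 512-byte sectors and the usual record header fields (update sequence offset as a little-endian u16 at 0x04, update sequence count at 0x06). `apply_fixup` is a hypothetical name, not analyzeMFT's or TSK's code, and the demo record is synthetic: only the run-list bytes come from the dump above, while the header values and the second saved value (00 00) are made up.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors, the common case


def apply_fixup(record: bytes, sector_size: int = SECTOR_SIZE) -> bytes:
    """Apply the NTFS update sequence array (fixup) to a raw FILE record.

    Assumes the standard header: u16 update sequence offset at 0x04 and
    u16 update sequence count (USN plus one saved value per sector) at 0x06.
    """
    buf = bytearray(record)
    usa_off = int.from_bytes(buf[4:6], "little")
    usa_count = int.from_bytes(buf[6:8], "little")
    usn = bytes(buf[usa_off:usa_off + 2])
    for i in range(1, usa_count):
        end = i * sector_size  # the USN copy sits in the last 2 bytes of each sector
        if bytes(buf[end - 2:end]) != usn:
            raise ValueError(f"USN mismatch in sector {i - 1}: torn or corrupt record")
        # Put back the original bytes saved in the array.
        buf[end - 2:end] = buf[usa_off + 2 * i:usa_off + 2 * i + 2]
    return bytes(buf)


# Synthetic 1024-byte record for illustration only.
rec = bytearray(1024)
rec[0:4] = b"FILE"
rec[4:6] = (0x30).to_bytes(2, "little")  # hypothetical update sequence offset
rec[6:8] = (3).to_bytes(2, "little")     # USN + 2 saved values (2 sectors)
rec[0x30:0x36] = bytes.fromhex("03 00 31 27 00 00")  # USN, saved bytes (2nd is made up)
rec[0x1f0:0x210] = bytes.fromhex(
    "31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00"
    " 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71"
)
rec[0x3fe:0x400] = bytes.fromhex("03 00")  # USN copy at the end of sector 1
fixed = apply_fixup(bytes(rec))
print(fixed[0x1f0:0x200].hex(" "))
# 31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 31 27
```

Raising on a mismatch rather than patching blindly is a deliberate choice in this sketch: a sector whose trailing bytes don't match the USN usually indicates a torn write or a buffer that isn't really a FILE record.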

rowingdude / analyzeMFT

Data run calculations are wrong? #21