tw4l / brunnhilde

Siegfried-based characterization tool for directories and disk images
MIT License
82 stars 11 forks

Feature: Report more accurate MAC dates #54

Open tw4l opened 2 years ago

tw4l commented 2 years ago

Connected to https://github.com/tw4l/brunnhilde/issues/53

Brunnhilde/Siegfried report the created and modified dates as recorded by the file system being scanned. Files sometimes carry more accurate timestamps in their internal metadata. If such dates are found, we should also report them, or even prefer them, as they are likely to be more useful to an archivist.
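To make the distinction concrete, here is a minimal sketch of reporting file system dates next to embedded ones. It is not how Brunnhilde or Siegfried work; it only assumes OOXML files (for example .docx), which are ZIP packages that can store `dcterms:created` and `dcterms:modified` in `docProps/core.xml`. The function names `filesystem_dates`, `ooxml_embedded_dates`, and `all_dates` are hypothetical, and other formats would need other extractors.

```python
import os
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace of dcterms:created / dcterms:modified in OOXML core properties.
DCTERMS = "{http://purl.org/dc/terms/}"


def filesystem_dates(path):
    """Return the modified and accessed times recorded by the file system.

    Creation time is left out because os.stat does not expose it portably
    (st_ctime is the metadata-change time on Unix).
    """
    st = os.stat(path)
    return {
        "fs_modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        "fs_accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
    }


def ooxml_embedded_dates(path):
    """Return created/modified strings from docProps/core.xml, if present."""
    dates = {}
    if not zipfile.is_zipfile(path):
        return dates
    with zipfile.ZipFile(path) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return dates
        root = ET.fromstring(zf.read("docProps/core.xml"))
    for field in ("created", "modified"):
        element = root.find(DCTERMS + field)
        if element is not None and element.text:
            # Kept as the raw string; parsing is left to the caller.
            dates["embedded_" + field] = element.text.strip()
    return dates


def all_dates(path):
    """Report every date found, without deciding which one to prefer."""
    report = filesystem_dates(path)
    report.update(ooxml_embedded_dates(path))
    return report
```

Calling `all_dates("report.docx")` (a hypothetical path) would return the file system dates plus any `embedded_created` / `embedded_modified` values, leaving the choice between them to the person reading the report.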

kieranjol commented 2 years ago

I think that these embedded dates are indeed better than the file system dates, but there are some edge cases where they're just as misleading. I'm currently dealing a lot with the following scenario:

So perhaps just reporting all the dates is the best way to go, leaving it up to the user to perform the detective work. This is why I think that the dateCreatedByApplication value in some PREMIS/METS files can't be automated too well.

tw4l commented 2 years ago

A related issue to consider with MAC dates is file timestamps not being preserved when files are carved from a disk image, either by tsk_recover or the UDF mount-and-copy routine.

kieranjol commented 1 year ago

I just encountered another relevant use case: ePADD can extract all email attachments from an email archive (for example, an mbox file). These attachments all have the date of extraction as their file system metadata, so a brunnhilde report on the attachments does not show the correct time span. If brunnhilde could detect other types of embedded datetime values, it could provide more meaningful time spans.

I acknowledge that this would probably also involve scanning the files with tika/mediainfo/exiftool etc., so it's a huge task.
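The time-span problem described above can be illustrated with a small sketch. It assumes the embedded dates have already been extracted by some other tool and parsed into `datetime` objects; the record layout (`fs_modified`, `embedded`) and the function names are hypothetical, not part of brunnhilde. Both spans are reported side by side rather than one replacing the other.

```python
from datetime import datetime, timezone


def time_span(dates):
    """Return (earliest, latest) for an iterable of datetimes, or None if empty."""
    dates = list(dates)
    if not dates:
        return None
    return min(dates), max(dates)


def report_spans(records):
    """Compute one span from file system dates and one from embedded dates.

    Each record is a hypothetical dict with an 'fs_modified' datetime and an
    optional 'embedded' datetime (None or absent when nothing was found).
    """
    fs_span = time_span(r["fs_modified"] for r in records)
    embedded_span = time_span(
        r["embedded"] for r in records if r.get("embedded") is not None
    )
    return {"file_system": fs_span, "embedded": embedded_span}
```

For attachments extracted on a single day, the `file_system` span collapses to that day, while the `embedded` span covers whatever dates were recovered from the files themselves.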