richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0
224 stars, 30 forks

epub and kindle files recognised as different formats #121

Closed workflowsguy closed 5 years ago

workflowsguy commented 6 years ago

While ".docx" is correctly recognized as "Microsoft Word for Windows" and not as ".zip", ".epub" (for example) is not. Also, the Kindle ".azw3" format is not recognised as such, but as ".mobi" (which might technically be true, but the original file format name is still correct).

Thanks!

richardlehane commented 6 years ago

Hi workflowsguy, I don't think there is a PRONOM ID for kindle .azw3 files & suspect that's why those files are defaulting to .mobi. You can make requests for new PRONOM IDs here: https://www.nationalarchives.gov.uk/contact-us/submit-information-for-pronom/.

There is an ID for .epub (fmt/483) & this should match. The test epub files I have access to do match successfully so this might be something specific to your epub files. Are you able to share a sample on this ticket so that I can take a look?

thanks for this report, Richard
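For anyone wanting to rule out a malformed container before sharing a sample, here is a minimal Python sketch of a sanity check. It does not reproduce siegfried's or PRONOM's signature matching; it only relies on the EPUB container convention that the first ZIP entry is named `mimetype` and contains `application/epub+zip`. The function name `looks_like_epub` is hypothetical.

```python
import zipfile


def looks_like_epub(path):
    """Rough check: does this ZIP container declare the EPUB media type?

    Hypothetical helper for illustration only; it is not how siegfried
    identifies files.
    """
    # A plain non-ZIP file (or an unreadable path) cannot be an EPUB container.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        # By EPUB convention the first entry in the archive is "mimetype".
        if not names or names[0] != "mimetype":
            return False
        # Tolerate stray trailing whitespace, although the spec forbids it.
        return zf.read("mimetype").strip() == b"application/epub+zip"
```

A file that fails this check but carries an `.epub` extension would be a plausible candidate for being reported as a generic ZIP by an identification tool.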

workflowsguy commented 5 years ago

Hello Richard,

I got confused by my own workflow (and scripts) regarding epubs. They are all recognized correctly. Apologies for raising a false alarm.

Regarding the .azw(3) files, I will make a PRONOM request.

Thanks!

richardlehane commented 5 years ago

good to hear, thanks for letting me know - closing this now