Format Identification for Digital Objects (FIDO) is a Python command-line tool that identifies the file formats of digital objects. It is designed for simple integration into automated workflows.
When updating signatures, if the format has a ReferenceFileIdentifier of type URL, we include a reference to it, including fetching it and calculating a checksum. However, ReferenceFileIdentifier is not consistent in its meaning or format.
For example, in PRONOM 88 the ReferenceFileIdentifier for fmt/11 starts with www (no scheme), and the URL actually points to a PNG,
compared to fmt/569, whose identifier starts with http:// and points to an HTML page linking to examples.
When parsing the identifier, we prepend http:// and fetch the result, which breaks when the identifier already includes a scheme, as with
http://www.matroska.org/downloads/test_w1.html
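One possible fix, sketched below, is to add the scheme only when the identifier does not already have one. This assumes the failure comes from prepending http:// to an identifier that already starts with it. The function name `normalise_reference_url` and the `www.example.org` value are hypothetical and are not taken from the FIDO code base or from PRONOM.

```python
from urllib.parse import urlsplit


def normalise_reference_url(identifier: str) -> str:
    """Return the identifier as an http(s) URL.

    Hypothetical helper: the scheme is prepended only when the
    identifier does not already start with http:// or https://.
    """
    identifier = identifier.strip()
    if urlsplit(identifier).scheme in ("http", "https"):
        # Already a complete URL (the fmt/569 style), leave it alone.
        return identifier
    # Bare host/path (the fmt/11 style), add the scheme.
    return "http://" + identifier


# Illustrative values only; the first one is made up.
print(normalise_reference_url("www.example.org/sample.png"))
print(normalise_reference_url("http://www.matroska.org/downloads/test_w1.html"))
```

With this check, both identifier styles end up as a single well-formed URL rather than one with a doubled scheme.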
Options include removing the examples and checksums from formats-v##.xml, or adding error handling around that section.
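The error-handling option could look roughly like the sketch below: fetch the reference file, compute a checksum, and skip the reference if anything goes wrong. The function name `fetch_checksum`, the `opener` parameter, and the choice of SHA-256 are assumptions made for illustration; this is not how the signature update script is known to work.

```python
import hashlib
import urllib.request
from http.client import HTTPException


def fetch_checksum(url, timeout=10, opener=urllib.request.urlopen):
    """Return a SHA-256 hex digest of the content at url, or None on failure.

    Hypothetical helper: `opener` exists so the network call can be
    replaced in tests; SHA-256 is an assumed checksum algorithm.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return hashlib.sha256(response.read()).hexdigest()
    except (OSError, ValueError, HTTPException) as err:
        # OSError covers URLError, HTTPError and socket timeouts;
        # ValueError covers malformed URLs such as a missing scheme.
        print(f"Skipping reference {url!r}: {err}")
        return None
```

A caller would then omit the example and checksum for any format where the function returns None, so a single unreachable or malformed identifier does not abort the whole update.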