openpreserve / format-corpus

An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.

Mapping of files to PUIDs #19

Open dd388 opened 2 years ago

dd388 commented 2 years ago

Would it be useful for this repository to have something that maps every item in format-corpus to its respective PRONOM PUID? I am currently working on some comparison testing between file identification utilities and I've found this corpus to be helpful, but as it stands there's no standard way of knowing any file's expected PUID. For my own testing, I've created a spreadsheet of items with my best guess at the appropriate PUID for each, but I'm not sure it's 100% accurate. It might be a start, though.
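To make the idea concrete, here is a rough sketch of how such a mapping could be used in comparison testing. Everything in it is an assumption for illustration: the two-column `path,puid` CSV layout, the file paths, the PUID values, and the `identified` dict standing in for one tool's output are all made up, not taken from the corpus or from any particular utility.

```python
import csv
import io

# Hypothetical manifest layout: one row per corpus file with its expected PUID.
# Paths and PUID values are illustrative placeholders, not verified corpus data.
MANIFEST_CSV = """path,puid
example/a.pdf,fmt/18
example/b.png,fmt/11
example/c.txt,x-fmt/111
"""


def load_manifest(text):
    """Parse manifest CSV text into a {path: expected_puid} dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["path"]: row["puid"].strip() for row in reader}


def compare(expected, identified):
    """Return {path: (expected_puid, identified_puid)} for every disagreement.

    A file the tool did not report at all is recorded with None as its
    identified PUID.
    """
    mismatches = {}
    for path, puid in expected.items():
        got = identified.get(path)
        if got != puid:
            mismatches[path] = (puid, got)
    return mismatches


expected = load_manifest(MANIFEST_CSV)

# Made-up results standing in for one identification tool's output.
identified = {"example/a.pdf": "fmt/18", "example/b.png": "fmt/12"}

mismatches = compare(expected, identified)
for path, (want, got) in sorted(mismatches.items()):
    print(f"{path}: expected {want}, got {got}")
```

A plain CSV keeps the mapping easy to review and correct in pull requests, which matters if some of the expected PUIDs start out as best guesses.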

euanc commented 2 years ago

I'd love to see this.

There's also been a lot of progress with adding data to Wikidata and enabling siegfried to identify Wikidata IDs directly. If you had time to try that out and add the Wikidata IDs to the mapping as well, that would be wonderful. I'm sure the siegfried folks would love any feedback you might have, as it's new functionality.
