Closed atomotic closed 11 years ago
This probably has to do with the fact that there is no signature available yet for ePub.
Looking at the container signature for fmt/61, this is probably because that signature has the same bytes at certain positions as your ePub files.
Could you please send or attach a few epub files so I can take a look at them and possibly create a signature for them?
There's an ePub signature here:
<mime-type type="application/epub+zip">
  <acronym>EPUB</acronym>
  <_comment>Electronic Publication</_comment>
  <magic priority="50">
    <match value="PK\003\004" type="string" offset="0">
      <match value="mimetypeapplication/epub+zip" type="string" offset="30"/>
    </match>
  </magic>
  <glob pattern="*.epub"/>
</mime-type>
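For readers who want to see what that magic rule amounts to, here is a minimal Python sketch of the same two-step byte check (ZIP local-file-header magic at offset 0, then the `mimetype` entry name followed by its content at offset 30). The function and constant names are hypothetical and are not taken from Tika or fido.

```python
# Sketch only: names below are illustrative, not part of Tika or fido.
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header magic
EPUB_MARKER = b"mimetypeapplication/epub+zip"  # entry name + stored content
EPUB_MARKER_OFFSET = 30                        # size of a ZIP local file header


def looks_like_epub(header: bytes) -> bool:
    """Return True if the leading bytes match the ePub file magic."""
    if not header.startswith(ZIP_MAGIC):
        return False
    end = EPUB_MARKER_OFFSET + len(EPUB_MARKER)
    return header[EPUB_MARKER_OFFSET:end] == EPUB_MARKER


def sniff_file(path: str) -> bool:
    """Read just enough of a file to apply the check above."""
    with open(path, "rb") as handle:
        return looks_like_epub(handle.read(64))
```

The check relies on the `mimetype` entry being the first one in the archive and stored uncompressed, so a plain ZIP or an Office file fails the second comparison.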
Thanks, will add this to the extension xml file.
I guess you may have to set it up so that this takes precedence over the ZIP signature.
Note that the above signature is consistent with the proposed 'file magic' given in this section of the ePub spec.
When added to extensions.xml it has precedence over PRONOM signatures.
Thanks for the link to the spec.
I guess that the file magic in the epub spec is just too weak to be that useful for identification in a broader context. The test for epub should be strengthened similar to the tests for ooxml, odf, jar or any of the many formats that are also based on zip.
Cheers,
Adam.
From: Andy Jackson [mailto:notifications@github.com] Sent: 29 June 2013 12:44 To: openplanets/fido Subject: Re: [fido] epub recognized as xls (#32)
Note that the above signature is consistent with the proposed 'file magic' given in this section of the ePub spec. http://www.idpf.org/epub/30/spec/epub30-ocf.html#app-media-type
Adam Farquhar Head of Digital Scholarship Collections Division T:+44 (0)20 7412 7832
Adam.Farquhar@bl.uk The British Library London
NW1 2DB
http://www.bl.uk/ The British Library’s latest Annual Report and Accounts
http://www.bl.uk/aboutus/annrep/index.html http://www.bl.uk/knowledge
@adamfarquhar It's not that the ePub sig is not sensitive enough - there is no ePub signature in PRONOM.
Andy – Yes; I see that the tika signature is precise enough. I had scanned the xml too quickly. Perhaps the easiest fix then would be to get it added to pronom. Can you goose that along? It seems useful and not very controversial to add.
Cheers,
Adam.
From: Andy Jackson [mailto:notifications@github.com] Sent: 30 June 2013 14:35 To: openplanets/fido Cc: Farquhar, Adam Subject: Re: [fido] epub recognized as xls (#32)
@adamfarquhar https://github.com/adamfarquhar It's not that the ePub sig is not sensitive enough - there is no ePub signature in PRONOM http://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=1270&strPageToDisplay=signatures .
I'll suggest it to David Clipsham. (done)
I think the problem is that fido uses DROID 4; with DROID 6.1, ePub is correctly recognized as "fmt/483".
Fido does not use DROID 4 - it doesn't use DROID at all. It uses the PRONOM database, which has this entry for ePub. That PRONOM entry only contains a file extension, which is how it identified your ePub file. PRONOM contains no internal 'magic number' signature for ePub, and so cannot identify ePub bytestreams without such contextual hints.
Hi All,
I added a PRONOM container signature as of 18/12/12, but container signatures will not work with DROID 4 (DROID 6 is the minimum). I'll add a binary variant in the next release for backward compatibility, which we aim to produce w/c 22 July in conjunction with the next DROID release (probably 6.1.3).
I have actually found this link: http://www.nationalarchives.gov.uk/PRONOM/fmt/483
The "container" method is used to recognize it, so it seems that fido has to be extended to read the container signature.
From the Source description in that page:
"This format can be identified via a container signature in DROID version 6 or later. The PRONOM database cannot currently represent container signatures."
Ah, my apologies, I missed the fact that there was a container signature. Fido only partially implements container signature support at present, which is why it doesn't work at the moment.
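To illustrate the difference between a byte-level signature and a container-style check, here is a rough Python sketch that opens the file as a ZIP and inspects the `mimetype` entry. It assumes the container test boils down to "a `mimetype` entry whose content is `application/epub+zip`", which is what the OCF spec requires; it is not the PRONOM container signature itself, nor fido's or DROID's code, and all names are hypothetical.

```python
import io
import zipfile

# Assumption: the container check reduces to the content of the
# 'mimetype' entry, as required by the ePub OCF specification.
EPUB_MIMETYPE = b"application/epub+zip"


def container_is_epub(data: bytes) -> bool:
    """Open the bytes as a ZIP and check the content of its 'mimetype' entry."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # strip() tolerates stray trailing whitespace in the entry
            return archive.read("mimetype").strip() == EPUB_MIMETYPE
    except (zipfile.BadZipFile, KeyError):
        # Not a ZIP at all, or a ZIP without a 'mimetype' entry
        return False


def build_sample(mimetype: bytes) -> bytes:
    """Build a tiny in-memory ZIP with a stored 'mimetype' entry (test helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("mimetype", mimetype)
    return buffer.getvalue()
```

Unlike the fixed-offset magic, this approach does not depend on where the entry sits in the byte stream, at the cost of parsing the archive.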
Hi,
We need this badly. Latest droid does not do the trick either so I worked around this by creating an extension:
<format>
  <puid>fmt/483</puid>
  <name>ePub format</name>
  <version>1.0</version>
  <alias>EPUB</alias>
  <mime>application/epub+zip</mime>
  <extension>epub</extension>
  <has_priority_over>x-fmt/263</has_priority_over>
  <has_priority_over>fmt/61</has_priority_over>
  <signature>
    <name>EPUB file</name>
    <pattern>
      <position>BOF</position>
      <regex>(?s)\APK\x03\x04</regex>
    </pattern>
    <pattern>
      <position>BOF</position>
      <regex>(?s)\A.{30}mimetypeapplication/epub\+zip</regex>
    </pattern>
  </signature>
  <details/>
</format>
Maybe this could be added to the fido_extensions.xml until the container signatures work properly in fido?
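As a rough illustration of what the `has_priority_over` entries are meant to achieve, the Python sketch below drops any matched PUID that another matched PUID claims priority over. This is only an assumption about the intended effect; it is not fido's actual resolution code, and the function and table names are hypothetical.

```python
# Sketch only: not fido's implementation; names are illustrative.
# Priority table mirroring the <has_priority_over> entries above.
PRIORITIES = {"fmt/483": {"x-fmt/263", "fmt/61"}}


def apply_priorities(matches, has_priority_over):
    """Drop every matched PUID that another matched PUID overrides."""
    overridden = set()
    for puid in matches:
        overridden.update(has_priority_over.get(puid, ()))
    return [puid for puid in matches if puid not in overridden]
```

With this table, a file matching fmt/61, x-fmt/263 and fmt/483 at once would be reported only as fmt/483, while a file matching fmt/61 alone is left untouched.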
Hi All, thanks for the comments and suggestions.
@Kris-LIBIS: I will publish an update of fido_extensions.xml ASAP, for the time being you could add this ePUB sig to fido_extensions.xml.
And I will investigate why the container signature does not work properly.
The ePub signature has been added to fido_extensions.xml, the update has been pushed with the 1.1.6 release.
It seems like the container signature is alright but the precedence in the container signature file is set wrong. The addition of the format information to the extension file fixes this.
Please note FIDO will still report it is a match from the container signature file. Will investigate what is wrong with the container signature file and send this information to PRONOM.
Not closing this issue yet...
The bug submitted by @atomotic has been fixed, FIDO now correctly matches ePub files as container-type using the PRONOM container file. The fixed version is tagged and committed as version 1.1.8.
The bug of multiple matches was caused by the read_container() function matching only the first regex where it should have matched all regexes (applicable when the signature consists of more than one regex).
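The difference between the two behaviours can be sketched in a few lines of Python. This is not the real read_container() code; the function names are hypothetical, and the two patterns are borrowed from the extension signature posted earlier in this thread purely for illustration.

```python
import re

# Illustrative patterns, taken from the extension signature in this thread.
EPUB_PATTERNS = [
    rb"(?s)\APK\x03\x04",
    rb"(?s)\A.{30}mimetypeapplication/epub\+zip",
]


def matches_first_only(patterns, data):
    """Behaviour described as the bug: only the first regex is consulted."""
    return re.search(patterns[0], data) is not None


def matches_all(patterns, data):
    """Behaviour described as the fix: every regex must match."""
    return all(re.search(pattern, data) is not None for pattern in patterns)
```

With first-only matching, any ZIP-based file satisfies a signature whose first pattern is the generic ZIP header, which would explain multiple matches; requiring all patterns narrows the result to files that also carry the ePub marker.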
This fix has impact on matches of all signatures of the PRONOM container signature file, please check this if you rely on FIDO in a production environment.
The addition of the ePub signature to the extension file has been commented out for the time being, as this fix seems to tackle the issue.
Please report back if this fixes the issue for you.
Note that the read_container() function is not yet fully compatible with the container signature file and it does not handle them the way DROID does. It is still lacking matching on byte positions and is not yet able to parse OLE2 files the way it should be done.
Backward-compatible versions of the signatures for ePub and Apple's iBooks were included in signature release v69, which became available on 19th July. This should assist users tied to older versions of DROID.
David
Hi Maurice,
Fido now correctly recognises the epubs. This did the trick.
Thanks.
Unfortunately a mime type is not included, but that's another problem.
Hi Kris,
Thanks for reporting back.
The mime type is not included because the
Will do. Next release will be mid-late September, but I'll ensure this is included.
David
Thanks!
I stated earlier the precedence for ePub was set wrong but it turned out that was not the case.
Bug is confirmed to be fixed, closing this issue.
tried with several epub files, same behaviour
$ ./fido.py ~/Downloads/Zizek\ -\ Vivere\ alla\ fine\ dei\ tempi.epub