This is not a problem with metadata-extractor itself, but it affects the
Apache Tika project. As outlined in https://issues.apache.org/jira/browse/TIKA-1154,
Tika uses Xerces 2.8.1 because that is the version metadata-extractor
requires, but that old version hangs on malformed HTML files.
This issue appears to have been fixed in later versions of Xerces (2.10.0
onwards), but we don't know how upgrading Xerces will affect the
metadata-extractor. Could you consider upgrading Xerces to a more recent
version?
Thank you.
Andy Jackson
Original issue reported on code.google.com by anjack...@gmail.com on 25 Jul 2013 at 1:50