jokiazhang / metadata-extractor

Automatically exported from code.google.com/p/metadata-extractor

Xerces 2.8.1 hangs on malformed HTML files under Apache Tika #85

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
This is not a direct problem with the metadata-extractor, but one for the Apache
Tika project. As outlined in https://issues.apache.org/jira/browse/TIKA-1154,
Tika uses Xerces 2.8.1 because that is the version the metadata-extractor
requires, but this old version hangs on malformed HTML files.

This issue appears to have been fixed in later versions of Xerces (2.10.0 
onwards), but we don't know how upgrading Xerces will affect the 
metadata-extractor. Could you consider upgrading Xerces to a more recent 
version?
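For anyone trying to tell whether their own deployment is affected, the check below is a minimal sketch rather than anything from Tika or the metadata-extractor. It assumes that standalone Xerces-J exposes `org.apache.xerces.impl.Version.getVersion()` returning a string such as "Xerces-J 2.8.1". The class name `XercesVersionCheck` and its helper methods are hypothetical. It compares version numbers numerically, because a plain string comparison would wrongly place "2.10.0" before "2.8.1". Note that the JDK's internal copy of Xerces lives in a different package and is not detected here.

```java
import java.lang.reflect.Method;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports whether standalone Xerces on the classpath predates 2.10.0.
public class XercesVersionCheck {
    // First release the issue reports as no longer hanging on malformed HTML.
    static final int[] FIXED_IN = {2, 10, 0};

    // Returns the Xerces version string, or null if standalone Xerces is absent.
    // Assumes org.apache.xerces.impl.Version has a static getVersion() method.
    static String xercesVersion() {
        try {
            Class<?> cls = Class.forName("org.apache.xerces.impl.Version");
            Method m = cls.getMethod("getVersion");
            return (String) m.invoke(null);
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    // Extracts major.minor[.patch] from a string such as "Xerces-J 2.8.1".
    static int[] parse(String version) {
        Matcher m = Pattern.compile("(\\d+)\\.(\\d+)(?:\\.(\\d+))?").matcher(version);
        if (!m.find()) {
            throw new IllegalArgumentException("No version number in: " + version);
        }
        int patch = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), patch};
    }

    // True if the version is numerically older than 2.10.0 (so 2.8.1 counts as older).
    static boolean predatesFix(String version) {
        int[] v = parse(version);
        for (int i = 0; i < FIXED_IN.length; i++) {
            if (v[i] != FIXED_IN[i]) {
                return v[i] < FIXED_IN[i];
            }
        }
        return false; // exactly 2.10.0
    }

    public static void main(String[] args) {
        String v = xercesVersion();
        if (v == null) {
            System.out.println("Standalone Xerces not found on the classpath");
        } else {
            System.out.println(v + (predatesFix(v) ? ": older than 2.10.0" : ": 2.10.0 or later"));
        }
    }
}
```

Running it with the application's classpath should show whether the 2.8.1 jar pulled in for the metadata-extractor is the one actually being loaded.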

Thank you.
Andy Jackson

Original issue reported on code.google.com by anjack...@gmail.com on 25 Jul 2013 at 1:50