Xerces 2.8.1 hangs on malformed HTML files under Apache Tika

This is not a direct problem with the metadata-extractor itself, but it does
affect the Apache Tika project. As outlined in
https://issues.apache.org/jira/browse/TIKA-1154, Tika uses Xerces 2.8.1 because
that is the version the metadata-extractor requires, and that old version hangs
on malformed HTML files.

This issue appears to have been fixed in later versions of Xerces (2.10.0 
onwards), but we don't know how upgrading Xerces will affect the 
metadata-extractor. Could you consider upgrading Xerces to a more recent 
version?

Thank you.
Andy Jackson

Original issue reported on code.google.com by anjack...@gmail.com on 25 Jul 2013 at 1:50
