seanjensengrey / boilerpipe

Automatically exported from code.google.com/p/boilerpipe

UTF characters are not handled correctly #28

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
The following test case fails:

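import java.net.URL;
import static org.junit.Assert.assertEquals;
import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.extractors.ArticleExtractor;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;
import de.l3s.boilerpipe.sax.HTMLFetcher;

ArticleExtractor extractor = ArticleExtractor.INSTANCE;
TextDocument textDoc = new BoilerpipeSAXInput(HTMLFetcher.fetch(
    new URL("http://de.wikipedia.org/wiki/Barack_Obama")).toInputSource()).getTextDocument();
assertEquals("Barack Obama – Wikipedia", textDoc.getTitle());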

The attached patch fixes the issue.
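The report does not say what goes wrong inside the parser or what the patch changes. One common way an expected title such as "Barack Obama – Wikipedia" (which contains an en dash, U+2013) stops matching is a charset mismatch: the page arrives as UTF-8 but its bytes are decoded with a single-byte charset. The sketch below assumes that cause for illustration only. It uses just the JDK, and `CharsetMismatchDemo` and `roundTrips` are hypothetical names, not boilerpipe API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical demo (not part of boilerpipe): assumes the failure is a
 * charset mismatch and shows how a non-ASCII title stops matching when
 * UTF-8 bytes are decoded with the wrong charset.
 */
public class CharsetMismatchDemo {

    // Encode as UTF-8 (what the server is assumed to send), then decode with
    // `decodeAs` (what the parser is assumed to use). True if text survives.
    static boolean roundTrips(String text, Charset decodeAs) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeAs).equals(text);
    }

    public static void main(String[] args) {
        // Written as an escape so this file compiles identically
        // regardless of the source file's encoding.
        String expected = "Barack Obama \u2013 Wikipedia";

        byte[] utf8 = expected.getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 maps every byte to one char, so the 3-byte UTF-8
        // en dash turns into 3 unrelated characters.
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(roundTrips(expected, StandardCharsets.ISO_8859_1));
        System.out.println(roundTrips(expected, StandardCharsets.UTF_8));
        System.out.println(garbled.length() - expected.length());
    }
}
```

Running it prints `false`, `true` and `2`: the title only compares equal when the bytes are decoded as UTF-8, and the wrong decoding turns the single en dash into three characters. A pure-ASCII title such as "Barack Obama - Wikipedia" would round-trip under either charset, which is why this kind of bug typically shows up only on pages with non-ASCII text.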

Original issue reported on code.google.com by florian....@gmail.com on 26 Jul 2011 at 7:13


GoogleCodeExporter commented 9 years ago
I can't trigger the error with the trunk version of boilerpipe.

Could you please re-test?

Original comment by ckkohl79 on 22 Jan 2012 at 11:11

GoogleCodeExporter commented 9 years ago
No response.

Original comment by ckkohl79 on 21 Mar 2012 at 9:27