jungjonghun / crawler4j

Automatically exported from code.google.com/p/crawler4j

HTML parser does not delimit words at HTML element boundaries #186

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Start a crawler on a web site where some pages contain a string like
"spłaty pożyczki</option><option value="/frequently-asked-questions?item=question_19">Czy"
2. Print the text extracted from the HTML in WebCrawler.visit:
System.out.println(text)
3. The log shows the two words run together:
spłaty pożyczkiCzy

What is the expected output? What do you see instead?
spłaty pożyczki Czy
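
To illustrate the expected behaviour, here is a minimal sketch of text extraction that treats every tag as a word boundary. It is not crawler4j's parser and uses no crawler4j API; the class name TagDelimitedText and the method extract are hypothetical, and the regex-based tag matching is a simplification that assumes no ">" appears inside attribute values.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of crawler4j: shows the expected
// delimiting behaviour on a raw HTML fragment.
public class TagDelimitedText {
    // Matches any tag, e.g. </option> or <option value="...">.
    // Assumption: attribute values do not contain '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Replaces each tag with a space, then collapses runs of whitespace. */
    public static String extract(String html) {
        String spaced = TAG.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(spaced).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "spłaty pożyczki</option><option "
                + "value=\"/frequently-asked-questions?item=question_19\">Czy";
        // The two option labels come out separated by a single space.
        System.out.println(extract(html));
    }
}
```

One trade-off of this approach: inline elements inside a word (for example a <b> around a single letter) would also be split, so a real fix would probably insert the space only for block-level or form elements such as option.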

What version of the product are you using?
3.4

Original issue reported on code.google.com by Bogdan.A...@gmail.com on 20 Jan 2013 at 10:33

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 18 Aug 2014 at 3:31