HtmlParseData.getText() doesn't recognize breaks or paragraphs

xrma / crawler4j

Automatically exported from code.google.com/p/crawler4j


HtmlParseData.getText() doesn't recognize breaks or paragraphs #259

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Parse any page that contains <p> or <br> tags.
2. Call getText() on the resulting HtmlParseData.

What is the expected output? What do you see instead?
If the original page was like this:

one<br/>
<p>two</p>
three

I would expect "one\ntwo\nthree",
but instead I see "onetwothree".
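
Until getText() handles this, one possible workaround is to derive the text from the raw HTML instead (for example from HtmlParseData.getHtml()). The following is only a minimal sketch: the class BreakAwareText and its toText method are hypothetical names, not part of crawler4j, and it uses plain regular expressions rather than a real HTML parser, so it does not decode entities or handle comments and scripts.

```java
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of crawler4j: turns <br> and <p> into newlines.
final class BreakAwareText {
    // <br>, <br/>, <br /> in any letter case
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");
    // opening or closing <p> tags, with optional attributes (does not match <pre>)
    private static final Pattern P = Pattern.compile("(?i)</?p(\\s[^>]*)?>");
    // any remaining tag
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    static String toText(String html) {
        // Line breaks in the HTML source are not significant, so collapse whitespace first.
        String s = html.replaceAll("\\s+", " ");
        s = BR.matcher(s).replaceAll("\n");
        s = P.matcher(s).replaceAll("\n");
        s = ANY_TAG.matcher(s).replaceAll("");
        // Drop spaces around newlines and collapse consecutive newlines into one.
        s = s.replaceAll(" *\n[ \n]*", "\n");
        return s.trim();
    }

    public static void main(String[] args) {
        // The example page from this report; prints one, two, three on separate lines.
        System.out.println(toText("one<br/>\n<p>two</p>\nthree"));
    }
}
```

With the example page above, toText returns "one\ntwo\nthree". For real pages, a proper HTML parser would be a safer basis than regular expressions.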

What version of the product are you using?

Please provide any additional information below.

Original issue reported on code.google.com by jwindb...@gmail.com on 8 Apr 2014 at 10:39

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 18 Aug 2014 at 3:50

Changed state: Accepted
Added labels: Priority-High
Removed labels: Priority-Medium