h920526 opened 11 months ago
This is currently "as designed": wholeText() returns only the non-normalized text values of the elements.
I have considered changing it to emit a newline whenever it encounters a new block tag, as that seems more useful.
text() will give you normalized text, with a space (not a newline) between the nodes. That's designed for e.g. indexing, searching, and extracting.
It would be good to hear opinions from folks on this. The change seems safe and information-preserving.
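To make the proposal concrete, here is a minimal sketch on a hypothetical, simplified node model. The Node, Text and Elem types, the BLOCK_TAGS set, and the rawText()/blockText() methods are illustrative assumptions, not jsoup classes or methods. rawText() concatenates text values with nothing added between elements, roughly what wholeText() is described as doing above; blockText() starts a new line at each block element boundary, as proposed.

```java
import java.util.List;
import java.util.Set;

public class BlockTextSketch {
    // Hypothetical minimal DOM model for illustration; this is not jsoup's API.
    interface Node {}
    record Text(String value) implements Node {}
    record Elem(String tag, List<Node> children) implements Node {}

    // Assumption: a small, incomplete set of block-level tag names.
    static final Set<String> BLOCK_TAGS = Set.of("div", "p", "li", "h1", "h2", "h3");

    // Concatenates text values as-is, adding nothing between elements.
    static String rawText(Node node) {
        StringBuilder sb = new StringBuilder();
        appendRaw(node, sb);
        return sb.toString();
    }

    private static void appendRaw(Node node, StringBuilder sb) {
        if (node instanceof Text t) {
            sb.append(t.value());
        } else if (node instanceof Elem e) {
            for (Node child : e.children()) appendRaw(child, sb);
        }
    }

    // Proposed variant: start a new line when a block element begins or ends.
    static String blockText(Node node) {
        StringBuilder sb = new StringBuilder();
        appendBlock(node, sb);
        return sb.toString().strip();
    }

    private static void appendBlock(Node node, StringBuilder sb) {
        if (node instanceof Text t) {
            sb.append(t.value());
        } else if (node instanceof Elem e) {
            boolean block = BLOCK_TAGS.contains(e.tag());
            if (block) newline(sb);
            for (Node child : e.children()) appendBlock(child, sb);
            if (block) newline(sb);
        }
    }

    // Appends '\n' unless the buffer is empty or already ends with one,
    // so nested blocks do not produce blank lines.
    private static void newline(StringBuilder sb) {
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') sb.append('\n');
    }

    public static void main(String[] args) {
        // Models <div><p>Hello</p><p>World</p></div>
        Node doc = new Elem("div", List.of(
                new Elem("p", List.of(new Text("Hello"))),
                new Elem("p", List.of(new Text("World")))));
        System.out.println(rawText(doc));   // HelloWorld
        System.out.println(blockText(doc)); // Hello, then World on the next line
    }
}
```

Collapsing consecutive boundaries into a single newline keeps nested blocks such as div > p from producing empty lines; whether a real implementation should do the same is part of the open design question.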
Use a <br> tag.
I tried to use wholeText() to convert HTML to text, but it doesn't really work: \n characters in the source are not ignored (they should be), and the resulting text had some weird indentation.
And text() is even worse.
Is there another method that converts HTML content into text with better results?
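One way to get cleaner plain text is a custom tree walk. The sketch below uses a hypothetical, simplified node model; Node, Text, Elem, BLOCK_TAGS and toPlainText() are illustrative names, not jsoup classes or methods. It collapses source newlines and indentation inside text, treats <br> as an explicit line break, and starts a new line at block boundaries. A jsoup-based version would presumably walk the parsed document instead (for example with a NodeVisitor), but this sketch does not use jsoup's API.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PlainTextSketch {
    // Hypothetical minimal DOM model for illustration; this is not jsoup's API.
    interface Node {}
    record Text(String value) implements Node {}
    record Elem(String tag, List<Node> children) implements Node {}

    // Assumption: a small, incomplete set of block-level tag names.
    static final Set<String> BLOCK_TAGS = Set.of("div", "p", "li", "ul", "ol", "h1", "h2", "h3");

    static String toPlainText(Node root) {
        StringBuilder sb = new StringBuilder();
        walk(root, sb);
        // Strip stray spaces at the start/end of each line, then trim the whole result.
        return sb.toString().lines()
                .map(String::strip)
                .collect(Collectors.joining("\n"))
                .strip();
    }

    private static void walk(Node node, StringBuilder sb) {
        if (node instanceof Text t) {
            // Newlines and indentation in the HTML source are layout, not content,
            // so collapse every whitespace run to a single space.
            String text = t.value().replaceAll("\\s+", " ");
            // Drop a leading space at the start of a line or right after another space.
            if (text.startsWith(" ") && endsWithWhitespace(sb)) {
                text = text.substring(1);
            }
            sb.append(text);
        } else if (node instanceof Elem e) {
            if (e.tag().equals("br")) {
                sb.append('\n'); // <br> is an explicit line break
                return;
            }
            boolean block = BLOCK_TAGS.contains(e.tag());
            if (block) newline(sb);
            for (Node child : e.children()) walk(child, sb);
            if (block) newline(sb);
        }
    }

    // Also true for an empty buffer, so the output never starts with a space.
    private static boolean endsWithWhitespace(StringBuilder sb) {
        return sb.length() == 0 || Character.isWhitespace(sb.charAt(sb.length() - 1));
    }

    // Start a new line unless we are already at the start of one.
    private static void newline(StringBuilder sb) {
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) != '\n') sb.append('\n');
    }

    public static void main(String[] args) {
        // Models an indented source: <div> / <p>Hello there</p> / <p>Line one<br>Line two</p> / </div>
        Node doc = new Elem("div", List.of(
                new Text("\n  "),
                new Elem("p", List.of(new Text("Hello\n    there"))),
                new Text("\n  "),
                new Elem("p", List.of(new Text("Line one"), new Elem("br", List.of()), new Text("Line two"))),
                new Text("\n")));
        System.out.println(toPlainText(doc));
    }
}
```

Running main prints three lines: "Hello there", "Line one" and "Line two". The newline inside the first paragraph's source becomes a single space, and the indentation between tags disappears.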
@h920526 For your case, I think you need to wrap your text in HTML tags. I had to do that too, so something like this:
<html><body><div><p>Hello</p><p>World</p></div></body></html>
@jhy It might be useful to have a dedicated method for converting HTML to text. At the moment wholeText() does this, but it has problems (see the first message).
Hi team,
Jsoup v1.16.1
After calling wholeText():
expected: Hello World
actual: HelloWorld
It does not wrap with a new line. Thanks.