chinwobble closed this issue 6 years ago.
Try printing out the parsed HTML. jsoup may be wrapping the entire document inside the body of a new HTML document.
This is caused by a sneaky UTF-8 BOM at the start of the document, which jsoup was reading but not skipping. By the time it reached the parse tree, the BOM appeared as a U+FEFF ZERO WIDTH NO-BREAK SPACE character. Per the spec, that character is not whitespace, so it was treated as character content, and that sent the tree builder into the body phase.
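To make the mechanism concrete, here is a minimal Kotlin sketch using only the standard library. It assumes you control the input before it reaches jsoup and want to drop a leading BOM yourself. The helper names (`isHtmlWhitespace`, `decodeUtf8WithoutBom`, `stripLeadingBom`) are hypothetical and not part of jsoup. The sketch shows that the UTF-8 BOM bytes `EF BB BF` decode to `'\uFEFF'`, that this character is outside HTML's whitespace set (tab, LF, FF, CR, space), and how to strip it.

```kotlin
// HTML "ASCII whitespace": TAB, LF, FF, CR, SPACE.
// U+FEFF (BOM / ZERO WIDTH NO-BREAK SPACE) is not in this set, so a parser
// that has not skipped it sees it as ordinary character content.
private val HTML_WHITESPACE = setOf('\t', '\n', '\u000C', '\r', ' ')

fun isHtmlWhitespace(c: Char): Boolean = c in HTML_WHITESPACE

// The UTF-8 encoding of U+FEFF is the three bytes EF BB BF.
private val UTF8_BOM = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

// Hypothetical helper: decode raw bytes as UTF-8, dropping a leading BOM if present.
// (The JVM's UTF-8 decoder does not remove the BOM on its own.)
fun decodeUtf8WithoutBom(bytes: ByteArray): String {
    val hasBom = bytes.size >= 3 &&
        bytes[0] == UTF8_BOM[0] &&
        bytes[1] == UTF8_BOM[1] &&
        bytes[2] == UTF8_BOM[2]
    val start = if (hasBom) 3 else 0
    return String(bytes, start, bytes.size - start, Charsets.UTF_8)
}

// Hypothetical helper: for input that is already a String, the BOM shows up
// as a leading '\uFEFF' character; remove it if present.
fun stripLeadingBom(html: String): String = html.removePrefix("\uFEFF")
```

As a workaround on an affected version, you could pass `stripLeadingBom(html)` to `Jsoup.parse(...)` instead of the raw string. The first character the parser sees is then `<`, so the `<html>` and `<head>` tags are handled normally rather than being pushed into the body phase.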
Thanks for pointing out the issue. I'm glad the BOM came along for the ride when you copied the HTML, or we would never have found it :)
With jsoup 1.11.2, when I try to parse the HTML below, I get zero elements.
Kotlin code:
sample doc