google / budoux

https://google.github.io/budoux/
Apache License 2.0
1.44k stars, 32 forks

[Java] Handle comment nodes #764

Closed: tushuhei closed this 1 week ago

tushuhei commented 1 week ago

The current Java code does not work with HTML strings that contain comment nodes; it throws the following error:

java.util.NoSuchElementException
        at java.base/java.util.ArrayDeque.removeFirst(ArrayDeque.java:363)
        at java.base/java.util.ArrayDeque.pop(ArrayDeque.java:594)
        at com.google.budoux.HTMLProcessor$PhraseResolvingNodeVisitor.tail(HTMLProcessor.java:163)
        at org.jsoup.select.NodeTraversor.traverse(NodeTraversor.java:59)
        at org.jsoup.nodes.Node.traverse(Node.java:707)
        at org.jsoup.nodes.Element.traverse(Element.java:1883)
        at com.google.budoux.HTMLProcessor.resolve(HTMLProcessor.java:194)

This PR fixes this issue by ignoring comment nodes in the node visitor.
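The sketch below shows how a node visitor can hit this `NoSuchElementException` and why skipping comment nodes avoids it. It is a simplified, hypothetical model, not BudouX's actual `HTMLProcessor` and not jsoup: the `Node`, `Element`, `Text`, and `Comment` records, the `StackVisitor` class, and the `ignoreComments` flag are illustrative stand-ins. It assumes the visitor pushes onto an `ArrayDeque` in `head` only for elements but pops in `tail` for every non-text node. Under that assumption, a comment node pops an entry it never pushed, and the deque runs empty later in the traversal.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.NoSuchElementException;

public class CommentNodeSketch {
  // Hypothetical stand-ins for jsoup's Element, TextNode, and Comment classes.
  interface Node {}
  record Element(String tag, List<Node> children) implements Node {}
  record Text(String text) implements Node {}
  record Comment(String data) implements Node {}

  interface Visitor {
    void head(Node node);
    void tail(Node node);
  }

  // Depth-first walk: head() when a node is entered, tail() when it is left.
  static void traverse(Node node, Visitor visitor) {
    visitor.head(node);
    if (node instanceof Element e) {
      for (Node child : e.children()) {
        traverse(child, visitor);
      }
    }
    visitor.tail(node);
  }

  static final class StackVisitor implements Visitor {
    private final Deque<String> openElements = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();
    private final boolean ignoreComments; // Hypothetical switch for the fix.

    StackVisitor(boolean ignoreComments) {
      this.ignoreComments = ignoreComments;
    }

    @Override
    public void head(Node node) {
      if (ignoreComments && node instanceof Comment) {
        return; // The fix: skip comment nodes entirely.
      }
      if (node instanceof Element e) {
        openElements.push(e.tag()); // Only elements push state.
      } else if (node instanceof Text t) {
        out.append(t.text());
      }
    }

    @Override
    public void tail(Node node) {
      if (node instanceof Text) {
        return;
      }
      if (ignoreComments && node instanceof Comment) {
        return; // The fix: skip comment nodes entirely.
      }
      // Pops for every other node, so an unskipped comment pops state it never pushed.
      openElements.pop(); // Throws NoSuchElementException on an empty deque.
    }
  }

  static String render(Node root, boolean ignoreComments) {
    StackVisitor visitor = new StackVisitor(ignoreComments);
    traverse(root, visitor);
    return visitor.out.toString();
  }

  public static void main(String[] args) {
    Node doc = new Element("p", List.of(new Text("Hello "), new Comment(" note "), new Text("world")));
    System.out.println(render(doc, true)); // prints "Hello world"
    try {
      render(doc, false);
    } catch (NoSuchElementException ex) {
      System.out.println("NoSuchElementException from ArrayDeque.pop");
    }
  }
}
```

Without the fix, `tail` on the comment pops the `p` entry, so the later `tail` call for `p` itself finds the deque empty and `ArrayDeque.pop` throws. Skipping comments in both `head` and `tail` keeps every push paired with exactly one pop.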