google / budoux

https://google.github.io/budoux/
Apache License 2.0

[Java] Skip node at the end of input #765

Closed: tushuhei closed this 1 week ago

tushuhei commented 1 week ago

The current Java code throws the following exception when the input HTML ends with a skip node. This PR fixes the issue.

java.lang.StringIndexOutOfBoundsException: index 15,length 15
        at java.base/java.lang.String.checkIndex(String.java:3278)
        at java.base/java.lang.StringUTF16.checkIndex(StringUTF16.java:1470)
        at java.base/java.lang.StringUTF16.charAt(StringUTF16.java:1267)
        at java.base/java.lang.String.charAt(String.java:695)
        at com.google.budoux.HTMLProcessor$PhraseResolvingNodeVisitor.head(HTMLProcessor.java:133)
        at org.jsoup.select.NodeTraversor.traverse(NodeTraversor.java:34)
        at org.jsoup.nodes.Node.traverse(Node.java:707)
        at org.jsoup.nodes.Element.traverse(Element.java:1883)
        at com.google.budoux.HTMLProcessor.resolve(HTMLProcessor.java:194)
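
The trace shows `String.charAt` being called with an index equal to the string length (15 of 15). The following is a minimal sketch of that general failure pattern and one way to guard against it. It is not the actual `HTMLProcessor` code: the class name, the method names, the `SEP` marker, and the sample strings are all hypothetical and chosen only for illustration.

```java
// Illustrative sketch only; not taken from the BudouX source.
class SkipNodeBoundsSketch {
    // Hypothetical marker for a phrase boundary in the joined text.
    static final char SEP = '\u0001';

    // Unguarded lookup: throws StringIndexOutOfBoundsException when
    // scanIndex == joined.length(), i.e. when the scan position has
    // already reached the end of the input.
    static boolean atSeparatorUnchecked(String joined, int scanIndex) {
        return joined.charAt(scanIndex) == SEP;
    }

    // Guarded lookup: checks the bound first, so a scan position at the
    // end of the input simply reports "no separator here".
    static boolean atSeparatorChecked(String joined, int scanIndex) {
        return scanIndex < joined.length() && joined.charAt(scanIndex) == SEP;
    }

    public static void main(String[] args) {
        String joined = "abc";
        int scanIndex = joined.length(); // position just past the last character

        System.out.println(atSeparatorChecked(joined, scanIndex));
        try {
            atSeparatorUnchecked(joined, scanIndex);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unchecked lookup failed: " + e.getMessage());
        }
    }
}
```

The guard relies on `&&` short-circuiting, so `charAt` is never reached once the index is out of range. Whether the fix in this PR takes the same form is not stated in the description above.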