Closed: nbxiglk0 closed this issue 4 months ago
Well, note that getElementsMatchingText runs the regex against the parsed [text()](https://jsoup.org/apidocs/org/jsoup/nodes/Element.html#text()) of elements, not the original HTML source.
In your parsed DOM tree, you have element nodes (e.g. `textarea`) which contain text nodes (e.g. `test`). So there are no `><` characters to match in the `>.*test.*<` regex.
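To make that concrete, here is a minimal sketch that uses only `java.util.regex` (no jsoup). The class name `TextVsSource`, the helper `found`, and both sample strings are made up for illustration; `parsedText` is simply assumed to be what `text()` would give back for a textarea containing `test`.

```java
import java.util.regex.Pattern;

public class TextVsSource {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Assumed raw markup, standing in for the HTML response.
        String htmlSource = "<textarea>test</textarea>";
        // Assumed parsed text of that element: the tag delimiters are gone.
        String parsedText = "test";

        // The original regex needs '>' and '<', which only exist in the source.
        System.out.println(found(">.*test.*<", htmlSource)); // true
        System.out.println(found(">.*test.*<", parsedText)); // false

        // Without the delimiters, the regex matches the parsed text.
        System.out.println(found(".*?test.*?", parsedText)); // true
    }
}
```

The same regex succeeds or fails depending only on whether it is run against the markup or against the extracted text, which is why dropping the `>` and `<` is enough here.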
Also note the difference between `getElementsMatchingText` and `getElementsMatchingOwnText`: the former uses `text()`, which includes the text nodes of the element and its descendants, whilst the latter uses [`ownText()`](https://jsoup.org/apidocs/org/jsoup/nodes/Element.html#ownText()), which includes only the element's directly owned text node(s).
I would suggest doing something like:
```java
String regex = ".*?test.*?";
String selector = String.format("textarea:matchesWholeOwnText(%s)", regex);
Elements els = doc.select(selector);
```
Or if you prefer:
```java
String regex = ".*?test.*?";
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Elements els = doc.getElementsMatchingOwnText(pattern);
els.forEach(element -> {
    // Keep only the textarea elements among the matches.
    if (element.nameIs("textarea")) {
        System.out.println("matched");
    }
});
```
Both of those find the `textarea` matching the corrected regex.
Hope this helps!
Hi. When I try to get elements through a regex pattern, the matching result is inconsistent with what I expect. For example, this is my test code
the HTML response is
I want to get the `textarea` element by matching `">.*test.*<"`, but I got nothing. Is there anything wrong with the `getElementsMatchingText` method?