Open pyoio opened 10 years ago
Hmm, yes, there's an implementation difference in Impl.find(), between "simple" selectors (matching [\\w-]+) and non-simple ones:
@Override
public final Impl find(final String selector) {

    // The * selector is evaluated using the standard DOM API
    if ("*".equals(selector)) {
        List<NodeList> result = new ArrayList<NodeList>();

        for (Element element : elements) {
            result.add(element.getElementsByTagName(selector));
        }

        return new Impl(document, namespaces, this).addNodeLists(result);
    }

    // Simple selectors are valid XML element names without namespaces. They
    // are fetched using a namespace-stripping filter.
    // [#107] Note, Element.getElementsByTagNameNS() cannot be used, as the
    // underlying document may not be namespace-aware!
    else if (SIMPLE_SELECTOR.matcher(selector).matches()) {
        return find(JOOX.tag(selector, true));
    }

    // CSS selectors are transformed to XPath expressions
    else {
        return new Impl(document, namespaces, this).addElements(xpath(css2xpath(selector, isRoot())).get());
    }
}
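To make the three branches easier to follow, here is a minimal standalone sketch of the dispatch only. It assumes the pattern is exactly the quoted [\\w-]+; the class name, method name, and branch labels are hypothetical and not part of jOOX.

```java
import java.util.regex.Pattern;

final class SelectorKindDemo {

    // Assumption: mirrors the "[\\w-]+" pattern quoted above.
    static final Pattern SIMPLE_SELECTOR = Pattern.compile("[\\w-]+");

    /** Returns a (hypothetical) label for the branch that find() would take. */
    static String branch(String selector) {
        if ("*".equals(selector)) {
            return "dom";            // standard DOM API
        }
        if (SIMPLE_SELECTOR.matcher(selector).matches()) {
            return "tag-filter";     // namespace-stripping tag filter
        }
        return "css-to-xpath";       // CSS selector converted to XPath
    }

    public static void main(String[] args) {
        for (String selector : new String[] {"*", "pid", "results pid", "my-prefix:pid"}) {
            System.out.println(selector + " -> " + branch(selector));
        }
    }
}
```

Note that a selector containing a space, a colon, or any other character outside [\\w-] falls through to the CSS-to-XPath branch, which is where namespaces start to matter.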
The difference is there for performance reasons, but it seems to produce different results, depending on the namespaces that are in use. I suspect the formally correct usage of jOOX with namespaces is:
$(document)
.namespace("my-prefix", "...") // put your namespace URL here, as in xmlns="..."
.xpath("//my-prefix:results//my-prefix:episode//my-prefix:pid");
Namespaces currently do not seem to be supported when using find() and CSS selectors. This should be fixed.
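For comparison, the same prefix-binding idea can be sketched with plain JAXP instead of jOOX. This is only an illustration: the sample document, the urn:example:demo namespace URI, and the class name are made up, and it says nothing about how jOOX implements namespace() internally.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class PrefixBindingDemo {

    // Hypothetical namespace URI and document, for illustration only.
    static final String NS = "urn:example:demo";
    static final String XML =
        "<results xmlns=\"" + NS + "\"><episode><pid>abc</pid></episode></results>";

    /** Evaluates the expression as a string, with "my-prefix" bound to NS. */
    static String evaluate(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "my-prefix".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "my-prefix" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                        ? Collections.singletonList("my-prefix").iterator()
                        : Collections.<String>emptyIterator();
                }
            });
            return xpath.evaluate(expression, document);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The prefixed path finds the element; the unprefixed one selects nothing.
        System.out.println("[" + evaluate("//my-prefix:results//my-prefix:episode//my-prefix:pid") + "]");
        System.out.println("[" + evaluate("//pid") + "]");
    }
}
```

The prefix used in the expression does not have to match anything in the document; only the namespace URI it is bound to matters.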
I just ran into the same issue. What do you think about using the local name instead of the tag name, e.g. "//*[local-name() = 'foo']" in CSS2XPath?
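A small plain-JAXP sketch of that local-name() idea, assuming a namespace-aware parse and a made-up document with a default namespace (the urn:example:demo URI and the class name are hypothetical). It only shows how the two expressions behave in XPath 1.0; it is not a patch for CSS2XPath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class LocalNameXPathDemo {

    // Hypothetical sample document with a default namespace.
    static final String XML =
        "<results xmlns=\"urn:example:demo\"><episode><pid>abc</pid></episode></results>";

    /** Counts the nodes that the expression selects in a namespace-aware parse of XML. */
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, document, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An unprefixed name test only matches elements in no namespace.
        System.out.println(count("//pid"));
        // local-name() ignores the namespace entirely.
        System.out.println(count("//*[local-name() = 'pid']"));
    }
}
```

The trade-off is that local-name() also matches same-named elements from unrelated namespaces, so it is more permissive than a properly bound prefix.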
I have an issue as well. I am trying to get the href values of all matching links using the following XPath string:
"//a[contains(@href, 'wiki/Mathe_f')]/@href/text()"
This returns an empty selection, while
"//a[contains(@href, 'wiki/Mathe_f')]"
returns all the relevant "a" elements.
@Geraldf: jOOX can only "Match" XML elements, not attributes or text nodes, unfortunately. You could write this to get the same result, though:
$(xml).xpath("//a[contains(@href, 'wiki/Mathe_f')]").attr("href")
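The same "select the elements, then read the attribute" pattern can be sketched in plain JAXP. Independently of jOOX, attribute nodes have no children in the XPath 1.0 data model, so appending /text() to @href is not expected to select anything. The sample markup and class name below are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class AttributeXPathDemo {

    // Hypothetical sample markup.
    static final String XML =
        "<root>"
        + "<a href=\"wiki/Mathe_f1\">one</a>"
        + "<a href=\"wiki/Other\">two</a>"
        + "<a href=\"wiki/Mathe_f2\">three</a>"
        + "</root>";

    /** Selects elements with the given XPath and returns their href attribute values. */
    static List<String> hrefs(String elementXPath) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(elementXPath, document, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // The expression must select elements only for this cast to hold.
                result.add(((Element) nodes.item(i)).getAttribute("href"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hrefs("//a[contains(@href, 'wiki/Mathe_f')]"));
    }
}
```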
I have run into the same issue when using a default namespace. After checking the code, I saw that setNamespaceAware(true) is used to create the document. Then I got the idea to build the document myself and pass it to jOOX instead.
var domFactory = DocumentBuilderFactory.newInstance();
var builder = domFactory.newDocumentBuilder();
var document = builder.parse(new ByteArrayInputStream("""
<VAST version="4.2" xmlns="http://www.iab.com/VAST">
</VAST>
""".getBytes()));
var vast42version = $(document).xpath("/VAST").attr("version");
assertThat(vast42version).isEqualTo("4.2");
This worked for me, but when performing modifications to the document, the default builder is used again to build the appended fragments. The modified elements end up with an empty namespace attribute (xmlns="").
$(document).append("\n<Pricing model=\"cpm\" currency=\"USD\"><![CDATA[ 25.00 ]]></Pricing>\n");
var transformer = TransformerFactory.newInstance().newTransformer();
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
assertThatXml(writer.toString())
.and("""
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<VAST xmlns="http://www.iab.com/VAST" version="4.2">
<Pricing currency="USD" model="cpm" xmlns=""><![CDATA[ 25.00 ]]></Pricing>
</VAST>
""")
.ignoreWhitespace()
.areIdentical();
What I believe may work, but have not tried yet, is building a Document and appending the Element instead of a String.
Edit:
OK, even building the document and appending the elements ends up with an empty xmlns attribute. What worked for me was to use the default document made by jOOX and then, instead of modifying the XML with strings, pass the elements to jOOX. I took some code from jOOX and added the namespace to the wrapper document. This is modified from Util.createContent.
public Element[] modifyContent(String content) {
    // Wrap the fragment so that it inherits the VAST default namespace
    String wrapped = "<dummy xmlns=\"http://www.iab.com/VAST\">" + content + "</dummy>";
    Document parsed = null;

    try {
        parsed = JOOX.builder().parse(new InputSource(new StringReader(wrapped)));
    } catch (SAXException | IOException e) {
        return new Element[0];
    }

    DocumentFragment fragment = parsed.createDocumentFragment();
    NodeList children = parsed.getDocumentElement().getChildNodes();

    // appendChild removes children also from NodeList!
    while (children.getLength() > 0) {
        fragment.appendChild(children.item(0));
    }

    fragment = (DocumentFragment) document.importNode(fragment, true);
    return JOOX.list(fragment.getChildNodes()).toArray(new Element[0]);
}
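The wrapper trick presumably works because, in a namespace-aware parse, child elements inherit the default namespace declared on the wrapper. The sketch below checks this with a plain JAXP builder that has namespace awareness switched on (an assumption standing in for JOOX.builder()); the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class WrapperNamespaceDemo {

    static final String VAST_NS = "http://www.iab.com/VAST";

    /**
     * Parses the content inside a dummy wrapper and returns the namespace URI
     * of the first child element. A null defaultNamespace means that the
     * wrapper declares no namespace at all.
     */
    static String namespaceOfFragment(String content, String defaultNamespace) {
        String open = defaultNamespace == null
            ? "<dummy>"
            : "<dummy xmlns=\"" + defaultNamespace + "\">";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document parsed = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(open + content + "</dummy>")));

            // Skip whitespace text nodes until the first element child.
            Node child = parsed.getDocumentElement().getFirstChild();
            while (child != null && child.getNodeType() != Node.ELEMENT_NODE) {
                child = child.getNextSibling();
            }
            return child == null ? null : child.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String content = "<Pricing model=\"cpm\" currency=\"USD\"><![CDATA[ 25.00 ]]></Pricing>";
        System.out.println(namespaceOfFragment(content, null));
        System.out.println(namespaceOfFragment(content, VAST_NS));
    }
}
```

An element in no namespace that is placed under a parent declaring a default namespace has to be serialized with xmlns="" to stay in no namespace, which matches the output reported above.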
@moaxcp: I'm not sure if your comment is a question, or a bug report, or a feature request? In any case, to properly track things (as this issue has already been closed), can you please create a new issue? It may or may not be related to this one...
I've tried searching around for this and I've come to the conclusion I must be doing something crazy. I have the following XML:
Now when I do the following:
I get the expected result, a Match whose .text() is wcrhl7thx1w.
However, when I do:
or
I get back an empty Match object. I've also tried //pid, //results and a variety of other XPath expressions, and nothing comes back. The only XPath I can get something back for is //*.
Is there something amiss in 1.2.0 or have I been looking at this too long and missed something?
p.s. Thank you for the library, I love it.