jOOQ / jOOX

jOOX - The Power of jQuery Applied to W3C DOM. Like JDBC, DOM is a powerful yet very verbose low-level API for manipulating XML. The HTML DOM can be manipulated with the popular jQuery product, in JavaScript. Why don't we have jQuery in Java? jOOX is jQuery's XML parts, applied to Java.
http://www.jooq.org/products
Apache License 2.0

xpath not working? #129

Open pyoio opened 10 years ago

pyoio commented 10 years ago

I've tried searching around for this and I've come to the conclusion I must be doing something crazy. I have the following XML:

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="..."?>
<nitro xsi:schemaLocation="..." xmlns="..." xmlns:xsi="...">
    <results page="1" page_size="10" total="1" more_than="0">
        <episode>
            <pid>wcrhl7thx1w</pid>
        </episode>
    </results>
</nitro>

Now when I do the following:

$(document).find("results").find("episode").find("pid")

I get the expected result, a Match whose .text() is wcrhl7thx1w.

However, when I do:

$(document).find("results episode pid")

or

$(document).xpath("//results//episode//pid")

I get back an empty Match object. I've also tried //pid, //results, and a variety of other XPath expressions, and nothing comes back. The only XPath expression I can get something back for is //*.

Is there something amiss in 1.2.0 or have I been looking at this too long and missed something?

p.s. Thank you for the library, I love it.

lukaseder commented 10 years ago

Hmm, yes, there's an implementation difference in Impl.find() between "simple" selectors (matching [\\w-]+) and non-simple ones:

    @Override
    public final Impl find(final String selector) {

        // The * selector is evaluated using the standard DOM API
        if ("*".equals(selector)) {
            List<NodeList> result = new ArrayList<NodeList>();

            for (Element element : elements) {
                result.add(element.getElementsByTagName(selector));
            }

            return new Impl(document, namespaces, this).addNodeLists(result);
        }

        // Simple selectors are valid XML element names without namespaces. They
        // are fetched using a namespace-stripping filter.

        // [#107] Note, Element.getElementsByTagNameNS() cannot be used, as the
        // underlying document may not be namespace-aware!
        else if (SIMPLE_SELECTOR.matcher(selector).matches()) {
            return find(JOOX.tag(selector, true));
        }

        // CSS selectors are transformed to XPath expressions
        else {
            return new Impl(document, namespaces, this).addElements(xpath(css2xpath(selector, isRoot())).get());
        }
    }

The difference is there for performance reasons, but it seems to produce different results depending on the namespaces that are in use. I suspect the formally correct usage of jOOX with namespaces is:

$(document)
    .namespace("my-prefix", "...") // put your namespace URL here, as in xmlns="..."
    .xpath("//my-prefix:results//my-prefix:episode//my-prefix:pid");

Namespaces currently do not seem to be supported when using find() and CSS selectors. This should be fixed.
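
To illustrate why the unprefixed expressions come back empty, here is a minimal sketch that uses only the JDK's javax.xml.xpath API rather than jOOX. In XPath 1.0, an unprefixed name test only matches elements in no namespace, so on a namespace-aware document with a default namespace a prefix has to be bound first. The namespace URI `urn:example:nitro`, the prefix `n`, and the class name are assumptions made for the example; the real URI is elided in the report above.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DefaultNamespaceXPath {

    // Hypothetical URI standing in for the elided xmlns="..." value.
    static final String NS = "urn:example:nitro";

    static final String XML =
          "<nitro xmlns=\"" + NS + "\">"
        + "<results><episode><pid>wcrhl7thx1w</pid></episode></results>"
        + "</nitro>";

    /** Evaluates an XPath expression against XML and returns its string value. */
    static String evaluate(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();

            // Bind the (arbitrary) prefix "n" to the document's default namespace.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "n".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "n" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                        ? Collections.singletonList("n").iterator()
                        : Collections.<String>emptyIterator();
                }
            });

            return xpath.evaluate(expression, document);
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names look for elements in no namespace.
        System.out.println("unprefixed: [" + evaluate("//pid") + "]");

        // Prefixed names are resolved through the NamespaceContext.
        System.out.println("prefixed:   [" + evaluate("//n:results//n:episode//n:pid") + "]");
    }
}
```

The prefix used in the expression does not need to match anything in the document; only the URI it is bound to matters.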

ccudennec commented 9 years ago

I just ran into the same issue. What do you think about using the local name instead of the tag name, e.g. "//*[local-name() = 'foo']" in CSS2XPath?
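
A rough sketch of what that suggestion could look like for plain descendant selectors such as "results episode pid". This is not jOOX's CSS2XPath code: `toXPath` is a hypothetical helper that only handles whitespace-separated simple element names, and the sample document and its namespace URI are made up. It uses the JDK's XPath API to show that the generated expression matches regardless of the default namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LocalNameSelector {

    // Hypothetical sample; the namespace URI is a placeholder.
    static final String SAMPLE =
          "<nitro xmlns=\"urn:example:nitro\">"
        + "<results><episode><pid>wcrhl7thx1w</pid></episode></results>"
        + "</nitro>";

    /**
     * Hypothetical helper: turns "a b c" into an XPath expression that
     * compares local names only. Handles simple element names, nothing else.
     */
    static String toXPath(String descendantSelector) {
        StringBuilder xpath = new StringBuilder();

        for (String name : descendantSelector.trim().split("\\s+")) {
            xpath.append("//*[local-name() = '").append(name).append("']");
        }

        return xpath.toString();
    }

    /** Returns the text of the first element matched by the selector. */
    static String textOf(String xml, String descendantSelector) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            return XPathFactory.newInstance().newXPath()
                .evaluate(toXPath(descendantSelector), document);
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXPath("results episode pid"));
        System.out.println(textOf(SAMPLE, "results episode pid"));
    }
}
```

One trade-off of this approach: comparing local names ignores namespaces entirely, so elements that share a local name but live in different namespaces would all match.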

Geraldf commented 7 years ago

I have an issue as well. I'm trying to get the href values of all matching links using the following XPath string: "//a[contains(@href, 'wiki/Mathe_f')]/@href/text()". This returns an empty selection, while "//a[contains(@href, 'wiki/Mathe_f')]" returns all the relevant "a" elements.

lukaseder commented 7 years ago

@Geraldf: jOOX can only "Match" XML elements, not attributes or text nodes, unfortunately. You could write this to get the same result, though:

$(xml).xpath("//a[contains(@href, 'wiki/Mathe_f')]").attr("href")
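
As a side note, the trailing /text() step selects nothing in XPath 1.0 in any case, because attribute nodes have no child nodes in the XPath data model. The sketch below shows this with the JDK's own XPath API (not jOOX), where attribute nodes can be selected directly. The sample markup and class name are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class HrefXPath {

    // Hypothetical sample markup.
    static final String XML =
          "<body>"
        + "<a href=\"/wiki/Mathe_f1\">one</a>"
        + "<a href=\"/wiki/Other\">two</a>"
        + "<a href=\"/wiki/Mathe_f2\">three</a>"
        + "</body>";

    /** Evaluates the expression as a node set and returns each node's value. */
    static List<String> select(String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));

            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, document, XPathConstants.NODESET);

            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // For attribute nodes, getNodeValue() is the attribute value.
                values.add(nodes.item(i).getNodeValue());
            }

            return values;
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Selects the attribute nodes themselves.
        System.out.println(select("//a[contains(@href, 'wiki/Mathe_f')]/@href"));

        // Attribute nodes have no children, so this node set is empty.
        System.out.println(select("//a[contains(@href, 'wiki/Mathe_f')]/@href/text()"));
    }
}
```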

moaxcp commented 1 year ago

I have run into the same issue when using a default namespace. After checking the code, I saw that setNamespaceAware(true) is used to create the document. Then I got the idea to pass my own document to jOOX instead.

var domFactory = DocumentBuilderFactory.newInstance();
var builder = domFactory.newDocumentBuilder();
var document = builder.parse(new ByteArrayInputStream("""
    <VAST version="4.2" xmlns="http://www.iab.com/VAST">

    </VAST>
    """.getBytes()));
var vast42version = $(document).xpath("/VAST").attr("version");

assertThat(vast42version).isEqualTo("4.2");

This worked for me, but when performing modifications to the document, the default builder is used again to build the fragments. The modifications end up with an empty namespace attribute.

$(document).append("\n<Pricing model=\"cpm\" currency=\"USD\"><![CDATA[ 25.00 ]]></Pricing>\n");
var transformer = TransformerFactory.newInstance().newTransformer();
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
assertThatXml(writer.toString())
    .and("""
        <?xml version="1.0" encoding="UTF-8" standalone="no"?>
        <VAST xmlns="http://www.iab.com/VAST" version="4.2">
            <Pricing currency="USD" model="cpm" xmlns=""><![CDATA[ 25.00 ]]></Pricing>
        </VAST>
        """)
    .ignoreWhitespace()
    .areIdentical();

What I believe may work, but have not tried yet, is building a Document and appending the Element instead of a String.

Edit:

OK, even building the document and appending the elements ends up with an empty xmlns attribute. What worked for me was to use the default document made by jOOX and then, instead of modifying the XML with strings, pass the elements to jOOX. I took some code from jOOX and added the namespace to the wrapper document. This is modified from Util.createContent.

        public Element[] modifyContent(String content) {
            String wrapped = "<dummy xmlns=\"http://www.iab.com/VAST\">" + content + "</dummy>";
            Document parsed = null;
            try {
                parsed = JOOX.builder().parse(new InputSource(new StringReader(wrapped)));
            } catch (SAXException | IOException e) {
                return new Element[0];
            }
            DocumentFragment fragment = parsed.createDocumentFragment();
            NodeList children = parsed.getDocumentElement().getChildNodes();

            // appendChild removes children also from NodeList!
            while (children.getLength() > 0) {
                fragment.appendChild(children.item(0));
            }

            fragment = (DocumentFragment) document.importNode(fragment, true);
            return JOOX.list(fragment.getChildNodes()).toArray(new Element[0]);
        }
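
For readers wondering why the wrapper's xmlns declaration makes a difference, here is a small JDK-only sketch (no jOOX involved; class and method names are made up). A fragment parsed on its own by a namespace-aware parser belongs to no namespace, and an element in no namespace placed under a parent with a default namespace can only be represented in serialized form with xmlns="". Parsing the same fragment inside a wrapper that declares the default namespace puts it in that namespace, and importNode keeps it there.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FragmentNamespace {

    static final String VAST_NS = "http://www.iab.com/VAST";

    // Simplified fragment (no CDATA) for the purpose of this sketch.
    static final String FRAGMENT =
        "<Pricing model=\"cpm\" currency=\"USD\">25.00</Pricing>";

    /** Parses XML with a namespace-aware builder. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Namespace of Pricing when the fragment is parsed on its own. */
    static String bareNamespace() {
        return parse(FRAGMENT).getDocumentElement().getNamespaceURI();
    }

    /** Parses the fragment inside a wrapper that declares the default namespace. */
    static Element wrappedPricing() {
        Document wrapper = parse(
            "<dummy xmlns=\"" + VAST_NS + "\">" + FRAGMENT + "</dummy>");
        return (Element) wrapper.getDocumentElement().getFirstChild();
    }

    /** Namespace of Pricing after importing it into a VAST document. */
    static String importedNamespace() {
        Document target = parse(
            "<VAST version=\"4.2\" xmlns=\"" + VAST_NS + "\"/>");
        Node imported = target.importNode(wrappedPricing(), true);
        target.getDocumentElement().appendChild(imported);
        return imported.getNamespaceURI();
    }

    public static void main(String[] args) {
        System.out.println("bare:     " + bareNamespace());
        System.out.println("wrapped:  " + wrappedPricing().getNamespaceURI());
        System.out.println("imported: " + importedNamespace());
    }
}
```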

lukaseder commented 1 year ago

@moaxcp: I'm not sure if your comment is a question, or a bug report, or a feature request? In any case, to properly track things (as this issue has already been closed), can you please create a new issue? It may or may not be related to this one...