sparklemotion / nokogiri.org

Documentation site for Nokogiri (a ruby library)
https://nokogiri.org/

Invalid expression when using `id()` #53

Closed by jcavalieri 2 years ago

jcavalieri commented 2 years ago

Hi Nokogiri friends,

With nokogiri-1.13.8-arm64-darwin, I'm trying to use the XPath id() function, and I am getting errors:

dom = Nokogiri::XML::Document.parse('<a><b/></a>')
dom / '//id(*)'
dom / 'id(//*)'
dom / '//*/id()'

The Ruby code above gives these errors, one per query:

Nokogiri::XML::XPath::SyntaxError: ERROR: Invalid expression: //id(*)
Nokogiri::CSS::SyntaxError: unexpected '//' after 'id('
Nokogiri::XML::XPath::SyntaxError: ERROR: Invalid expression: //*/id()

If I understand Nokogiri correctly, it is using libxml2 under the hood. And the id function looks like it should be implemented: https://github.com/GNOME/libxml2/blob/fb08d9fe837ab64934e6ddc66d442e599c805ca4/include/libxml/xpathInternals.h#L598

Am I missing something?

Thanks, John

flavorjones commented 2 years ago

Hi! Thanks for asking this question. For future consideration, questions are usually asked in the project repo https://github.com/sparklemotion/nokogiri and not here (which is the documentation repo).

Let's start with an example of how you can use the id() function, and then I'll explain what's different between this example and what you're doing:

#! /usr/bin/env ruby

require "nokogiri"

xml = <<~XML
  <root>
    <child id="a" />
    <child id="b" />
    <child id="c" />
  </root>
XML

doc = Nokogiri::HTML::Document.parse(xml)

results = doc.xpath('id("a")')

pp results

outputs:

[#<Nokogiri::XML::Element:0x50 name="child" attributes=[#<Nokogiri::XML::Attr:0x3c name="id" value="a">]>]

One difference here is that I'm parsing an HTML document, which has a DTD that describes a "unique id" as documented in this section of the XPath spec: https://www.w3.org/TR/1999/REC-xpath-19991116/#unique-id

I think your XML doc will need an appropriate DTD, one that declares the attribute as type ID, for id() to work as expected.

I'll also note that using Node#/ is imprecise, as it guesses (at times incorrectly?) whether you're searching with a CSS query or an XPath query. That guess is why your second query was reported as a Nokogiri::CSS::SyntaxError rather than an XPath error. I recommend you be explicit whenever possible -- in the above example, I use Node#xpath.

Is this helpful?

jcavalieri commented 2 years ago

Thank you, @flavorjones, for the very helpful response.

flavorjones commented 2 years ago

FWIW I find it's generally easier to use XPath's attribute matchers, like this:

#! /usr/bin/env ruby

require "nokogiri"

xml = <<~XML
  <root>
    <child id="a" />
    <child id="b" />
    <child id="c" />
  </root>
XML

doc = Nokogiri::HTML::Document.parse(xml)

results = doc.xpath("//*[@id='a']")

pp results

which outputs the same as above but is a bit more explicit and doesn't carry the DTD requirement.