liaojinxing / firepath

Automatically exported from code.google.com/p/firepath
GNU General Public License v3.0

xpath doesn't work (suspect namespace issue) #21

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Open a document with <html xmlns='http://some-namespace'>...</html>
2. Open FirePath
3. Enter //div

What is the expected output? What do you see instead?
I expect many divs to be displayed

What version of the product are you using? On what operating system?
Latest, Windows XP, Firefox 3.6.15

Please provide any additional information below.
Even if I do "copy XPath" from the document, still no nodes are found for that 
expression.
//* works as expected.
//*[name()='div'] also works as expected.
It seems that using xmlns is the culprit. There needs to be a way to register a 
default namespace, or to have a hard-coded prefix for the default namespace.

Original issue reported on code.google.com by roman.ga...@gmail.com on 14 Mar 2011 at 4:53

GoogleCodeExporter commented 9 years ago
Thanks for reporting this issue.

The way you describe the problem suggests that it comes from the fact that you 
are adding a default namespace (xmlns='http://some-namespace') to the HTML 
document. However, the problem is more complicated than that.

For example if you have the following file, named "test.html":

<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    </head>
    <body>
        <div>this is a div</div>
    </body>
</html>

and if you execute the XPath: "//div" it will actually match the div in the 
document.

Now take the same file but change its extension from "html" to "xhtml" or 
"xml". Or, if the document comes from a web server, set its content type to 
"text/xml", "application/xml" or "application/xhtml+xml". 
When you then try to run the same XPath, no nodes will be selected.

The first thing to understand here is that, because of the file extension or 
the content type, Firefox will interpret the document's code as XML, not HTML. 
This is explained in more detail here: 
https://developer.mozilla.org/en/XML_in_Mozilla#XHTML.

Now, why is FirePath having trouble with XML documents? 
The answer can be found here: 
https://developer.mozilla.org/en/Introduction_to_using_XPath_in_JavaScript#Implementing_a_default_namespace_for_XML_documents
Basically, Firefox's XPath engine does not apply the document's default 
namespace to unprefixed names in an XPath expression, so in an XML document 
"//div" only looks for div elements that are in no namespace. One possible 
workaround is to create a custom namespace resolver, and this is what FirePath 
does. However, even with a custom namespace resolver you still need to use some 
namespace prefix inside your XPath expression.
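
As a rough illustration of the custom-resolver idea (not FirePath's actual 
code), the sketch below builds a resolver function that maps a prefix to a 
namespace URI and passes it to document.evaluate. The names 
PREFIX_TO_NAMESPACE, makePrefixResolver, selectNodes and resolveXPathPrefix are 
hypothetical, and selectNodes only runs in a browser, where XPathResult exists.

```javascript
// Hypothetical prefix table, hard-coded for illustration only.
const PREFIX_TO_NAMESPACE = {
  xhtml: 'http://www.w3.org/1999/xhtml',
  svg: 'http://www.w3.org/2000/svg',
};

// Returns a resolver usable as the third argument of document.evaluate:
// it receives a prefix and returns the namespace URI, or null if unknown.
function makePrefixResolver(table) {
  return function (prefix) {
    return Object.prototype.hasOwnProperty.call(table, prefix)
      ? table[prefix]
      : null;
  };
}

// Browser-only helper: evaluates an XPath expression against a document
// and returns the matching nodes as an array.
function selectNodes(doc, expression, resolver) {
  const result = doc.evaluate(
    expression,
    doc,
    resolver,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}

const resolveXPathPrefix = makePrefixResolver(PREFIX_TO_NAMESPACE);
```

In a page parsed as XML, selectNodes(document, '//xhtml:div', 
resolveXPathPrefix) would then look for div elements in the XHTML namespace, 
while '//div' would still only look for div elements in no namespace.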

In order to get the name of the prefix corresponding to the default namespace, 
FirePath runs the following regular expression on the namespace name: 
/.*[^\w](\w+)[^\w]*$/ and uses the first group as the result.

Here are examples of the resulting prefix for some commonly used namespaces:
ns: http://www.w3.org/1999/xhtml  prefix: xhtml
ns: http://www.w3.org/2000/svg  prefix: svg
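
The prefix derivation described above can be tried out with a few lines of 
JavaScript. This is only a sketch built around the regular expression quoted 
above; the names PREFIX_PATTERN and derivePrefix and the null fallback are 
assumptions, not FirePath's actual code.

```javascript
// Regular expression quoted in the comment above: capture the last run of
// word characters that is preceded by a non-word character.
const PREFIX_PATTERN = /.*[^\w](\w+)[^\w]*$/;

// Hypothetical helper: returns the derived prefix, or null when the
// namespace name does not match the pattern.
function derivePrefix(namespaceName) {
  const match = PREFIX_PATTERN.exec(namespaceName);
  return match ? match[1] : null;
}

console.log(derivePrefix('http://www.w3.org/1999/xhtml')); // xhtml
console.log(derivePrefix('http://www.w3.org/2000/svg'));   // svg
console.log(derivePrefix('http://some-namespace'));        // namespace
```

Note that a prefix derived this way depends only on the namespace name, so it 
need not be the same as the prefix declared in the document itself.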

Hence, for the second file, running the XPath expression "//xhtml:div" will 
actually match the div.

Now the problem is that there is no way for the user to guess this prefix. This 
is actually a bug which only appears with the "http://www.w3.org/1999/xhtml" 
namespace and which will be fixed in the next version of FirePath.

If you use another default namespace such as "http://www.w3.org/2000/svg", then 
the nodes inside the FirePath panel will be displayed with the svg prefix. This 
way the user can guess that the prefix is needed in order to match the nodes 
they want.

For example the file:
<svg xmlns="http://www.w3.org/2000/svg" width="100%" height="100%" 
version="1.1">
   <rect width="300" height="100" style="fill:rgb(0,0,255);stroke-width:1; stroke:rgb(0,0,0)"/>
</svg>

will be displayed like this in the FirePath panel:
<document>
   <svg:svg xmlns="http://www.w3.org/2000/svg" width="100%" height="100%" version="1.1">
      <svg:rect width="300" height="100" style="fill: rgb(0, 0, 255);"/>
   </svg:svg>
</document>

As you can see, FirePath automatically adds the "svg" prefix in front of each 
node, and it should be the same for nodes with the xhtml namespace (but 
currently it is not, because of a bug).

I would appreciate any feedback about the current behavior and if you think it 
could be improved.

As a workaround for the problem you are currently having, simply use 
"//xhtml:div" instead of "//div".

Original comment by pierre.t...@gmail.com on 14 Mar 2011 at 8:37

GoogleCodeExporter commented 9 years ago

Original comment by pierre.t...@gmail.com on 14 Mar 2011 at 8:38

GoogleCodeExporter commented 9 years ago
That's fine, but for me it's still not working.

I can actually right-click any node in the tree and select "set as xpath" and 
have it show up in the text field, but it's not actually selecting that node 
when I click on "eval".

The XML does have its own namespace, and it even worked yesterday. The 
difference is that, for some reason, FirePath replaced my "gx" namespace prefix 
with its own "schema" prefix. But even if I use this instead, it's not working.

Original comment by m...@gaxweb.com on 14 Mar 2012 at 10:20