Open stevecheckoway opened 6 years ago
Thank you for submitting this! I'm just back from vacation and it will take a few days to catch up on everything. Thanks for your patience.
In retrospect, I'm not sure what this function is supposed to do at all. If I'm reading xml_node.c
correctly (and I'm not at all sure that I am since I'm having a separate issue), all of the namespace reparenting should be happening in relink_namespace
.
Hey, @stevecheckoway! Finally getting around to looking at this.
all of the namespace reparenting should be happening in relink_namespace
Yes, you're right. This method originally came from #870 which was a fix for #869; but the actual root cause was that relink_namespaces
was bailing early -- skipping attributes -- if the element didn't have a namespace itself.
I just wrote #2310 which fixes the underlying implementation of relink_namespaces
, but now we have a different issue, which is: there don't appear to be any/many tests of the HTML5::Node
behavior implemented in 7b3c972. Hang on for one sec on that point ...
Side note: this refactor became much easier after #1712 was fixed by #2246 and #2228 was fixed by #2230, because the reparenting/namespacing behavior was much less buggy.
I'm not sure what this function is supposed to do at all
Again, this method came from #870 which was a fix for #869. Specifically, it's intended to handle cases like this:
Nokogiri::XML::Builder.new do |xml|
xml.root('xmlns:foo' => 'bar') {
xml.obj('foo:attr' => 'baz')
}
end
We're trying to resolve this ambiguity:
foo:attr
attr
and it's in the namespace identified by the prefix foo
?If we were re-designing the Builder API, we could probably allow for explicitly setting the namespace for that attribute. However, there's an analogous use case that doesn't use Builder:
doc1 = Nokogiri::XML('<root/>')
doc2 = Nokogiri::XML('<root xmlns:foo="http://foo.io"/>')
node = doc1.create_element('obj', 'foo:attr' => 'baz')
# at this point, the attribute is named `foo:attr`
# #<Nokogiri::XML::Element:0xba4 name="obj" attributes=[#<Nokogiri::XML::Attr:0xb90 name="foo:attr" value="baz">]>
doc2.root.add_child(node)
# but now the attribute is `attr` with a namespace
# #<Nokogiri::XML::Element:0xba4 name="obj" attributes=[#<Nokogiri::XML::Attr:0xbcc name="attr" namespace=#<Nokogiri::XML::Namespace:0xbb8 prefix="foo" href="http://foo.io"> value="baz">]>
Your suggested fix breaks this "namespacing" process for the attribute in question.
Normally this is not an issue if the implied prefix doesn't correspond to a namespace in the document -- for example, using foo:lang
instead of xml:lang
correctly creates this attribute:
#<Nokogiri::XML::Attr:0x550 name="foo:lang" value="en">
but unfortunately in an XML document the xml
prefix is reserved and will always resolve.
@stevecheckoway WDYT of an alternative approach: what if we skipped relink_namespace
in HTML4+5 documents? I'm going to start a new branch where I try to write a test for 7b3c972
See #2310 for what I think is a good fix for all of this that in fewer lines of code.
What problems are you experiencing?
Nokogiri::XML::Node#add_child_node_and_reparent_attrs
uses an incorrect test to see if a node's namespaces need to be reparented (at least, I think that's the purpose of this code).It's testing
a.name =~ /:/
to decide to reparent. This breaks HTML elements which are allowed to have colons in their attribute names. (Essentially, HTML only allows foreign elements to have explicit namespaces.)It's a little convoluted to demonstrate the problem using the Nokogiri API.
This prints
I note that using
doc.root = html
doesn't do this reparenting.Here's a backwards compatible fix.
This prints
Ideally, an API for manipulating attributes in a given namespace that doesn't do the parsing based on colon (e.g., uses
xmlNewProp
rather thanxmlSetProp
) would be fantastic.What's the output from
nokogiri -v
?Can you provide a self-contained script that reproduces what you're seeing?