shepmaster / sxd-xpath

An XPath library in Rust
Apache License 2.0
121 stars 34 forks source link

Functionality to specify a default namespace. #110

Closed leoschwarz closed 6 years ago

leoschwarz commented 7 years ago

Hi. Maybe the functionality is already there and I'm missing something, but essentially I was parsing this XML:

<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
<area id="a1411661-be21-4290-8dc1-50f3d8e3ea67" type-id="6fd8f29a-3d0a-32fc-980d-ea697b69da78" type="City">
    <name>Honolulu</name>
    <sort-name>Honolulu</sort-name>
</area>
</metadata>

and I had some issues selecting elements due to the namespace. After I saw the test and #108 I've figured out I can use a Context with

context.set_namespace("mb", "http://musicbrainz.org/ns/mmd-2.0#");

and then use mb: for all elements in my XPath queries. e.g. //mb:area/mb:name/text(). However this is a bit cumbersome, so I wanted to ask if functionality to specify a default namespace which would be used for unqualified selectors in XPath queries is desirable? Or would this break the XPath spec somehow?

(I might implement this functionality, I just wanted to ask in advance for some thoughts so I don't end up coding for nothing. ^^)

shepmaster commented 7 years ago

Or would this break the XPath spec somehow?

Maybe.... how do you read section 2.3, emphasis mine:

A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.

That being said, the feature seems like a reasonable case. I think that an implementation can go beyond the spec so long as the doing so is opt-in and the behavior of the spec can still be followed.

That being said, this post seems to indicate that the maintainer of libxml2 thought doing so would be a bad idea. Hard to tell at this point the full background on that.

One avenue of research would be to look at some other XPath libraries (Java has a few, IIRC) and see if any of those offer an equivalent feature.

leoschwarz commented 7 years ago

Ok I see, thanks for your reply. As far as I understand the main issue would be that according to the XPath spec element names without a prefix are by default interpreted as not being in any namespace at all, i.e. the null namespace. Functionality to specify a default namespace would have to override that behaviour which might make things potentially confusing. (Because if I would want to select an element from the null namespace after declaring a default name space it would not be possible anymore.)

I tried looking around for how other libraries do it and I didn't really find any example where it was possible but I didn't spend a huge amount of time looking for one either. I guess it might be better to stick to the standard in this case.

I've actually seen that in XPath 2.0 there is the possibility to use a wildcard prefix *:elname but this has its own downsides (i.e. it's less explicit and it might be better to assert the right schema is actually there) and I guess I will just stick with declaring a namespace like it's supposed to be done.

shepmaster commented 7 years ago

if I would want to select an element from the null namespace after declaring a default name space it would not be possible anymore

I believe that to be the case, yes.

stick with declaring a namespace like it's supposed to be done

That would be my personal opinion, yes 😄 That being said, there might be nicer paths to defining the namespace. I'm always interested in hearing feedback about the API of the library works for people.

leoschwarz commented 7 years ago

I guess I might have asked for such functionality because it took me a bit to figure out how to set up my parser. It might be a (potentially bad) thing learned from OOP languages but I would have preferred if there was a struct which would contain everything (sxd_xpath::Context, sxd_xpath::Factory, sxd_document::Package and sxd_document::dom::Document), that could be easily constructed and then queried using XPath.

Something like:

XpathReader<'d> {
    fn new(xml: &str) -> Result<Self, _>;
    fn evaluate(&self, node: N, xpath_query: &str) -> Result<sxd_xpath::Value<'d>, _>
        where N: Into<sxd_xpath::nodeset::Node<'d>>;
}

But I'm not sure if that would work like that with the node argument if the struct owns the Document. (An alternative would be to just query against the root element, even if that is a bit limited.) And I'm not sure where I would put the Context, maybe like:

XpathReader<'d> {
    fn new_with_context(xml: &str, context: &'d Context) -> Result<Self, _>;
    // or
    fn set_context(&mut self, context: &'d Context);
}

What I have done so far is this, which is inspired by struct Setup in sxd-xpath/tests/integration.rs. I haven't been able to move the Document and Package into the struct because of lifetime issues. Is this even possible? It feels slightly repetitive to have the same couple lines of code in every XML parsing function of mine.

Maybe I'm also just having some Rust beginner issues. :')

leoschwarz commented 7 years ago

Ok I figured out where the problem was (I had to add the lifetime to &'d self – I wasn't even aware this was possible 😅), so here is an implementation of the reader struct as I proposed it before. Just a proof of concept, error handling and docs are ugly right now. I'm not sure if something like that belongs into the library or an example or if it's actually obvious how to do it for most users?

leoschwarz commented 7 years ago

Actually I've implemented some of the ideas I mentioned in this issue in my crate xpath_reader. #110 and #111 are no longer of concern to me personally and you can close them if you want. (Or keep them if you think the concern might persist for other people.)

shepmaster commented 7 years ago

@leoschwarz IIRC, this issue morphed from default namespaces (which we decided was a bad idea) to something about nicer ways to construct the objects. Which aspect(s) of that did you implement?

leoschwarz commented 7 years ago

The later one, I basically ended up taking out the code I've written for musicbrainz_rust and cleaned it up a bit for the crate. It's a bit more than I initially proposed here, but also more useful.

The main interface is the trait XpathReader which provides read, read_option and read_vec of items which implement FromXml or OptionFromXml. Also I added some functionality to emit relative readers (that's why XpathReader is a trait).

Namespace configuration still takes place using the Context object from sxd-xpath. If you are interested (I don't want to bother you XD) I can link some example code later when I refactor musicbrainz_rust to the new API I'm using in the xpath_reader crate.

leoschwarz commented 6 years ago

@shepmaster I'm closing this issue since I am not really in support of it anymore, but if you would like to keep it open please feel free to reopen it. As discussed it would probably be counter-intuitive to diverge from the XPath spec and might cause other issues along the way. This Stack Overflow answer explains why this design decision was deliberately made in the first place.

If anything this is a documentation issue and should be covered in a prominent place.

Also the other thing I ended up discussing here and sort of shifting the issue is providing a serde-inspired reader API, which I'm still working on and which I plan to get in a nice shape in the next few days. (I'm thinking about an actual serde deserializer in the future but I'm not yet sure how to go about that.) If at some point you think this would be nice functionality to also have in this crate I'd be happy to help upstreaming it or some parts thereof. I made a lot of refactorings (and breaking changes) for the upcoming 0.5 release, so the current docs at docs.rs are currently very outdated.