pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data
Other
210 stars 23 forks source link

How to recognize the Any matcher? #150

Closed KonradHoeffner closed 6 months ago

KonradHoeffner commented 7 months ago

The HdtGraph implementation of the Sophia Graph trait only supports constant and any matchers because unpacking and testing each entry would be too slow. However the Graph trait requires the following:

    fn triples_matching<'s, S, P, O>(&'s self, sm: S, pm: P, om: O) -> GTripleSource<'s, Self>
    where
        S: TermMatcher + 's,
        P: TermMatcher + 's,
        O: TermMatcher + 's,

What I've done until now is just treat every matcher where tm.constant() returns None as Any, however that is a bad design that hides errors for users, so I want to rather return an error if users specify unsupported matchers.

So I tried this:

if TypeId::of::<S>() == TypeId::of::<sophia::api::term::matcher::Any>() {...}
...

However the std::any::TypeId::of method only works with a static lifetime like S: TermMatcher + 'static, which I cannot do because then I get the compile error that the impl is stricter than the trait, so I have the following questions:

damooo commented 7 months ago

Most peobably, then HdtGraph should be considered as not implementing Graph trait. It can have it's custom methods to query with only Any/ constant matchers.?

KonradHoeffner commented 7 months ago

That is a good point. Originally I wanted to use the Sophia Graph trait in my application as a trait object, so that the code path is the same whether a user loads an RDF Turtle or an HDT file. However in practice that is not possible anyways because the Sophia Graph trait cannot be used as a trait object, so there are different code paths anyways, so I will consider not implementing the Graph trait.

However there are still two problems:

  1. It would make sense to implement non-Any matches later, and still use optimized processing for the Any matcher.
  2. According to https://docs.rs/sophia/0.8.0-alpha.3/sophia/, in the latest version of Sophia the enum https://docs.rs/sophia/latest/sophia/term/matcher/enum.AnyOrExactly.html does not seem to exist anymore (I also checked out the repo and did not find it with ack "AnyOrExactly",I can build my own but reusing existing functionality would be more elegant).
pchampin commented 6 months ago

What I've done until now is just treat every matcher where tm.constant() returns None as Any

This is indeed a breach of the contract of the Graph trait. Your argument for this is:

testing each entry would be too slow

Yes and no. Calling triples_matching with, says, a closure as a matcher, would indeed be much slower than calling it with a constant matcher or Any instead. But any call to triples_matching with only constant matchers and Anys would not be penalized, thanks to monomorphization (Any::matches always return true, so it will be optimized away).

Granted, it would mean that calling triples_matching would sometimes be very fast, and sometimes very slow, depending on the kind of matcher you pass. That might be considered a problem, but I would argue that it's the price to pay with any generic trait. You can still (and probably should) document, in your implementation, in which situations triples_matching will perform well, and in which it won't.

You can even imagine providing an extra method, specific to your implementation, that you could call triples_matching_fast for example, which would only accept the kind of parameters that make the call fast. It could for example accept Options of terms, where Some(t) would behave as a constant matcher, and None would behave as Any. All you would need to do would be to convert these parameters to the corresponding matcher, and call triples_matching, which would be design be monomorphized to an efficient version.

pchampin commented 6 months ago

To summarize: you don't need to recognize the Any matcher, because the compiler does that already during monomorphization, and does the "right thing" (i.e. optimizes away the call match).

KonradHoeffner commented 6 months ago

@pchampin Wow, this works extremely well, thanks for the great advice! None of the benchmarks have dropped at all and I support non-constant matchers now for all triple patterns. I have never used/seen monomorphization and didn't expect it to work when the respective functions are called inside other functions like filter.

pchampin commented 6 months ago

@KonradHoeffner now you are starting to get a feel for what "zero cost abstraction" really means :-)