pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data
Other
217 stars 23 forks source link

How to use the Graph.iter_* interface? #5

Closed MattesWhite closed 5 years ago

MattesWhite commented 5 years ago

Hi, I'm using this crate a lot for an IoT application and realy like it. However, I run into problems when iterating a Graph.

The core idea I want to accomplish is to build structs from the contents of a Graph. Therefore I defined following trait:

pub trait FromGraph<'a, T, G>: Sized
where
    T: Borrow<str>,
    G: Graph<'a>,
    MyError: From<<G as Graph<'a>>::Error>,
{
    fn from_graph(s: &'a Term<T>, graph: &'a G) -> MyResult<Self>;
}

This works fine until I want to parse nested Structs, e.g. I have the following graph:

@prefix rdf: <http://www.w3.org/...#> .
@base <http://example.org/> .

EntityA a A ;
  rdf:value 42 .
EntityB a B ;
  hasA EntityA ;
  rdf:value 24 .

To use those in my code I've written the following in-Rust-representation:

lazy_static! {
    static ref HAS_A: Term<&'static str> = unsafe{ Term::new_iri_unchecked("http://example.org/hasA", Some(true)) };
}

#[derive(Debug, Clone, Copy)]
struct A {
    value: i32,
}

impl<'a, T, G> FromGraph<'a, T, G> for A
where
    T: Borrow<str>,
    G: Graph<'a>,
    MyError: From<<G as Graph<'a>>::Error>,
{
    fn from_graph(s: &'a Term<T>, graph: &'a G) -> MyResult<Self> {
        let t_value = graph.iter_for_sp(s, &rdf::value).last().ok_or(MyError)??;
        let t_value = t_value.o();
        let value = t_value.value().parse::<i32>()?;
        Ok(A { value })
    }
}

#[derive(Debug, Clone, Copy)]
struct B {
    a: A,
    value: i32,
}

impl<'a, T, G> FromGraph<'a, T, G> for B
where
    T: Borrow<str>,
    G: Graph<'a>,
    MyError: From<<G as Graph<'a>>::Error>,
{
    fn from_graph(s: &'a Term<T>, graph: &'a G) -> MyResult<Self> {
        let t_a = graph.iter_for_sp(s, &HAS_A).last().ok_or(MyError)??;
        let t_a = t_a.o(); // here is where the error occures
        let a = A::from_graph(t_a, graph)?;

        let t_value = graph.iter_for_sp(s, &rdf::value).last().ok_or(MyError)??;
        let t_value = t_value.o();
        let value = t_value.value().parse::<i32>()?;
        Ok(B { a, value })
    }
}

The error I get is t_a does not live long enough

   |
64 | impl<'a, T, G> FromGraph<'a, T, G> for B
   |      -- lifetime `'a` defined here
...
72 |         let t_a = t_a.o(); // here is where the error occures
   |                   ^^^----
   |                   |
   |                   borrowed value does not live long enough
   |                   argument requires that `t_a` is borrowed for `'a`
...
79 |     }
   |     - `t_a` dropped here while still borrowed

Do you have any idea how I could get this working? Maybe an API change could solve this?

In the end I want to accomplish something serde-flavored like:

#[derive(FromGraph)]
struct A {
  #[predicate(rdf:value)]
  value: i32,
}

#[derive(FromGraph)]
struct B {
  #[predicate(HAS_A)]
  a: A,
  #[predicate(rdf:value)]
  value: i32,
}
MattesWhite commented 5 years ago

After a day of examining the probelm, I conclude that the problems originates from Triple's lifetime. A solution would be to separate the trait Triple<'a> into the term-owning Triple<T: TermData> and the term-borrowing TripleRef<'a, T: TermData>. As a result the lifetime of Graph<'a> could be removed, so triples(&'a self) -> GTripleSource<'a> could be rewritten to (pseudocode) triples<'a>(&'a self) -> Iterator<Item = TripleRef<'a, T>>. I'll try this solution myself and write a PR if I'm successful.

BTW: I would introduce the super-trait TermData for convienence:

/// Supertrait for all properties data of a `Term<T>` must provide.
pub trait TermData: AsRef<str> + Clone + Eq + Hash {}
impl<T> TermData for T where T: AsRef<str> + Clone + Eq + Hash {}
pchampin commented 5 years ago

Thanks a lot for your interest in sophia. I'm glad to hear about someone else using it. Please be aware, though, that the API is still very unstable at the moment (but I guess you figured that, given the early stage of the project).

The lifetimes with Term, Triple and Graph are a mess. The ideal solution would be that graph use a Generic Associated Type instead of having the lifetime "fixed" in its own type parameters, but GTAs are not available in Rust yet -- although I heard that they might land during 2019...

In the meantime, I came up with this design, but it has some drawbacks, as you painfully discovered. I'm very interested in your solution. I'll have a look at it myself, but any PR would be more than welcome.

pchampin commented 5 years ago

Ok, I have a working version of your example, in the dedicated branch I just pushed.

The solution relies on Higher-Rank Trait Bounds, which I already had to use in some parts of the crate. This is not ideal, because it is a relatively advanced and complex feature of Rust (it is only documented in the Nomicon, AFAIK), and at the time, I hoped that users of the crate would not need to be exposed to them, but your example proves me wrong on that part.

So this way of using Graph in trait bounds should be thoroughly documented, and possibly eased by creating trait aliases (such as the one you suggest for TermData). Although not ideal, it is the only way I found to provide the level of genericity that I want to achieve. Your proposal is interesting, but I don't want to force Graph implementations to yield TripleRefs, some implementations may create triples on demand and "give them away".

As I mentionned earlier, the ideal solution would be to have Generic Associated Types, so the API may get simpler when (if?) they get included in Rust.

MattesWhite commented 5 years ago

Thank you for your efforts and indeed that solves my concrete problem.

However, I still think that the trait interface of Graph<'a> including a lilfetime implies wrong semantics. Because the lifetime is associated with triples returned from the Graph not to Terms that are actually stored in the Graph itself. I finally found a solution that is, in my opinion, clearer about lifetimes. The most important changes are:

pub trait Graph {
    type TermData: TermData;
    type Error: CoercibleWith<Error> + CoercibleWith<Never>;

    fn triples<'a>(&'a self) -> GTripleSource<'a, Self>;
    // ...
}

pub type GResult<G, T> = std::result::Result<T, <G as Graph>::Error>;
pub type GTripleSource<'a, G> = Box<dyn Iterator<Item = GResult<G, RefTriple<'a, <G as Graph>::TermData>>>>;
pub type RefTriple<'a, T> = [&'a Term<T>; 3];

This removes the lifetimes of Graph and Triple while the iterators hold a correct lifetime which is directly associated to the call to triples(). As mentioned before I'm working on refactoring the whole library but as this is a major change it takes some time. I know this approach is contrary to your approach having the Graph as generic as possible because the returned triples are now fixed to RefTriple<'a, T>. As long as there are no GATs I think this is the best solution. Investigating the Graph's implementations I only found triples of RefTriple<'a, T> build and returned in the triple-iterators. I think that is fine as there are only the costs for contructing the Triples that way.

As I understand a destinction between a TripleSource of Triples that own their Terms for consumption and a TripleSource of Triples that only borrow their Terms for inspecting a Graph's contents could be benefitial. For example something like:

trait TripleSource {
  type Triple: Triple + 'static;
  type Error: ...;
  type Iter: Iterator<Item = Result<Self::Triple, Self::Error>>;
  // ...
}

// pseudo code
trait GTriples<'a, G> {
  type Iter: Iterator<Item = GResult<G, RefTriple<'a, <G as Graph>::TermData>>>;

  fn into_triple_source(self) -> impl TripleSource {
    self.map(|[s, p, o]| { [s.clone(), p.clone(), o.clone()] })
  } 
}
pchampin commented 5 years ago

Regarding graphs returning RefTriples: this will be a problem as soon as we add file-backed or database-backed implementations of Graph, where the triples will not be permanently stored in memory.

The lifetime parameter of Graph it is the lifetime for which the graph can be borrowed for iterating over triples -- and indirectly, it also constraints the lifetime of the terms of the graph. I agree that it is confusing, but not necessarily "wrong", as long as we make things really clearer. I pushed to the branche issue5 a proposal for documenting this, as well as trait aliases to make it easier to use. In practice, the trait OwnedGraph would be used instead of the trait Graph in type bounds, and would provide roughly the same ergonomics as your proposal, without sacrificing genericity.

MattesWhite commented 5 years ago

Okay, I have to admit that your API design is much better reasoned than I thought on the first look. Especially, I've completly forgotten about the non-inmem Graphs. I'll adopt your proposal and continue my work on the "from-graph" crate (name is WIP).

Thank you very much for your time and efforts helping me. I'm looking forward to see the future of this crate!

pchampin commented 5 years ago

For the record, I added the documentation (in commit 692c6e8) but I did not add trait aliases.

For Graph::Error, I thought that CoercibleError<Never>+CoercibleError<Error> is more explicit that a more compact trait alias would be.

For the Higher-Rank Trait Bound itself, I realized that it may have several variants, when the developer wants to constrain Graph::Triple and/or Graph::Error. So here again, I think that explicit is better than implicit.