pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data
Other
214 stars 23 forks source link

IRI as own public type #49

Closed MattesWhite closed 4 years ago

MattesWhite commented 4 years ago

This PR refactors IriData into the first-class kind of RDF term Iri.

As this affects many areas of sophia this PR is marked as work-in-progress and is not ready to be merged, yet.

The changes will not only include 1-to-1 translations of existing code but also re-organization of modules, adding and removing of features and alignment with the previous work of refactoring blank nodes (#48) and variables (#47). As I'm not the owner of this crate and due to the reach of this PR, I highly appreciate reviews and discussions.

ToDo list

This list is of course open to discussion. I'll try to work through the list from top to bottom.

pchampin commented 4 years ago

Integrate the join mechanism more deeply into Iri, as far as possible.

+1

write an adapter for iterators, e.g. graph.triples().resolve_with(&term)....

I can see your use case here, but I think I would rather go for something like

    graph.triples().map_triples(|t| base.resolve_triple(t))

(see #50 for my thought on the issue of resolving relative IRIs).

Integrate new Iri into Term, including convenient functions.

+1

Replace usage of Term where actually only an Iri is valid.

What do you have in mind. Recall that Sophia aims to support a generalized model. In particular, any term is allowed in any position of Triples/Quads.

Replace Option<&Term> for graph names with [a dedicated enum]

Your proposed enum is incorrect w.r.t. the RDF Abstract Model, which allows IRIs or blank nodes as graph names. Furthermore, as mentioned above, I'd like to keep the Sophia generalized model as uniform as possible. Literal as graph names are very debatable, but variables as graph names will be useful when we want to represent queries as generalized datasets.

An orthogonal concern is: should we have a dedicated type for graph names, instead of simply using Option<Term>? I previously used a dedicated type, and I found it caused much complexity without having real added value. Furthermore, I think the semantics of the graph name in a quad is accurately conveyed by Option<Term>: triples from the default graph have no associated graph name, while triples from named graphs have some associated graph name, which is a term.

So I'm not really favorable to that last item. But I'm still open for discussion...

MattesWhite commented 4 years ago

Regarding the replacement of Term where only Iri is valid.

Here I had something in mind like Namespace or the data type of literals as they really only work with IRIs. At the moment there is error handling or a panic in those cases. I thought it would be nice to let the type system take care of such situations. Of course, such replacements must be evaluated with user-friendliness in mind.

MattesWhite commented 4 years ago

In the latest commit I removed unsafe from Iri's *_unchecked() methods due to the question I asked at URLO.

MattesWhite commented 4 years ago

I noticed there are a bunch of requested changes to the documentation. Don't think I won't address them. I just wait for a final commit to this PR when it's done.

MattesWhite commented 4 years ago
  • [ ] write an adapter for iterators, e.g. graph.triples().resolve_with(&term)....

As you opened a dedicated issue (#50) should a solution be included here or would you rather like a separate PR?

pchampin commented 4 years ago

Re #50 this PR is fine: your Resolve trait is exactly what I had in mind.

MattesWhite commented 4 years ago

In the last commit I added a resolve_triples() method to TripleSource. If you think the design is okay I would add implementasion for QuadSource and an extension trait for Iterators (Item = Term, Triple, Quad, Result<Term>, ...).

pchampin commented 4 years ago

In the last commit I added a resolve_triples() method to TripleSource

As previously stated, I have mixed feelings about this.

On the one hand, I can see that this is a rather general-purpose method on triple/quad stream, that may deserves a dedicated method.

But on the other hand, if we go down that path, there are many other general-purposes functions that I can think of that would equally deserve their own specific method:

We would end up with a rather cluttered API, while all of this can be achieved rather simply with map_triples. In the case of resolve_triples, this would be

   source.map_triples(|t| base.resolve(t))
MattesWhite commented 4 years ago

You're right resolve_triples() is more or less a convenient method for map_triples(). And especially as it is a convenience I'd like keeping it. Besides the constraints of Iterators they have a pretty rich API. And you get all of it by simply implementing next(). Of course most Iterator methods can be implemented by map() and fold() as well. Triple-/QuadSource can be the same where you get a rich API of convenience methods for free.

Another point is discoverability. With many convenient methods someone can directly see the features sophia provides. Whereas, you have to figure out yourself how the parts fit together when there are only a few methods.

MattesWhite commented 4 years ago

Further digging into *Source adapters I found that there are many duplicate items. I don't know if this is intentional or not. For the later case a comitted a version implementing Resolver for a QuadSource. This way we could remove all the duplicated adapters from quad::stream and even the StreamingModes should be removable.

pchampin commented 4 years ago

re. including resolve_triples in TripleSource: ok, you convinced me :-)

re. duplicate items in triple::stream and quad::stream: first I to reuse the types from triple::stream in quad::stream (only adding additional traits to them), but this required a bunch of pub(crate) qualifiers, which I thought were not very elegant...

MattesWhite commented 4 years ago

It seems to me that the issue of adapters for Triple- and QuadSources goes of topic of this PR.

I conclude this PR with an IntoIterator implementation for Resolver.

There is no coding left I want to do regarding the PR's topic. Therefore, I removed the [WIP] tag and suggest this PR for merging.

pchampin commented 4 years ago

I'm happy with this PR, except for the IntoIter impl for Resolver (see my comment on the code). Actually, I think that the resolve_triples/Resolver thing be treated separately. In particular, I came to the conclusion that TripleSource::resolve_triples should be implemented as follow:

    fn resolve_triples<'a>(self, base: IriParsed<'a>) -> MapSource<Self, XXX>
    where /* ... */
    {
        self.map_triples(move |t| base.resolve(t))
    }

This way, we would inherit all the impls attached to MapSource, including IntoIter.

The only problem is the XXX in the return type of the method; we need to put there the exact type of the closure, which is not possible :-/ But I can see a solution to this: make MapSource more generic: instead of expecting a FnMut(StreamedTriple<S::Triple>) -> T, it would expect a trait SourceMapper<S>, which would be implemented by FnMut(StreamedTriple<S::Triple>) -> T, but also other types...

I suggest we get rid of all the parts of this PR that relate to resolve_triple and the Resolver type. This will be added in the future PR addressing #52. Ideally, this would be done by cherry-picking the relevant commits and submitting a clean branch. But if that's too much hassle, and you prefer to just remove all this with an additional commit, I'll understand.

MattesWhite commented 4 years ago

I think its ready to be merged.