Literal as type - Githubissues

MattesWhite commented 4 years ago

This PR contributes to #34 .

With this PR the contents of Term::Literal are extracted into a new type LiteralData. This should allow third-party authors to easier add specific behavior for RDF-literals, e.g. serialization.

Furthermore, the calls to n3() has been reduced in the modified code (see #34).

In addition, the function Term::new_literal() is added. A shortcut for creating simple literals, i.e. literals with datatype xsd:string.

pchampin commented 4 years ago

I'm not sure exactly what problem you are trying to solve with the LiteralData type.

All I can see is additional verbosity due to the intermediate type.

I can see a few interesting ideas:

naming subfields of literals (text/value & kind),
having a simpler constructor for xsd:strings (although I tended to favor uniformity by having all (safe) term constructors return a result, even when they could not fail...),

but all of them could be implemented without the additional LiteralData type.

What am I missing?

MattesWhite commented 4 years ago

My intention is to seperate handling of the single RDF-terms (IRI, blank node, literal, variable) from an RDF-term that can be anything. In my opinion this makes it easier to reason about how a certain type of term is handled.

Let me explain it for serialization. Assume a trait for serializing something conformant to NTriples:

trait NtSerializable {
    fn seriallize<W: io::Write>(&self, w: &mut W) -> io::Result<()>;
}

This can be implemented for basic types like IriData, BNodeId, ... then serializing a Term would just be (simplyfied):

impl NtSerializable for Term {
    fn serialize<W: io::Write>(&self, w: &mut W) -> io::Result<()> {
        match self {
            Iri(iri) => iri.serialize(w),
            Literal(literal) => literal.serialize(w),
            ...
        }
    }
}

Continuing this up to Graph or Database (maybe with specialized traits for strict and so on). As a user it enables me to precisely find the code how different elements of RDF are serialized.

Another point I want to make is uniformity. As the same is already done for IRIs and blank nodes I thought it would just be consequent to do the same for literals (and maybe variables). Especially, as blank nodes are less complex as literals but already have an own type.

In a broader scale the seperation of Literals also supports a higher idea of mine. Actually, I wanted to come up with this later. The idea is to make Term a trait. Yes, this probably adds a lot more of generics, where-clauses and verbosity. But I think this only appies to the library side.

Applications will pick probably only a bunch of concrete crates from the sophia ecosystem. Therefore, they will not need to handle complex where-clauses.

On the other hand this would allow us to support external term implementations with existing sophia code, For example, use rio's terms directly in sophia::HashGraph without the need for conversion. Another example is to handle strict and generalized RDF by different implementations of Term reusing the base types for IRIs, blank nodes and literals.

Furthermore, extensions of the RDF-standard could easily interoperate with the sophia ecosystem. In fact, I'm thinking of Notation 3 here. In order to support quoted graphs in triples it is essential to support another kind of Term.

However, this is not the topic of this PR and should prabably be moved into an own issue.

pchampin commented 4 years ago

Ok, that is clearer, thanks.

In the current design, I thought of the Termenum as a sort of "class hierarchy", where the enum itself is the superclass, and the variants (Iri, Literal, etc...) are the subclasses. The "inner types" (such as IriData and BNodeId) were mere artifacts, introduced to overcome the limitations on enum variants (they can not have private attributes or methods of their own).

As I understand, you propose an approach where each subclass is defined as a separate type, and variants of the superclass enum are mere wrappers. I agree this is more regular, probably easier to understand, and possibly more idiomatic (see for example https://docs.rs/json/0.12.1/json/enum.JsonValue.html).

If we go that way, though, I would prefer:

to name the subclass types as closely as possible to their variant (so renaming IriData to Iri, and so on), and have them defined in respective submodules (term::iri, term::litral, etc.);
to make Literal itself an enum, with variants Typed{value: T, datatype: T}, Lang{value: T, tag: T} and get rid of LiteralKind, because I find that the currently proposed design makes pattern matching on literals way too verbose...

As for turning Term into a trait, this is indeed a totally different topic. Feel free to open an issue about it...

MattesWhite commented 4 years ago

I'm happy to see that we came to an agreement, my intentions are exactly as you understand.

Your preferred points are reasonable and I agree on them, too.

As this PR starts to heavily conflict with #37 and the changes required differ more and more from my initial proposal I close this PR and will commit a new one when the serializer PR is merged.

pchampin / sophia_rs

Literal as type #35