pchampin / sophia_rs

Sophia: a Rust toolkit for RDF and Linked Data

[Use case] deserializing RDF/XML using sophia #117

Closed filippodebortoli closed 7 months ago

filippodebortoli commented 1 year ago

I would like to use sophia to deserialize RDF/XML files looking like the following example:

```xml
<rdf:RDF xml:base="http://example.com/example"
         xmlns="http://example.com/example"
         xmlns:myns="https://example.com/my-namespace#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#">

    <myns:Object sign="myns:le">
        <myns:withChild1 rdf:parseType="Collection">
            <myns:Term>
                <myns:onClass rdf:resource="#A" />
                <myns:withValue rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">5</myns:withValue>
            </myns:Term>
            <myns:Term>
                <myns:onClass rdf:resource="#C" />
                <myns:withValue rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">18</myns:withValue>
            </myns:Term>
            <myns:Term>
                <myns:onClass rdf:resource="#D" />
                <myns:withValue rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">21</myns:withValue>
            </myns:Term>
        </myns:withChild1>
        <myns:withChild2 rdf:parseType="Collection">
            <myns:Term>
                <myns:onClass rdf:resource="#B" />
                <myns:withValue rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">8</myns:withValue>
            </myns:Term>
        </myns:withChild2>
    </myns:Object>
    <myns:Object sign="myns:le">
        <myns:withChild2 rdf:parseType="Collection">
            <myns:Term>
                <myns:onClass rdf:resource="#E" />
                <myns:withValue rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</myns:withValue>
            </myns:Term>
        </myns:withChild2>
    </myns:Object>
</rdf:RDF>
```

into a Vec of Objects, according to the following declarations:

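```rust
// One top-level <myns:Object> node.
struct Object {
    child_1: Vec<Term>, // items of the myns:withChild1 collection
    child_2: Vec<Term>, // items of the myns:withChild2 collection
    sign: String,       // value of the `sign` attribute
}

// One <myns:Term> inside a collection.
struct Term {
    class: String, // target of myns:onClass
    value: usize,  // value of myns:withValue
}
```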

Is Sophia the right choice for this task? If so, what is the recommended way to go about it? If not, where would you recommend looking for a solution? The quick_xml crate seems to be a possibility, but I am not well informed about the capabilities of Rio.

By the way, thank you for the great work!

pchampin commented 1 year ago

Thanks for your interest in Sophia :)

Yes, Sophia could be used for that. More specifically, it can parse this RDF/XML file into a Graph. This can be done by adapting the first part of this example, using the RDF/XML parser instead of the Turtle one. Once you have your Graph, you can use its methods (e.g. triples_matching) to browse its triples and reconstruct your Objects and Terms.
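To illustrate the "browse the triples" step, here is a minimal, std-only sketch. It assumes the RDF/XML has already been parsed into triples; to stay self-contained, a hypothetical `ToyGraph` (a plain list of string triples) stands in for Sophia's Graph, and its `object` and `subjects` lookups play roughly the role `triples_matching` would. None of these helpers are Sophia's API. The key point is that `rdf:parseType="Collection"` yields an rdf:first/rdf:rest chain ending in rdf:nil, which has to be walked to rebuild each `Vec<Term>`. The sketch also assumes the `sign` attribute surfaces under a hypothetical `myns:sign` predicate and keeps only the lexical form of literals; `add_object` and `example_graph` are hypothetical helpers that build plausible triples for the example above.

```rust
const RDF: &str = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
const MYNS: &str = "https://example.com/my-namespace#";

#[derive(Debug, PartialEq)]
struct Term { class: String, value: usize }
#[derive(Debug, PartialEq)]
struct Object { child_1: Vec<Term>, child_2: Vec<Term>, sign: String }

// Hypothetical stand-in for a parsed graph (NOT Sophia's API): (subject, predicate,
// object) strings; blank nodes look like "_:id", literals keep only their lexical form.
struct ToyGraph { triples: Vec<(String, String, String)> }

impl ToyGraph {
    fn add(&mut self, s: &str, p: &str, o: &str) {
        self.triples.push((s.to_string(), p.to_string(), o.to_string()));
    }
    // Object of the first triple matching (s, p, ?o).
    fn object(&self, s: &str, p: &str) -> Option<&str> {
        let found = self.triples.iter().find(|(ts, tp, _)| ts == s && tp == p);
        found.map(|(_, _, o)| o.as_str())
    }
    // Subjects of all triples matching (?s, p, o), in insertion order.
    fn subjects(&self, p: &str, o: &str) -> Vec<&str> {
        let found = self.triples.iter().filter(|(_, tp, to)| tp == p && to == o);
        found.map(|(s, _, _)| s.as_str()).collect()
    }
    // Walk an RDF collection: an rdf:first/rdf:rest chain ending in rdf:nil.
    fn list_items<'a>(&'a self, head: &'a str) -> Vec<&'a str> {
        let (first, rest, nil) = (format!("{}first", RDF), format!("{}rest", RDF), format!("{}nil", RDF));
        let (mut items, mut node) = (Vec::new(), head);
        // The length guard stops on cyclic (malformed) lists.
        while node != nil && items.len() <= self.triples.len() {
            match (self.object(node, &first), self.object(node, &rest)) {
                (Some(f), Some(r)) => { items.push(f); node = r; }
                _ => break, // malformed list: stop early
            }
        }
        items
    }
}

fn term_from(g: &ToyGraph, node: &str) -> Option<Term> {
    let class = g.object(node, &format!("{}onClass", MYNS))?.to_string();
    let value: usize = g.object(node, &format!("{}withValue", MYNS))?.parse().ok()?;
    Some(Term { class, value })
}

// Terms of the collection attached to `obj` via myns:<pred> (empty if absent).
fn collection_terms(g: &ToyGraph, obj: &str, pred: &str) -> Vec<Term> {
    match g.object(obj, &format!("{}{}", MYNS, pred)) {
        Some(head) => g.list_items(head).into_iter().filter_map(|n| term_from(g, n)).collect(),
        None => Vec::new(),
    }
}

fn objects_from(g: &ToyGraph) -> Vec<Object> {
    let (rdf_type, class) = (format!("{}type", RDF), format!("{}Object", MYNS));
    g.subjects(&rdf_type, &class).into_iter().map(|o| Object {
        child_1: collection_terms(g, o, "withChild1"),
        child_2: collection_terms(g, o, "withChild2"),
        // Assumption: the `sign` attribute ends up under a hypothetical myns:sign.
        sign: g.object(o, &format!("{}sign", MYNS)).unwrap_or_default().to_string(),
    }).collect()
}

// Hypothetical helper: triples a parser would plausibly emit for one <myns:Object>.
// Blank-node ids are made up; both example Objects use sign="myns:le".
fn add_object(g: &mut ToyGraph, id: &str, cols: &[(&str, &[(&str, usize)])]) {
    let rdf = |l: &str| format!("{}{}", RDF, l);
    let my = |l: &str| format!("{}{}", MYNS, l);
    g.add(id, &rdf("type"), &my("Object"));
    g.add(id, &my("sign"), "myns:le");
    for &(pred, terms) in cols {
        // List cell i; the "cell" after the last item is rdf:nil.
        let cell = |i: usize| if i == terms.len() { rdf("nil") } else { format!("{}_{}_{}", id, pred, i) };
        g.add(id, &my(pred), &cell(0));
        for (i, &(class, value)) in terms.iter().enumerate() {
            let term = format!("{}_{}_t{}", id, pred, i);
            g.add(&cell(i), &rdf("first"), &term);
            g.add(&cell(i), &rdf("rest"), &cell(i + 1));
            g.add(&term, &my("onClass"), &format!("http://example.com/example#{}", class));
            g.add(&term, &my("withValue"), &value.to_string());
        }
    }
}

fn example_graph() -> ToyGraph {
    let mut g = ToyGraph { triples: Vec::new() };
    let c1: &[(&str, usize)] = &[("A", 5), ("C", 18), ("D", 21)];
    let c2: &[(&str, usize)] = &[("B", 8)];
    let c3: &[(&str, usize)] = &[("E", 1)];
    add_object(&mut g, "_:o1", &[("withChild1", c1), ("withChild2", c2)]);
    add_object(&mut g, "_:o2", &[("withChild2", c3)]);
    g
}

fn main() { println!("{:?}", objects_from(&example_graph())); }
```

Since a graph is a set of triples, a real store may return the Objects in any order; the order of Terms within each collection, however, is preserved by the list structure itself.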

Sophia does not currently provide any mechanism to automatically "map" your Graph to custom datatypes (à la Serde). That would be an interesting addition, but it is far from trivial...

I don't think Rio would be more convenient for this because, like Sophia, it does not provide any abstraction higher-level than triples. And unlike Sophia, Rio does not provide a Graph type that you can conveniently query.

quick_xml could be an option if your RDF/XML files have a very rigid structure (e.g. Objects are always at the top level, Terms are always embedded in their Object...), in which case you could use it to write a dedicated parser that would directly build your datatypes, without the intermediate notion of graph or triple.

Tpt commented 1 year ago

> quick_xml could be an option if your RDF/XML files have a very rigid structure (e.g. Objects are always at the top level, Terms are always embedded in their Object...), in which case you could use it to write a dedicated parser that would directly build your datatypes, without the intermediate notion of graph or triple.

An even simpler method: use quick-xml's Serde support, which lets you deserialize XML directly into Rust structs.

pchampin commented 7 months ago

@filippodebortoli I believe this issue can be closed. Do you concur?

You might also be interested in this project: https://github.com/spruceid/linked-data-rs/ . Note that I would like, at some point, to integrate it with Sophia.

filippodebortoli commented 7 months ago

Yes, my questions have been answered. Thank you very much for the detailed explanation! :-)