nvkp / turtle

Golang package for parsing and serializing the Turtle (.ttl) format used for representing RDF data
MIT License

Incorrect output for brick schema ontology #13

Closed (jonnyschaefer closed 3 months ago)

jonnyschaefer commented 3 months ago

Hello again,

If I try to parse the Brick schema (an open-source ontology for building assets, subsystems and data) Turtle file https://brickschema.org/schema/1.3.0/Brick.ttl, many broken triples are returned, e.g.:

Subject:   https://brickschema.org/schema/Brick#/Ablutions_Room
Predicate: http://www.w3.org/ns/shacl#/rule
Object:    [

...

Subject:   http://www.w3.org/ns/shacl#/or
Predicate: (
Object:    [

Code to reproduce:

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/nvkp/turtle"
)

type Triple struct {
    Subject   string `turtle:"subject"`
    Predicate string `turtle:"predicate"`
    Object    string `turtle:"object"`
}

func (t Triple) contains(str ...string) bool {
    for _, s := range str {
        if s == t.Subject || s == t.Predicate || s == t.Object {
            return true
        }
    }
    return false
}

func main() {
    // https://brickschema.org/schema/1.3.0/Brick.ttl
    file, err := os.ReadFile("Brick.ttl")
    if err != nil {
        log.Fatal(err)
    }

    var triples []Triple
    if err := turtle.Unmarshal(file, &triples); err != nil {
        log.Fatal(err)
    }

    for _, t := range triples {
        if t.contains("[", "]", "(", ")") {
            fmt.Printf("Subject:   %s\nPredicate: %s\nObject:    %s\n\n", t.Subject, t.Predicate, t.Object)
        }
    }
}

I think it has to do with two things: anonymous blank nodes ([ ... ]) and collections (( ... )).

I don't know if those are meant to be supported by this package.

Take the example with anonymous blank nodes from the Turtle Wikipedia page (https://en.wikipedia.org/wiki/Turtle_(syntax)#Example):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix ex: <http://example.org/stuff/1.0/> .

<http://www.w3.org/TR/rdf-syntax-grammar>
  dc:title "RDF/XML Syntax Specification (Revised)" ;
  ex:editor [
    ex:fullname "Dave Beckett";
    ex:homePage <http://purl.org/net/dajobe/>
  ] .

The parsed triples are:

<http://www.w3.org/TR/rdf-syntax-grammar> <http://purl.org/dc/elements/1.1/title> <RDF/XML Syntax Specification (Revised)>
<http://www.w3.org/TR/rdf-syntax-grammar> <http://example.org/stuff/1.0/editor> <[>
<http://www.w3.org/TR/rdf-syntax-grammar> <http://example.org/stuff/1.0/homePage> <http://purl.org/net/dajobe/>

but should be, e.g. (converted with https://www.easyrdf.org/converter):

<http://www.w3.org/TR/rdf-syntax-grammar> <http://purl.org/dc/elements/1.1/title> "RDF/XML Syntax Specification (Revised)" .
<http://www.w3.org/TR/rdf-syntax-grammar> <http://example.org/stuff/1.0/editor> _:genid1 .
_:genid1 <http://example.org/stuff/1.0/fullname> "Dave Beckett" .
_:genid1 <http://example.org/stuff/1.0/homePage> <http://purl.org/net/dajobe/> .

Note that

<http://www.w3.org/TR/rdf-syntax-grammar> <http://example.org/stuff/1.0/homePage> <http://purl.org/net/dajobe/>

is not immediately recognizable as an incorrect triple: it is well-formed, but its subject should be the blank node rather than the document.
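
To make the expected output concrete, here is a minimal sketch of how a blank node property list can be expanded into triples. It does not use this package; the types PropertyList and PredicateObject, the expander helper and the _:genidN labels are hypothetical names chosen for illustration, and the input is built by hand instead of being parsed.

```go
package main

import "fmt"

// Triple is a plain subject/predicate/object statement.
type Triple struct {
	Subject, Predicate, Object string
}

// PredicateObject is one "predicate object" pair of a property list.
// Nested is non-nil when the object is itself a [ ... ] block.
// (Hypothetical type for this sketch.)
type PredicateObject struct {
	Predicate string
	Object    string
	Nested    PropertyList
}

// PropertyList is a hypothetical in-memory form of "p1 o1 ; p2 o2 ; ...".
type PropertyList []PredicateObject

// expander hands out fresh blank node labels.
type expander struct{ next int }

func (e *expander) fresh() string {
	e.next++
	return fmt.Sprintf("_:genid%d", e.next)
}

// expand emits the triples for subject and mints one blank node
// per nested [ ... ] block, which becomes the subject of the inner pairs.
func (e *expander) expand(subject string, pl PropertyList) []Triple {
	var out []Triple
	for _, po := range pl {
		if po.Nested == nil {
			out = append(out, Triple{subject, po.Predicate, po.Object})
			continue
		}
		bn := e.fresh()
		out = append(out, Triple{subject, po.Predicate, bn})
		out = append(out, e.expand(bn, po.Nested)...)
	}
	return out
}

func main() {
	// Hand-built equivalent of the Wikipedia example above.
	doc := PropertyList{
		{Predicate: "dc:title", Object: `"RDF/XML Syntax Specification (Revised)"`},
		{Predicate: "ex:editor", Nested: PropertyList{
			{Predicate: "ex:fullname", Object: `"Dave Beckett"`},
			{Predicate: "ex:homePage", Object: "<http://purl.org/net/dajobe/>"},
		}},
	}
	e := &expander{}
	for _, t := range e.expand("<http://www.w3.org/TR/rdf-syntax-grammar>", doc) {
		fmt.Printf("%s %s %s .\n", t.Subject, t.Predicate, t.Object)
	}
}
```

The key point is that the inner pairs get the freshly minted blank node as their subject, which is exactly what is lost in the incorrect homePage triple.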

What are your thoughts on this?

nvkp commented 3 months ago

This is going to be a tough one 🤔 Let me think about this. Even if I do not figure out how to parse blank nodes and collections, I should at least check for their presence in the input and return an error, so that you are aware that the output is invalid.

nvkp commented 3 months ago

In v1.1.0 I have just introduced partial support for blank nodes and collections. This still does not make the Brick.ttl file parse correctly, though.

nvkp commented 3 months ago

@jonnyschaefer in the latest version, v1.1.1, everything should be fine and the file https://brickschema.org/schema/1.3.0/Brick.ttl parses correctly.

[Screenshot from 2024-06-13 19-11-27]