mhausenblas / schema-org-rdf

Schema.org in RDF
http://schema.rdfs.org

Parsing all.ttl from http://schema.rdfs.org/ with rdf (ruby) generates a few errors #57

Open petervandenabeele opened 10 years ago

petervandenabeele commented 10 years ago

I downloaded all.ttl from http://schema.rdfs.org/ today on 26 May 2014.

I tried parsing it with a recent version of rdf (see versions below).

I got a few errors.

I can look deeper into it, if relevant.

The first error, for example, seems to be caused by an unexpected newline in the middle of a string literal; the source Turtle around line 4029 looks like this:

...
schema:accessibilityAPI a rdf:Property;
    rdfs:label "Accessibility API"@en;
    rdfs:comment "Indicates that the resource is compatible with the referenced accessibility API (WebSchemas wiki lists possible values).
     "@en;
    rdfs:domain schema:CreativeWork;
...
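
In Turtle, a literal in single `"` or `'` quotes may not contain a raw newline; only the triple-quoted forms (`"""` or `'''`) may span lines. A stdlib-only check along the following lines could list every such literal before the file is handed to a parser. This is only a rough sketch: `find_broken_literals` is a hypothetical helper, not part of rdf or rdf-turtle, and it assumes every quote it sees is eventually closed.

```ruby
# Hypothetical helper (not part of rdf or rdf-turtle): returns the 1-based
# line numbers on which a short-quoted Turtle literal ("..." or '...')
# starts and then runs over at least one raw newline before it is closed.
def find_broken_literals(turtle)
  broken = []
  line = 1
  i = 0
  while i < turtle.length
    ch = turtle[i]
    case ch
    when "\n"
      line += 1
      i += 1
    when "#"
      # Comment: skip to the end of the line.
      i += 1 while i < turtle.length && turtle[i] != "\n"
    when "<"
      # IRI reference: skip to '>' so that a '#' inside it is not
      # mistaken for the start of a comment.
      i += 1 while i < turtle.length && turtle[i] != ">" && turtle[i] != "\n"
    when '"', "'"
      start_line = line
      long = turtle[i, 3] == ch * 3      # """...""" or '''...'''
      delim = long ? ch * 3 : ch
      i += delim.length
      multiline = false
      until i >= turtle.length || turtle[i, delim.length] == delim
        if turtle[i] == "\\"
          i += 1                         # skip the escaped character
        elsif turtle[i] == "\n"
          line += 1
          multiline = true
        end
        i += 1
      end
      # Only short-quoted literals are invalid when they span lines.
      broken << start_line if multiline && !long
      i += delim.length
    else
      i += 1
    end
  end
  broken
end
```

It could be called as `find_broken_literals(File.read("schema_org.ttl"))`, assuming the same local file name as in the session below.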

The full error log is:

[16] pry(main)> uri = RDF::URI.new("schema_org.ttl")
=> #<RDF::URI:0x3fc8a19b1140 URI:schema_org.ttl>
[17] pry(main)> schema_graph = RDF::Graph.load(uri)
ERROR [line: 4029] With input '"Indicates that the resource is compatible with the referenced accessibility API (WebSchemas wiki lis': Invalid token "\"Indicates" (found "\"Indicates"), production = :_predicateObjectList_5
ERROR [line: 4029] With input 'WebSchemas wiki lists possible values).
     "@en;
    rdfs:domain schema:CreativeWork;
    rdfs:rang': Invalid token "WebSchemas" (found "WebSchemas"), production = :collection
ERROR [line: 4462] With input '"The target group associated with a given audience (e.g. veterans, car owners, musicians, etc.)
     ': Invalid token "\"The" (found "\"The"), production = :_predicateObjectList_5
ERROR [line: 4462] With input 'e.g. veterans, car owners, musicians, etc.)
      domain: Audience
      Range: Text
    "@en;
    rd': Invalid token "e.g." (found "e.g."), production = :collection
ERROR [line: 4462] undefined prefix "domain"
ERROR [line: 4462] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found "A"), production = :objectList
ERROR [line: 4462] undefined prefix "Range"
ERROR [line: 4462] With input 'Text
    "@en;
    rdfs:domain schema:Audience;
    rdfs:range xsd:string;
    rdfs:isDefinedBy <http': Invalid token "Text" (found "Text"), production = :_triples_1
ERROR [line: 4462] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found ";"), production = :objectList
ERROR [line: 4463] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found ";"), production = :objectList
ERROR [line: 4464] Expected one of [:IRIREF, :BLANK_NODE_LABEL, :ANON, "(", "[", :PNAME_LN, :PNAME_NS, :INTEGER, :DECIMAL, :DOUBLE, "true", "false", :STRING_LITERAL_QUOTE, :STRING_LITERAL_SINGLE_QUOTE, :STRING_LITERAL_LONG_SINGLE_QUOTE, :STRING_LITERAL_LONG_QUOTE] (found ";"), production = :objectList
=> #<RDF::Graph:0x3fc8a19a0ffc(default)>
[18] pry(main)> schema_graph.count
=> 8717

➜  schema.org  gem list rdf

*** LOCAL GEMS ***

rdf (1.1.3)
rdf-aggregate-repo (1.1.0)
rdf-isomorphic (1.1.0)
rdf-json (1.1.0)
rdf-microdata (1.1.1.1)
rdf-n3 (1.1.0.1)
rdf-rdfa (1.1.3.1)
rdf-rdfxml (1.1.0.1)
rdf-trig (1.1.3.1)
rdf-trix (1.1.0)
rdf-turtle (1.1.3.1)
rdf-xsd (1.1.0)

petervandenabeele commented 10 years ago

Applying the diff below to the all.ttl file allowed rdf (ruby) to parse it successfully.

This may indicate that newlines are processed incorrectly, as suggested in issue #43. Maybe that ticket needs to be bumped.

➜  schema.org  diff schema_org.ttl schema_org_01.ttl 
4029,4030c4029
<     rdfs:comment "Indicates that the resource is compatible with the referenced accessibility API (WebSchemas wiki lists possible values).
<      "@en;
---
>     rdfs:comment "Indicates that the resource is compatible with the referenced accessibility API (WebSchemas wiki lists possible values)."@en;
4464,4467c4463
<     rdfs:comment "The target group associated with a given audience (e.g. veterans, car owners, musicians, etc.)
<       domain: Audience
<       Range: Text
<     "@en;
---
>     rdfs:comment "The target group associated with a given audience (e.g. veterans, car owners, musicians, etc.) - domain: Audience - Range: Text."@en;
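
The same kind of fix could be applied mechanically instead of by hand. The sketch below is an assumption of mine rather than anything the project uses: `repair_broken_literals` and `TURTLE_TOKEN` are hypothetical names, and the regular expression is only a rough approximation of the Turtle grammar. It collapses each raw line break inside a short-quoted literal into a single space, so its output for the second hunk differs slightly from the manual edit above (no " - " separators and no added full stop).

```ruby
# Rough tokenizer for the parts of Turtle that can contain quote characters.
# Long literals, IRIs and comments are matched only so they can be skipped.
TURTLE_TOKEN = /
    """(?:[^\\]|\\.)*?"""     # long double-quoted literal, may span lines
  | '''(?:[^\\]|\\.)*?'''     # long single-quoted literal, may span lines
  | <[^>\n]*>                 # IRI reference
  | \#[^\n]*                  # comment up to the end of the line
  | "(?:[^"\\]|\\.)*"         # short double-quoted literal
  | '(?:[^'\\]|\\.)*'         # short single-quoted literal
/mx

# Hypothetical helper: rewrites short-quoted literals that contain raw
# newlines so that they fit on one line; everything else is left as-is.
def repair_broken_literals(turtle)
  turtle.gsub(TURTLE_TOKEN) do |tok|
    quote = tok[0]
    short = (quote == '"' || quote == "'") && !tok.start_with?(quote * 3)
    if short && tok.include?("\n")
      # Collapse each line break (plus surrounding whitespace) into one
      # space and trim whitespace at both ends of the literal.
      body = tok[1..-2].gsub(/\s*\n\s*/, " ").strip
      quote + body + quote
    else
      tok
    end
  end
end
```

Writing the result back with `File.write` and re-running the parser would show whether any other problems remain in the file.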

indeyets commented 10 years ago

The patch has already been pushed, but as far as I understand, the website was never updated with the new code.

I asked @mhausenblas to update it, but as far as I understand he no longer has access.