ruby-rdf / rdf-rdfa

Ruby RDFa reader/writer for RDF.rb.
http://ruby-rdf.github.com/rdf-rdfa
The Unlicense

RDF::RDFa reader/writer

RDFa parser for RDF.rb.

DESCRIPTION

RDF::RDFa is an RDFa reader and writer for Ruby using the RDF.rb library suite.

FEATURES

RDF::RDFa parses RDFa into statements or triples.

Install with gem install rdf-rdfa

Pure Ruby

To run as pure Ruby (without requiring any C extensions), this gem does not depend directly on Nokogiri and falls back to REXML when Nokogiri is not installed. Because REXML is an XML parser rather than an HTML parser, results are only useful if the HTML is well-formed. For best performance, install the Nokogiri gem as well.
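The fallback described above can be sketched roughly as follows. This is a minimal illustration of the "use Nokogiri if present, else REXML" pattern, not the gem's actual loading code; `pick_xml_library` is a hypothetical helper name.

```ruby
# Hypothetical helper (not part of rdf-rdfa): prefer Nokogiri when it is
# installed, otherwise fall back to REXML, which ships with Ruby.
def pick_xml_library
  require 'nokogiri'
  :nokogiri
rescue LoadError
  # Nokogiri is not installed, so use the pure-Ruby parser instead.
  require 'rexml/document'
  :rexml
end

library = pick_xml_library
puts "Parsing with #{library}"
```

The reader's API documentation also describes a library option for forcing one parser or the other; check the documentation for the version you have installed before relying on it.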

Important changes from previous versions

RDFa is an evolving standard, undergoing some substantial recent changes partly due to perceived competition with Microdata. As a result, the RDF Webapps working group is currently looking at changes in the processing model for RDFa. These changes are now being tracked in {RDF::RDFa::Reader}:

RDFa 1.1 Lite

This version fully supports the limited syntax of RDFa Lite 1.1. This includes the ability to use @property exclusively.

Vocabulary Expansion

One of the issues with vocabularies was that they discourage re-use of existing vocabularies when terms from several vocabularies are used at the same time. As it is common (encouraged) for RDF vocabularies to form sub-class and/or sub-property relationships with well defined vocabularies, the RDFa vocabulary expansion mechanism takes advantage of this.

As an optional part of RDFa processing, an RDFa processor will perform limited OWL 2 RL Profile entailment, specifically rules prp-spo1, prp-eqp1, prp-eqp2, cax-sco, cax-eqc1, and cax-eqc2. This causes the super-classes and super-properties (and equivalent classes and properties) of type and property IRIs to be added to the output graph.

{RDF::RDFa::Reader} implements this using the #expand method, which looks for rdfa:usesVocabulary properties within the output graph and performs such expansion. See an example in the usage section.
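To make the entailment step concrete, here is a small sketch of the cax-sco rule applied to triples held as plain Ruby arrays. It is an illustration only and does not use or reproduce the gem's #expand implementation; the `ex:` class names and the `expand_types` helper are made up for the example.

```ruby
TYPE         = 'rdf:type'
SUB_CLASS_OF = 'rdfs:subClassOf'

# Apply cax-sco until no new triples appear:
#   (?c1 rdfs:subClassOf ?c2), (?x rdf:type ?c1)  =>  (?x rdf:type ?c2)
# Triples are [subject, predicate, object] arrays of strings.
def expand_types(triples)
  result = triples.uniq
  loop do
    sub_classes = result.select { |_, p, _| p == SUB_CLASS_OF }
    inferred = result.select { |_, p, _| p == TYPE }.flat_map do |x, _, c1|
      sub_classes.select { |c, _, _| c == c1 }.map { |_, _, c2| [x, TYPE, c2] }
    end
    added = inferred - result
    break if added.empty?
    result |= added
  end
  result
end

# Hypothetical vocabulary: ex:Employee is a sub-class of ex:Person,
# which is a sub-class of ex:Agent.
graph = [
  ['ex:Person',   SUB_CLASS_OF, 'ex:Agent'],
  ['ex:Employee', SUB_CLASS_OF, 'ex:Person'],
  ['_:a',         TYPE,         'ex:Employee']
]

expanded = expand_types(graph)
expanded.each { |s, p, o| puts "#{s} #{p} #{o} ." }
```

The loop runs to a fixed point so that chains of sub-class relationships are followed; the other rules listed above would follow the same pattern with different premises.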

Experimental support for rdfa:copy template expansion

RDFa 1.1 is just about an exact super-set of microdata, except for microdata's @itemref feature. Experimental support is added for rdfa:copy and rdfa:Pattern to get a similar effect using expansion. To use this, reference another resource using rdfa:copy. If that resource has the type rdfa:Pattern, the properties defined there will be added to the resource containing the rdfa:copy, and the pattern and rdfa:copy will be removed from the output.

For example, consider the following:

<div>
  <div typeof="schema:Person">
    <link property="rdfa:copy" resource="_:a"/>
  </div>
  <p resource="_:a" typeof="rdfa:Pattern">Name: <span property="schema:name">Amanda</span></p>
</div>

If run with vocabulary expansion, this will result in the following Turtle:

@prefix schema: <http://schema.org/> .
[a schema:Person; schema:name "Amanda"] .
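The copy-and-remove behaviour described above can be sketched over plain arrays of triples. This is a simplified illustration, not the reader's implementation: `expand_patterns` is a hypothetical helper, the blank node labels are invented, and patterns that themselves copy other patterns are not handled.

```ruby
COPY    = 'rdfa:copy'
PATTERN = 'rdfa:Pattern'
TYPE    = 'rdf:type'

# Triples are [subject, predicate, object] arrays of strings.
def expand_patterns(triples)
  # Subjects that are typed as rdfa:Pattern.
  patterns = triples.select { |_, p, o| p == TYPE && o == PATTERN }.map(&:first)

  # For every rdfa:copy that points at a pattern, copy the pattern's
  # properties (except its rdfa:Pattern type) onto the referencing subject.
  copied = triples.select { |_, p, o| p == COPY && patterns.include?(o) }
                  .flat_map do |subject, _, pattern|
    triples.select { |s, p, o| s == pattern && !(p == TYPE && o == PATTERN) }
           .map { |_, p, o| [subject, p, o] }
  end

  # Remove the rdfa:copy links and the patterns themselves from the output.
  kept = triples.reject do |s, p, o|
    (p == COPY && patterns.include?(o)) || patterns.include?(s)
  end

  (kept + copied).uniq
end

input = [
  ['_:person', TYPE,          'schema:Person'],
  ['_:person', COPY,          '_:a'],
  ['_:a',      TYPE,          PATTERN],
  ['_:a',      'schema:name', 'Amanda']
]

output = expand_patterns(input)
output.each { |s, p, o| puts "#{s} #{p} #{o}" }
```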

RDF Collections (lists)

One significant RDF feature missing from RDFa was support for ordered collections, or lists. RDF supports this with special properties rdf:first, rdf:rest, and rdf:nil, but other RDF languages have first-class support for this concept. For example, in Turtle, a list can be defined as follows:

[ a schema:MusicPlaylist;
  schema:name "Classic Rock Playlist";
  schema:numTracks 5;
  schema:tracks (
    [ a schema:MusicRecording; schema:name "Sweet Home Alabama";       schema:byArtist "Lynard Skynard"]
    [ a schema:MusicRecording; schema:name "Shook you all Night Long"; schema:byArtist "AC/DC"]
    [ a schema:MusicRecording; schema:name "Sharp Dressed Man";        schema:byArtist "ZZ Top"]
    [ a schema:MusicRecording; schema:name "Old Time Rock and Roll";   schema:byArtist "Bob Seger"]
    [ a schema:MusicRecording; schema:name "Hurt So Good";             schema:byArtist "John Cougar"]
  )
]

defines a playlist with an ordered set of tracks. RDFa adds the @inlist attribute, which is used to identify values (object or literal) that are to be placed in a list. The same playlist might be defined in RDFa as follows:

<div vocab="http://schema.org/" typeof="MusicPlaylist">
  <span property="name">Classic Rock Playlist</span>
  <meta property="numTracks" content="5"/>

  <div rel="tracks" inlist="">
    <div typeof="MusicRecording">
      1.<span property="name">Sweet Home Alabama</span> -
      <span property="byArtist">Lynard Skynard</span>
    </div>

    <div typeof="MusicRecording">
      2.<span property="name">Shook you all Night Long</span> -
      <span property="byArtist">AC/DC</span>
    </div>

    <div typeof="MusicRecording">
      3.<span property="name">Sharp Dressed Man</span> -
      <span property="byArtist">ZZ Top</span>
    </div>

    <div typeof="MusicRecording">
      4.<span property="name">Old Time Rock and Roll</span>
      <span property="byArtist">Bob Seger</span>
    </div>

    <div typeof="MusicRecording">
      5.<span property="name">Hurt So Good</span>
      <span property="byArtist">John Cougar</span>
    </div>
  </div>
</div>

This expresses the same playlist, placing each track in an rdf:List in document order.
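For readers unfamiliar with how an ordered list is encoded in triples, the following sketch builds the rdf:first/rdf:rest chain for an array of values. It is illustrative only; `list_triples` and the `_:l0`, `_:l1` node labels are assumptions of the example, not something the reader emits verbatim.

```ruby
FIRST = 'rdf:first'
REST  = 'rdf:rest'
NIL   = 'rdf:nil'

# Returns [head, triples] for an ordered list of items.
# An empty list is represented by rdf:nil with no extra triples.
def list_triples(items)
  return [NIL, []] if items.empty?

  # One list node per item, labelled _:l0, _:l1, ...
  nodes = items.each_index.map { |i| "_:l#{i}" }

  triples = items.each_with_index.flat_map do |item, i|
    [
      [nodes[i], FIRST, item],
      # The last node points at rdf:nil to terminate the list.
      [nodes[i], REST, nodes[i + 1] || NIL]
    ]
  end

  [nodes.first, triples]
end

head, triples = list_triples(['_:track1', '_:track2'])
puts "list head: #{head}"
triples.each { |s, p, o| puts "#{s} #{p} #{o} ." }
```

The subject of the property carrying @inlist would then point at the list head rather than at each value individually.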

Magnetic @about/@typeof

The behavior of the @typeof attribute has changed. Previously, it always created a new subject, either a blank node or a resource taken from @about, @resource and so forth. This has long been a source of errors for people using RDFa. Under the new rules, @typeof binds to the subject if used with @about; otherwise it binds to the object, whether used alone or in combination with some other resource attribute (such as @href, @src or @resource).

For example:

<div typeof="foaf:Person" about="https://greggkellogg.net/foaf#me">
  <p property="name">Gregg Kellogg</p>
  <a rel="knows" typeof="foaf:Person" href="https://manu.sporny.org/#this">
    <span property="name">Manu Sporny</span>
  </a>
</div>

results in

<https://greggkellogg.net/foaf#me> a foaf:Person;
  foaf:name "Gregg Kellogg";
  foaf:knows <https://manu.sporny.org/#this> .
<https://manu.sporny.org/#this> a foaf:Person;
  foaf:name "Manu Sporny" .

Note that if the explicit @href is not present, i.e.,

<div typeof="foaf:Person" about="https://greggkellogg.net/foaf#me">
  <p property="name">Gregg Kellogg</p>
  <a rel="knows" typeof="foaf:Person">
    <span property="name">Manu Sporny</span>
  </a>
</div>

this results in

<https://greggkellogg.net/foaf#me> a foaf:Person;
  foaf:name "Gregg Kellogg";
  foaf:knows [ 
        a foaf:Person;
        foaf:name "Manu Sporny" 
  ].

Support for embedded RDF/XML

If the document includes embedded RDF/XML, as is the case with many SVG documents, and the RDF::RDFXML gem is installed, the reader will add extracted triples to the default graph.

For example:

<?xml version="1.0" encoding="UTF-8"?>
<svg width="12cm" height="4cm" viewBox="0 0 1200 400"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xml:base="http://example.net/"
    xmlns="http://www.w3.org/2000/svg" version="1.2" baseProfile="tiny">
  <desc property="dc:description">A yellow rectangle with sharp corners.</desc>
  <metadata>
    <rdf:RDF>
      <rdf:Description rdf:about="">
        <dc:title>Test 0304</dc:title>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <!-- Show outline of canvas using 'rect' element -->
  <rect x="1" y="1" width="1198" height="398"
        fill="none" stroke="blue" stroke-width="2"/>
  <rect x="400" y="100" width="400" height="200"
        fill="yellow" stroke="navy" stroke-width="10"  />
</svg>

generates the following Turtle:

@prefix dc: <http://purl.org/dc/terms/> .

<http://example.net/> dc:title "Test 0304" ;
  dc:description "A yellow rectangle with sharp corners." .

Support for embedded N-Triples or Turtle

If the document includes a <script> element having an @type attribute whose value matches that of a loaded RDF reader (text/ntriples and text/turtle are loaded if they are available), the data will be extracted and added to the default graph. For example:

<html>
  <body>
    <script type="text/turtle"><![CDATA[
       @prefix foo:  <http://www.example.com/xyz#> .
       @prefix gr:   <http://purl.org/goodrelations/v1#> .
       @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
       @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

       foo:myCompany
         a gr:BusinessEntity ;
         rdfs:seeAlso <http://www.example.com/xyz> ;
         gr:hasLegalName "Hepp Industries Ltd."^^xsd:string .
    ]]></script>
  </body>
</html>

generates the following Turtle:

   @prefix foo:  <http://www.example.com/xyz#> .
   @prefix gr:   <http://purl.org/goodrelations/v1#> .
   @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
   @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

   foo:myCompany
     a gr:BusinessEntity ;
     rdfs:seeAlso <http://www.example.com/xyz> ;
     gr:hasLegalName "Hepp Industries Ltd."^^xsd:string .
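The selection step (finding script elements whose @type names a known RDF format) might look roughly like the sketch below. It uses REXML directly on well-formed markup and is not the reader's own code; `embedded_data` and the `EMBEDDED_TYPES` list are assumptions made for the example, and the extracted text would still need to be handed to the matching reader.

```ruby
require 'rexml/document'

# Media types this sketch treats as having a loaded reader (assumption).
EMBEDDED_TYPES = %w[text/turtle text/ntriples].freeze

# Returns [type, content] pairs for each matching <script> element.
def embedded_data(html)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, '//script').filter_map do |element|
    type = element.attributes['type']
    next unless EMBEDDED_TYPES.include?(type)

    # CDATA sections are text nodes in REXML, so they are included here.
    [type, element.texts.map(&:value).join.strip]
  end
end

html = <<~HTML
  <html>
    <body>
      <script type="text/turtle"><![CDATA[
        <http://example.com/a> <http://example.com/b> "c" .
      ]]></script>
      <script type="text/javascript">var x = 1;</script>
    </body>
  </html>
HTML

embedded_data(html).each do |type, content|
  puts "#{type}: #{content}"
end
```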

Support for Role Attribute

The processor will generate RDF triples consistent with the Role Attribute specification. For example:

<div id="heading1" role="heading">
  <p>Some contents that are a header</p>
</div>

generates the following Turtle:

@prefix xhv: <http://www.w3.org/1999/xhtml/vocab#> .
<#heading1> xhv:role xhv:heading.
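The mapping from a role value to triples can be sketched as follows. This simplified version only handles plain terms, which it resolves against the XHTML vocabulary as in the example above; `role_triples` is a hypothetical helper, and role values given as full IRIs or CURIEs are deliberately left out.

```ruby
XHV = 'http://www.w3.org/1999/xhtml/vocab#'

# Builds one xhv:role triple per whitespace-separated role token.
# The subject is the element's @id resolved as a fragment of the base IRI.
def role_triples(id, role, base = '')
  subject = "#{base}##{id}"
  role.split.map { |token| [subject, "#{XHV}role", "#{XHV}#{token}"] }
end

role_triples('heading1', 'heading').each do |s, p, o|
  puts "<#{s}> <#{p}> <#{o}> ."
end
```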

Support for microdata

The RDFa reader will call out to RDF::Microdata::Reader if an @itemscope attribute is detected and the microdata reader is loaded. This avoids a common problem when pages contain both microdata and RDFa but only one processor is run.

Support for value property

In an RDFa+HTML erratum, it was suggested that the @value attribute could be parsed to obtain a numeric literal; this is consistent with how it is treated in microdata+rdfa. This processor now parses the value of an @value attribute to determine whether it is an xsd:integer, xsd:float, or xsd:double, and uses a plain literal otherwise. The datatype can be overridden using the @datatype attribute.
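A lexical scan of this kind could be sketched as below. The regular expressions and, in particular, the split between xsd:float (plain decimals) and xsd:double (exponent notation) are assumptions made for this illustration and may not match the grammars the reader actually uses; `value_datatype` is a hypothetical helper.

```ruby
INTEGER  = /\A[+-]?\d+\z/
DECIMAL  = /\A[+-]?(\d+\.\d*|\.\d+)\z/
EXPONENT = /\A[+-]?(\d+(\.\d*)?|\.\d+)[eE][+-]?\d+\z/

# Returns the datatype to use for an @value string, or nil for a plain
# literal. An explicit @datatype always takes precedence.
def value_datatype(value, datatype = nil)
  return datatype if datatype

  case value
  when INTEGER  then 'xsd:integer'
  when DECIMAL  then 'xsd:float'   # assumption: decimals map to xsd:float
  when EXPONENT then 'xsd:double'  # assumption: exponent form maps to xsd:double
  end
end

%w[42 3.14 1.0e3 abc].each do |v|
  puts "#{v.inspect} -> #{value_datatype(v) || 'plain literal'}"
end
```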

Usage

Reading RDF data in the RDFa format

require 'rdf/rdfa'

graph = RDF::Graph.load("etc/doap.html", format: :rdfa)

Reading RDF data with vocabulary expansion

graph = RDF::Graph.load("etc/doap.html", format: :rdfa, vocab_expansion: true)

or

graph = RDF::RDFa::Reader.open("etc/doap.html").expand

Reading Processor Graph

graph = RDF::Graph.load("etc/doap.html", format: :rdfa, rdfagraph: :processor)

Reading Both Processor and Output Graphs

graph = RDF::Graph.load("etc/doap.html", format: :rdfa, rdfagraph: [:output, :processor])

Writing RDF data using the XHTML+RDFa format

require 'rdf/rdfa'

RDF::RDFa::Writer.open("etc/doap.html") do |writer|
  writer << graph
end

Note that prefixes may be chained between Reader and Writer, so that the Writer will use the same prefix definitions found during parsing:

prefixes = {}
graph = RDF::Graph.load("etc/doap.html", prefixes: prefixes)
puts graph.dump(:rdfa, prefixes: prefixes)

Template-based Writer

The RDFa writer uses Haml templates for code generation. This allows fully customizable RDFa output in a variety of host languages. The default template generates human-readable HTML5 output. A minimal template generates compact HTML that is not intended for human consumption.

To specify an alternative Haml template, consider the following:

require 'rdf/rdfa'

RDF::RDFa::Writer.buffer(haml: RDF::RDFa::Writer::MIN_HAML) << graph

The template hash defines four Haml templates:

Dependencies

Documentation

Full documentation available on Rubydoc.info

Principal Classes

TODO

Resources

Change Log

See Release Notes on GitHub

Author

Contributors

Contributing

This repository uses Git Flow to manage development and release activity. All submissions must be on a feature branch based on the develop branch to ease staging and integration.

License

This is free and unencumbered public domain software. For more information, see https://unlicense.org/ or the accompanying UNLICENSE file.

FEEDBACK