ruby-rdf / rdf

RDF.rb is a pure-Ruby library for working with Resource Description Framework (RDF) data.
http://rubygems.org/gems/rdf
The Unlicense

Content negotiation in RDF.rb clients #12

Closed njh closed 11 years ago

njh commented 14 years ago

http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0024.html

Implement content negotiation in RDF.rb clients. Ideally with q= values for each of the supported parsers.
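As a rough illustration of what "q= values for each of the supported parsers" could look like, here is a minimal sketch that builds an Accept header from a table of formats. The constant `SUPPORTED_FORMATS`, the helper `accept_header`, and the weights are all assumptions made for this example; none of them are part of RDF.rb.

```ruby
# Hypothetical table of supported formats and their preference weights.
# The MIME types are real; the q-values are illustrative assumptions.
SUPPORTED_FORMATS = {
  'application/rdf+xml'   => 1.0,
  'text/turtle'           => 0.9,
  'text/n3'               => 0.8,
  'application/n-triples' => 0.5,
  '*/*'                   => 0.1
}.freeze

# Build an HTTP Accept header value, most preferred format first.
# A weight of 1.0 is the HTTP default, so its q parameter is omitted.
def accept_header(formats = SUPPORTED_FORMATS)
  formats.sort_by { |_type, q| -q }.map do |type, q|
    q >= 1.0 ? type : format('%s;q=%.1f', type, q)
  end.join(', ')
end

puts accept_header
```

A client would send this value as the Accept request header and then pick a parser based on the Content-Type of the response.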

I would like to be able to do this:

    repo = RDF::Repository.new
    repo.load('http://www.bbc.co.uk/programmes/b00jnwlc#programme')
    repo.each { |s| s.inspect! }

njh commented 14 years ago

It would be great if multiple HTTP clients were supported, for example http://github.com/toland/patron

gkellogg commented 14 years ago

Reader/format classes should implement a simple Regexp test on the content to be parsed, for use when the format is not detected from the extension, the MIME type, or an explicit request. For instance:

    input.match(/<html/i) && RDF::RDFa::Format

artob commented 14 years ago

Related issue #24 (with regards to improving the HTTP client functionality).

gkellogg commented 14 years ago

More from a recent email response to hellekin at cepheide.org:

RDF::Reader.for needs to be somewhat smarter.

The symbol case is limited to using an element of the class name (e.g. RDF::RDFXML => :rdfxml). It would be nice to specify alternate symbols (e.g., :rdf). Of course, this can be done through for(:extension => "rdf").

RDF::Reader.open, when loading a remote resource, should look at the returned MIME type to do a format match, rather than requiring that it be provided explicitly. Arto seems to be of the opinion that this is done via LinkedData, but it seems to be a fair thing to do directly in RDF.rb.

I believe that Format specifications should also provide a Regexp to match against the beginning of the content (I use the first 1000 bytes in RdfContext). This would be used within RDF::Reader.open in case a format couldn't be found through other means. Consider the following:

    # Heuristically detect the format of the input stream
    def detect_format(stream)
      # Got to look into the stream to see what it contains
      if stream.respond_to?(:rewind)
        stream.rewind
        string = stream.read(1000)
        stream.rewind
      else
        string = stream.to_s
      end

      case string
      when /<(\w+:)?RDF/   then :rdfxml
      when /<(\w+:)?html/i then :rdfa
      when /@prefix/i      then :n3
      else                      :ntriples
      end
    end

This could instead be found by looping through available Format subclasses and looking for a #match method. Within RDFXML::Format, I could perform the following:

    class Format < RDF::Format
      # Pattern tested against the beginning of the content
      MATCH = %r(<(\w+:)?RDF)

      content_type 'text/turtle', :extension => :ttl
      content_type 'text/n3', :extension => :n3
      content_encoding 'utf-8'

      reader { RDF::N3::Reader }
      writer { RDF::N3::Writer }

      def match(content)
        content.to_s.match(MATCH)
      end
    end

In RDF::Reader.open, first look for a reader using the options. Failing that, open the file and look for a MIME type. Failing that, loop through the available Format subclasses and see whether one of them matches the string content.

In most cases, this will do what the user expects.
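The lookup order described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `FormatSketch`, `FORMATS`, and this keyword-argument version of `detect_format` are hypothetical stand-ins and do not reflect RDF.rb's actual RDF::Format API.

```ruby
# Hypothetical stand-in for a format class (not RDF.rb's RDF::Format).
FormatSketch = Struct.new(:name, :content_types, :pattern) do
  # True when this format declares a pattern and the sample matches it.
  def match?(sample)
    !pattern.nil? && pattern.match?(sample)
  end
end

FORMATS = [
  FormatSketch.new(:rdfxml,   ['application/rdf+xml'], /<(\w+:)?RDF/),
  FormatSketch.new(:rdfa,     ['text/html', 'application/xhtml+xml'], /<(\w+:)?html/i),
  FormatSketch.new(:n3,       ['text/n3'], /@prefix/i),
  FormatSketch.new(:ntriples, ['application/n-triples'], nil)
].freeze

# Try, in order: an explicit format, the reported content type,
# and finally a pattern match on the first 1000 characters of content.
def detect_format(format: nil, content_type: nil, sample: nil)
  # Drop parameters such as "; charset=utf-8" before comparing.
  media_type = content_type.to_s.split(';').first.to_s.strip

  by_name    = FORMATS.find { |f| f.name == format }
  by_type    = FORMATS.find { |f| f.content_types.include?(media_type) }
  by_content = sample && FORMATS.find { |f| f.match?(sample.to_s[0, 1000]) }

  by_name || by_type || by_content
end

found = detect_format(content_type: 'text/html; charset=utf-8')
puts found.name
```

Because the explicit option is checked first, a caller can always override a misleading content type or an ambiguous sample.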

njh commented 13 years ago

RestClient may be of interest: http://github.com/archiloque/rest-client http://rdoc.info/github/archiloque/rest-client

gkellogg commented 13 years ago

This one looks pretty interesting too: http://github.com/eric1234/open_uri_db_cache

gkellogg commented 13 years ago

Recently I needed to revisit this issue in RdfContext to support RDFa 1.1 profiles. Profiles are a mechanism for defining RDF prefixes and terms in a separate document. The spec encourages implementers to cache these vocabularies, for obvious reasons. I implemented this using a ConjunctiveGraph, which is a graph over all quads within a Store (or Repository). When I see a profile, I look for it as a context within the ProfileGraph, and download it, parse it, and add it to the store as necessary.

To do this in RDF.rb is difficult, because RDF::Reader.open inverts finding the reader and opening the resource. Ideally, the resource should be opened first so that, for example, the MIME type can be retrieved to perform content negotiation, and the resource can be inspected to see if it is up to date. The following is a potential refactor of RDF::Reader.open that extracts the open and provides the same simple Kernel.open implementation. This makes it easier for another module to override this, or perhaps to register an alternative reader to provide better HTTP semantics.

    module RDF
      class Reader
        def self.open(filename, options = {}, &block)
          # Open the resource first so its MIME type can inform reader selection
          resource = URLResource.new(filename)
          reader = self.for(options.slice(:format).merge(:content_type => resource.mime_type))
          reader ||= self.for(filename)
          raise FormatError.new("unknown RDF format: #{options[:format] || filename}") unless reader

          reader.new(resource.io, options, &block)
        end

        class URLResource
          attr_reader :url, :mime_type, :etag, :format
          attr_reader :modified_at, :checked_at

          def initialize(url)
            @url = url
            @file = Kernel.open(url, "r")
          end

          def io; @file; end
        end
      end
    end
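To illustrate how the etag and modified_at attributes sketched above might be used to check whether a cached resource is up to date, here is a minimal example that builds, but does not send, a conditional GET request with Ruby's standard Net::HTTP classes. `CachedResource` and `conditional_get` are hypothetical names introduced for this sketch, and the default Accept value is an assumption.

```ruby
require 'net/http'
require 'uri'
require 'time'

# Hypothetical cache record; the field names mirror the attr_readers above.
CachedResource = Struct.new(:url, :mime_type, :etag, :modified_at)

# Build (but do not send) a GET request that lets the server reply with
# "304 Not Modified" when the cached copy is still current.
def conditional_get(cached, accept: 'application/rdf+xml, text/turtle;q=0.9')
  request = Net::HTTP::Get.new(URI(cached.url))
  request['Accept'] = accept
  request['If-None-Match'] = cached.etag if cached.etag
  request['If-Modified-Since'] = cached.modified_at.httpdate if cached.modified_at
  request
end

cached = CachedResource.new('http://example.org/profile', 'text/turtle',
                            '"abc123"', Time.utc(2011, 1, 1))
request = conditional_get(cached)
request.each_header { |name, value| puts "#{name}: #{value}" }
```

A 304 response would mean the cached copy can be reused without downloading or parsing it again; any other successful response would replace it.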

There still remains the question of how best to implement this in RDF::RDFa, but that is a different conversation.

gkellogg commented 12 years ago

Since 0.3.4, RDF.rb can perform format detection in RDF::Reader.for (or RDF::Format.for) when given the :sample option or a block that returns a sample. Of course, content negotiation is handled using rack-linkeddata or sinatra-linkeddata (and soon rack-sparql or sinatra-sparql).