Closed njh closed 11 years ago
It would be great if multiple HTTP clients were supported, for example http://github.com/toland/patron
Reader/format classes should implement a simple Regexp test on the content to be parsed, for use when the format is not detected from the extension, MIME type, or an explicit request. For instance:
input.match(/<html/i) && RDF::RDFa::Format
Related issue #24 (with regards to improving the HTTP client functionality).
More from a recent email response to hellekin at cepheide.org:
RDF::Reader.for needs to be somewhat smarter.
The symbol case is limited to using an element of the class name (e.g. RDF::RDFXML => :rdfxml). It would be nice to be able to specify alternate symbols (e.g. :rdf). Of course, this can be done through for(:extension => "rdf").
RDF::Reader.open, when loading a remote resource, should look at the returned MIME type to do a format match, rather than requiring that it be provided explicitly. Arto seems to be of the opinion that this is done via LinkedData, but it seems a fair thing to do directly in RDF.rb.
I believe that Format specifications should also provide a Regexp to match against the beginning of the content (I use the first 1000 bytes in RdfContext). This would be used within RDF::Reader.open in case a format couldn't be found through other means. Consider the following:
def detect_format(stream)
  # Sample the first 1000 bytes, leaving the stream positioned at the start
  if stream.respond_to?(:rewind)
    stream.rewind
    string = stream.read(1000)
    stream.rewind
  else
    string = stream.to_s
  end
  case string
  when /<(\w+:)?RDF/    then :rdfxml
  when /<(\w+:)?html/i  then :rdfa
  when /@prefix/i       then :n3
  else :ntriples
  end
end
This could instead be found by looping through available Format subclasses and looking for a #match method. Within RDFXML::Format, I could perform the following:
class Format < RDF::Format
  MATCH = %r(<(\w+:)?RDF)

  content_type 'text/turtle', :extension => :ttl
  content_type 'text/n3', :extension => :n3
  content_encoding 'utf-8'

  reader { RDF::N3::Reader }
  writer { RDF::N3::Writer }

  # Returns match data if the content looks like this format, nil otherwise
  def match(content)
    content.to_s.match(MATCH)
  end
end
In RDF::Reader.open, first look for a reader using the options. Failing that, open the file and look for a MIME type. Failing that, loop through the Format instances and see whether one of them matches the string content.
In most cases, this will do what the user expects.
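The lookup order described above (explicit options, then MIME type, then content sniffing) could be sketched roughly as follows. This is a standalone illustration, not RDF.rb code: the Format struct, the FORMATS table and find_format are all hypothetical names made up for the example, and the patterns are the ones from detect_format earlier in this thread.

```ruby
# Hypothetical stand-in for RDF::Format subclasses (not the RDF.rb API).
Format = Struct.new(:symbol, :content_types, :extensions, :pattern) do
  # True when the sample content matches this format's pattern, if it has one.
  def match?(sample)
    !pattern.nil? && pattern.match?(sample.to_s)
  end
end

# Hypothetical registry; the MIME types and extensions are illustrative.
FORMATS = [
  Format.new(:rdfxml,   ['application/rdf+xml'],                ['rdf'],       /<(\w+:)?RDF/),
  Format.new(:rdfa,     ['text/html', 'application/xhtml+xml'], ['html'],      /<(\w+:)?html/i),
  Format.new(:n3,       ['text/n3', 'text/turtle'],             ['n3', 'ttl'], /@prefix/i),
  Format.new(:ntriples, ['text/plain'],                         ['nt'],        nil)
].freeze

# Tries each source of evidence in turn and returns the first Format found,
# or nil when nothing matches.
def find_format(format: nil, extension: nil, content_type: nil, sample: nil)
  # 1. Explicit options supplied by the caller
  found = FORMATS.find { |f| f.symbol == format } if format
  found ||= FORMATS.find { |f| f.extensions.include?(extension.to_s) } if extension
  return found if found

  # 2. MIME type reported for the resource (parameters such as charset dropped)
  if content_type
    type = content_type.split(';').first.strip.downcase
    found = FORMATS.find { |f| f.content_types.include?(type) }
    return found if found
  end

  # 3. Sniff the beginning of the content
  FORMATS.find { |f| f.match?(sample) } if sample
end
```

For example, find_format(content_type: 'application/rdf+xml; charset=utf-8') would pick the :rdfxml entry without reading any content, while find_format(sample: '<html>') falls through to the sniffing step.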
RestClient may be of interest: http://github.com/archiloque/rest-client http://rdoc.info/github/archiloque/rest-client
This one looks pretty interesting too: http://github.com/eric1234/open_uri_db_cache
Recently I needed to revisit this issue in RdfContext to support RDFa 1.1 profiles. Profiles are a mechanism for defining RDF prefixes and terms in a separate document. The spec encourages implementers to cache these vocabularies, for obvious reasons. I implemented this using a ConjunctiveGraph, which is a graph over all quads within a Store (or Repository). When I see a profile, I look for it as a context within the ProfileGraph and, if necessary, download it, parse it and add it to the store.
To do this in RDF.rb is difficult, because RDF::Reader.open inverts finding the reader and opening the resource. Ideally, the resource should be opened first so that, for example, the MIME type can be retrieved to perform content negotiation, and the resource can be inspected to see if it is up-to-date. The following is a potential refactor of RDF::Reader.open that extracts the open and provides the same simple Kernel.open implementation. This makes it easier for another module to override this, or perhaps to register an alternative reader to provide better HTTP semantics.
module RDF
  class Reader
    def self.open(filename, options = {}, &block)
      # Open the resource first, so its MIME type can help select the reader
      resource = URLResource.new(filename)
      reader = self.for(options.slice(:format).merge(:content_type => resource.mime_type))
      reader ||= self.for(filename)
      raise FormatError.new("unknown RDF format: #{options[:format] || filename}") unless reader
      reader.new(resource.io, options, &block)
    end

    class URLResource
      attr_reader :url, :mime_type, :etag, :format
      attr_reader :modified_at, :checked_at

      def initialize(url)
        @file = Kernel.open(url, "r")
      end

      def io; @file; end
    end
  end
end
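One way the etag, modified_at and checked_at attributes above might be used to check whether a cached resource is up-to-date is to turn them into conditional request headers. The following is only a sketch under that assumption: CachedResource, conditional_headers and recently_checked? are hypothetical names and are not part of RDF.rb or of the refactor above.

```ruby
require 'time' # provides Time#httpdate

# Hypothetical cache record mirroring the URLResource attributes.
CachedResource = Struct.new(:url, :etag, :modified_at, :checked_at)

# Builds the headers for a conditional GET from whatever validators
# were stored when the resource was last fetched.
def conditional_headers(cached)
  headers = {}
  headers['If-None-Match'] = cached.etag if cached.etag
  headers['If-Modified-Since'] = cached.modified_at.httpdate if cached.modified_at
  headers
end

# True when the resource was checked less than max_age seconds ago,
# in which case the caller may decide to skip the request entirely.
def recently_checked?(cached, max_age, now = Time.now)
  !cached.checked_at.nil? && (now - cached.checked_at) < max_age
end
```

A server that honours these headers answers 304 Not Modified when the cached copy is still current, so a cached profile would not need to be downloaded and parsed again.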
There still remains the question of how best to implement this in RDF::RDFa, but that is a different conversation.
Since 0.3.4, RDF.rb can perform format detection in RDF::Reader.for (or RDF::Format.for with :sample option or a block which returns a sample). Of course, content-negotiation is handled using rack-linkeddata or sinatra-linkeddata (and soon rack-sparql or sinatra-sparql).
http://lists.w3.org/Archives/Public/public-rdf-ruby/2010May/0024.html
Implement content negotiation in RDF.rb clients. Ideally with q= values for each of the supported parsers.
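As a rough illustration of what q= values per parser could look like, the sketch below builds an Accept header from a table of media types and preferences. The accept_header helper and the preference values are hypothetical, chosen only for the example; a real implementation would presumably derive the media types from the registered formats.

```ruby
# Builds an HTTP Accept header value from a hash of media type => q value.
# Types are listed from most to least preferred; q is omitted when it is 1.0.
def accept_header(preferences)
  preferences
    .sort_by { |type, q| [-q, type] }
    .map { |type, q| q >= 1.0 ? type : format('%s;q=%.1f', type, q) }
    .join(', ')
end

# Hypothetical preferences for a client that has several parsers loaded.
PARSER_PREFERENCES = {
  'application/rdf+xml' => 1.0,
  'text/turtle'         => 0.9,
  'text/n3'             => 0.8,
  '*/*'                 => 0.1
}.freeze
```

The resulting string can be passed as the Accept header by whichever HTTP client ends up being used to fetch the resource.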
I would like to be able to do this: