ruby-rdf / json-ld

Ruby JSON-LD reader/writer for RDF.rb
The Unlicense

cache context please #13

Closed ndushay closed 9 years ago

ndushay commented 10 years ago

It seems that json-ld is not caching context documents, and I strongly feel it should. I was temporarily blocked from W3C due to this lack of caching.

I see in the gemspec that there is a development dependency on open-uri-cached; however, I believe the actual code in question is doing a Net::HTTP.get request (https://github.com/ruby-rdf/json-ld/blob/develop/lib/json/ld/api.rb#L463), called from https://github.com/ruby-rdf/json-ld/blob/master/lib/json/ld/context.rb#L292

so the open-uri-cached gem is perhaps not in play. In any case, I think this should be a production default.

Thanks!

azaroth42 commented 10 years ago

+1 from me (hi @gkellogg!)

gkellogg commented 10 years ago

Open-uri-cached is used for testing only, as it doesn't check that the cached resource is current, or otherwise obey cache-control headers.

This has been on my list for a while, so thanks for the poke. Barring major improvements in that gem (or an alternative), I think the easiest way to handle it is to use a weak-reference cache in the runtime, which would cause the context to be fetched only when the class is initialized within a runtime; this wouldn't help scripted use, but would aid a web application (such as the linter or distiller). As threads get garbage collected when instances are idle, this should help ensure reasonable currency.
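A minimal sketch of the weak-reference idea, using only Ruby's standard `weakref` library. The class name `WeakContextCache` and its `fetch` method are hypothetical and not part of json-ld; the block passed to `fetch` stands in for whatever actually loads the context document.

```ruby
require 'weakref'

# Hypothetical in-process cache: entries are held through WeakRef, so the
# garbage collector may reclaim a context once nothing else references it.
class WeakContextCache
  def initialize
    @refs = {}
    @lock = Mutex.new
  end

  # Returns the cached document for +url+ if it is still alive; otherwise
  # runs the block to load it and remembers it weakly.
  def fetch(url)
    @lock.synchronize do
      cached = deref(@refs[url])
      return cached unless cached.nil?

      doc = yield(url)
      @refs[url] = WeakRef.new(doc)
      doc
    end
  end

  private

  # Resolves a WeakRef to its target, or nil if it was never stored or has
  # already been collected.
  def deref(ref)
    return nil unless ref&.weakref_alive?

    ref.__getobj__
  rescue WeakRef::RefError
    nil
  end
end
```

As noted in the comment above, a cache like this only lives as long as the process, so it would not reduce fetches across separate script or spec runs.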

Alternatively, proper HTTP cache control could be implemented within the gem, but this is obviously a much bigger effort.

Thoughts?

ndushay commented 10 years ago

Actually, I was blocked not due to running my rails app, but due to running my specs. I think it is a reasonable use case to be parsing json-ld in specs, and I think the json-ld class would be initialized in each individual test, right? So you would still be fetching the context for each spec ... just as it does now. Thus, the weak-reference cache approach you mention might not be sufficient, for a broader context than just my work.

I'm not convinced open-uri-cached is actually put into play for testing -- I tried the same approach with the open-uri-cached gem on my own code, and nothing was ever written to the caching directory. Is it really working when you run json-ld tests? We surmised that the open-uri-cached gem affects the open_uri method only, but the json-ld code uses Net::HTTP.get, which doesn't seem to exercise the open_uri method.

The elegant solution might be a sort of open-uri-cached approach for Net::HTTP, perhaps. Maybe with a param to indicate how often to flush the cache (e.g. 1 hour, 5 hours, 1 day, 30 days...). open-uri-cached looks pretty simple -- could that be easily adapted to Net::HTTP get? Or something akin to the http_cache gem: https://github.com/umut/http-cache/blob/master/lib/hoydaa/net_http_cache.rb. Or maybe just using an approach that sends an If-Modified-Since header when it loads the context document, along with a simple local cache? (See http://ruby-doc.org/stdlib-2.1.2/libdoc/net/http/rdoc/Net/HTTP.html#class-Net::HTTP-label-Setting+Headers)
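The last suggestion, a conditional GET backed by a simple local file cache, could be sketched roughly as follows. This uses only Ruby's standard library; the class name `ConditionalGetCache` is hypothetical and this is not how json-ld loads contexts. As a simplifying assumption it uses the cached file's modification time as the `If-Modified-Since` value, and it does not follow redirects.

```ruby
require 'net/http'
require 'uri'
require 'digest'
require 'fileutils'
require 'time'

# Hypothetical file-backed cache that revalidates with If-Modified-Since.
class ConditionalGetCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Returns the body for +url+, reusing the local copy when the server
  # answers 304 Not Modified.
  def fetch(url)
    uri = URI(url)
    path = File.join(@dir, Digest::SHA256.hexdigest(url))

    request = Net::HTTP::Get.new(uri)
    # Assumption: the time we last wrote the file is a good enough validator.
    request['If-Modified-Since'] = File.mtime(path).httpdate if File.exist?(path)

    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(request)
    end

    case response
    when Net::HTTPNotModified
      File.read(path)
    when Net::HTTPSuccess
      File.write(path, response.body)
      response.body
    else
      raise "Unexpected response #{response.code} for #{url}"
    end
  end
end
```

Note that a conditional GET still contacts the server on every load; it saves bandwidth but not requests, so a time-based expiry (as in the flush-interval idea above) would be needed to avoid the requests themselves.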

wmene commented 10 years ago

I suggest using APICache gem for the contexts https://github.com/mloughran/api_cache

You can configure it to use a File-backed cache (say using /tmp/json-ld as a default), given the need to save requests for specs. I'm going to try it out in a branch

gkellogg commented 10 years ago

Looks promising as a store, but the real issue is ensuring HTTP cache-control correctness. I have a feature branch in the RDF.rb gem which is playing with this.

Note, for JSON-LD, you could always use your own documentLoader which can be passed as an option into the API and do whatever caching you like there. This is what's done in the JSON-LD specs, for example.

gkellogg commented 9 years ago

This now takes advantage of changes to RDF::Util::File.open_file. If you require restclient/components you can get RestClient to use Rack::Cache, which gives a broad way to deal with client-side caching. See the doc for RDF::Util::File.open_file for more details.

Gargron commented 7 years ago

@gkellogg Could you provide an example of how to hook up the documentLoader with Rails.cache? I tried but got weird recursion errors, so I am not sure how that option/method is supposed to look.

gkellogg commented 7 years ago

Sorry, I'm out for a bit with surgery, so can't respond well right now. There may be some examples in the spec directory; I used rest-client-components.

gkellogg commented 7 years ago

I haven't used this in a while, due to instability in rest-client and rest-client-components; however, the RDF.rb gem has a RestClientAdapter, based on HttpAdapter. If rest-client-components is loaded, then this should enable the rest-client cache to be used for requests.

From 85eb3d7502ef2a834d9d734b11eb6dab785e6beb, set up RestClient as follows (or however you normally set up Rack caching):

require 'restclient/components'
require 'rack/cache'

# Create and maintain a cache of downloaded URIs
URI_CACHE = File.expand_path(File.join(File.dirname(__FILE__), "uri-cache"))
Dir.mkdir(URI_CACHE) unless File.directory?(URI_CACHE)
# Cache client requests
RestClient.enable Rack::Cache,
  verbose:     true,
  metastore:   "file:" + File.join(URI_CACHE, "meta"),
  entitystore: "file:" + File.join(URI_CACHE, "body")

In RDF::Util::File, if the :use_net_http option is not set and RestClient is defined, the RestClientAdapter is used; alternatively, you can set RDF::Util::File.http_adapter explicitly as a global setting. See http://www.rubydoc.info/gems/rdf/RDF/Util/File.

You can also pass your own documentLoader as an option to any JSON-LD API call; the documentLoader is called with the other API options. The built-in documentLoader invokes RDF::Util::File.open_file with the passed options.
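A rough sketch of a caching documentLoader wrapper, written without requiring the gem so it can be read on its own. `CachingDocumentLoader` is a hypothetical name. The sketch assumes the wrapped loader accepts `(url, **options)` and returns the loaded document when called without a block, and that the API accepts any object responding to `call` as the `documentLoader` option. It also drops the `:documentLoader` key before delegating; forwarding it unchanged is one plausible cause of the recursion reported earlier in this thread, though that is an assumption and not confirmed here.

```ruby
# Hypothetical wrapper that memoizes documents per URL around any loader.
class CachingDocumentLoader
  def initialize(inner)
    @inner = inner   # e.g. JSON::LD::API.method(:documentLoader) (assumption)
    @cache = {}
    @lock  = Mutex.new
  end

  # Same calling shape as a documentLoader: url, options, optional block.
  def call(url, **options, &block)
    key = url.to_s
    doc = @lock.synchronize { @cache[key] }

    if doc.nil?
      # Strip our own option so the inner loader cannot call back into us.
      inner_options = options.reject { |name, _| name == :documentLoader }
      loaded = @inner.call(url, **inner_options)
      doc = @lock.synchronize { @cache[key] ||= loaded }
    end

    block ? block.call(doc) : doc
  end
end

# Possible usage (assumes the json-ld gem is loaded and `input` is defined):
#   loader = CachingDocumentLoader.new(JSON::LD::API.method(:documentLoader))
#   JSON::LD::API.expand(input, documentLoader: loader)
```

The in-memory Hash could be swapped for any store with similar get/set semantics, such as a file store or an application cache, provided the cached document objects can be serialized by that store.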

If things have settled back out, I would accept a PR to re-enable client caching for tests. Otherwise, it's intended to be set up in a persistent service.