rdf-pipeline / framework

Automatically exported from code.google.com/p/rdf-pipeline
Apache License 2.0

Make GraphNode invalidate its caches if the repository is in memory and the server restarts #79

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Since SPARQL does not provide any way to reliably determine whether a graph 
exists, GraphNode's fExists function always returns true. As a result, if an 
in-memory repository is used and the graph store goes down and comes back up, 
all of the graphs will be gone, but fExists will still report that they exist, 
so the caches will be treated as fresh even though they are all empty.

One way to fix this would be to write a sentinel triple to the SPARQL store 
whenever a GraphNode updater is fired:

  INSERT DATA { GRAPH p:rdf-pipeline-internals {
    <> p:graphsExist true .
  }}

and when fExists is called, return true iff that triple is still found.
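A minimal sketch of this sentinel-triple idea, in Python. The expansion of the p: prefix, the helper names, and the use of string-built queries are all assumptions for illustration, not the framework's actual API:

```python
# Sketch of the sentinel-triple approach described above.
# ASSUMPTIONS: the p: prefix expansion and these function names are
# hypothetical; a real implementation would send these strings to the
# store's SPARQL Update and Query endpoints.

P = "http://purl.org/pipeline/ont#"            # assumed expansion of p:
INTERNALS = P + "rdf-pipeline-internals"       # assumed internals graph URI

def sentinel_update():
    """SPARQL Update to run whenever a GraphNode updater fires."""
    return ("INSERT DATA { GRAPH <" + INTERNALS + "> { "
            "<" + INTERNALS + "> <" + P + "graphsExist> true . } }")

def sentinel_check():
    """ASK query for fExists: true iff the sentinel triple survived.
    After an in-memory store restarts, the sentinel is gone, so this
    correctly reports that the cached graphs no longer exist."""
    return ("ASK { GRAPH <" + INTERNALS + "> { "
            "<" + INTERNALS + "> <" + P + "graphsExist> true . } }")
```

Because every updater re-inserts the same sentinel, a restart of the store (which wipes an in-memory repository) is detectable: the ASK comes back false until the first updater fires again.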

Original issue reported on code.google.com by david@dbooth.org on 25 Apr 2014 at 3:23

GoogleCodeExporter commented 9 years ago
A good fix for this -- and other cases -- may be to change fExists to treat an 
empty graph as non-existent. This seems like a reasonable approach because: 
(a) it allows a deleted graph to be treated as deleted; and (b) in most cases 
an empty graph probably did not take much work to generate, so erroneously 
regenerating it is probably okay.

This whole issue arises because SPARQL cannot distinguish between an empty 
graph and a non-existent graph.

Original comment by david@dbooth.org on 14 May 2014 at 1:39