mr-niels-christensen / environment-scotland-dot-rural

Apache License 2.0

Avoid failures in copying graphs #60

Closed mr-niels-christensen closed 9 years ago

mr-niels-christensen commented 9 years ago

The cron job that copies "newdata" to "default" failed whenever the copy took more than 2 minutes to complete. The cause seems to be that iterators (cursors) over NDB queries time out after 2 minutes (presumably because the underlying data may have changed by then anyway).

The best suggestion so far is not to copy the data at all. All access (server-side and client-side) would consult a meta-graph to determine which graph to use. Instead of "default" and "newdata", the graphs would be "A" and "B".
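
A minimal sketch of the A/B idea, using an in-memory dictionary in place of the datastore. The class and method names (`GraphStore`, `publish`, `read`) are hypothetical and not part of this project or ndbstore. The point is only that publishing becomes one small pointer write instead of a long copy.

```python
class GraphStore:
    """In-memory stand-in for the datastore: graph name -> set of triples."""

    def __init__(self):
        self.graphs = {"A": set(), "B": set()}
        # The "meta-graph": a single record naming the graph readers should use.
        self.meta = {"active": "A"}

    def active(self):
        return self.meta["active"]

    def inactive(self):
        return "B" if self.active() == "A" else "A"

    def read(self):
        # Every reader resolves the graph name through the meta record first.
        return self.graphs[self.active()]

    def publish(self, triples):
        target = self.inactive()
        self.graphs[target] = set(triples)  # load new data into the idle graph
        self.meta["active"] = target        # one small write flips readers over
```

Loading into the idle graph can take as long as it needs, since readers keep using the other graph until the final write.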

Possible variation: Could ndbstore provide enough metadata that the meta-graph would be unnecessary?

Possible variation: Would it be better to use disposable graphs, replacing "A" and "B" with a datetime or a hash?

Possible alternative solution: Optimize the copy. Do not read and write triples; just copy the N3 blobs in the db.Model objects. [But only experiments will tell how well this scales.]

Possible alternative solution: Shard the copy operation into smaller pieces. [But this does not scale well.]
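
For comparison, a rough sketch of what sharding the copy could look like, with plain Python lists standing in for the source and destination graphs. `copy_batch` and `copy_all` are hypothetical names. In a real deployment each batch would presumably run as its own short request or task instead of in one loop, so that no single query iterator lives past the 2-minute limit.

```python
def copy_batch(source, dest, start, batch_size):
    """Copy up to batch_size items from source[start:] into dest.

    Returns the offset to resume from, or None when the copy is complete.
    """
    end = min(start + batch_size, len(source))
    dest.extend(source[start:end])
    return end if end < len(source) else None


def copy_all(source, dest, batch_size=1000):
    """Drive copy_batch to completion; returns the number of batches used."""
    offset, batches = 0, 0
    while offset is not None:
        offset = copy_batch(source, dest, offset, batch_size)
        batches += 1
    return batches
```

Note that the destination is incomplete while the batches run, so readers could see a half-copied graph. That is one reason the no-copy approach looks more attractive.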

Not a solution: Use direct NDB->NDB copy. [There isn't a copy operation].

mr-niels-christensen commented 9 years ago

Actually, as long as you stick to a fixed list of graphs (like "A" and "B"), the metadata could live inside the graphs, keeping the overall setup simpler.
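
A small sketch of how in-graph metadata might work, assuming each graph carries one marker triple with an ISO 8601 UTC timestamp. The predicate `urn:example:updated` and the function `newest_graph` are made up for illustration and are not taken from the project.

```python
UPDATED = "urn:example:updated"  # hypothetical predicate, not from the project


def newest_graph(graphs):
    """Return the name of the graph whose marker triple has the latest timestamp.

    graphs maps a graph name to a set of (subject, predicate, object) triples.
    Graphs without a marker are ignored; returns None if no graph has one.
    """
    best_name, best_stamp = None, None
    for name in sorted(graphs):
        for _subject, predicate, obj in graphs[name]:
            # ISO 8601 timestamps in the same format and zone sort as strings.
            if predicate == UPDATED and (best_stamp is None or obj > best_stamp):
                best_name, best_stamp = name, obj
    return best_name
```

Under this scheme a writer would remove the old marker before reloading a graph and write the new marker last, so a half-loaded graph is never selected.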

mr-niels-christensen commented 9 years ago

Suggested solution:

mr-niels-christensen commented 9 years ago

TODO: Tools for transition/cleaning up graphs