Virtuoso - named graphs used to store intermediate results - there is just one!!

mff-uk / odcs

ODCleanStore


Virtuoso - named graphs used to store intermediate results - there is just one!! #97

Closed tomas-knap closed 11 years ago

tomas-knap commented 11 years ago

When Virtuoso is selected as the storage for intermediate results, named graphs have to be used properly. The engine should ensure that.

The named graphs should be used as described in https://grips.semantic-web.at/display/LOD2/Pipeline, running on top of Virtuoso.

Currently, only one named graph is used for all executions, which is not acceptable and does not make any sense!! As a result, the scenario of running a pipeline with Virtuoso cannot be achieved.

Jirko, if this is not a task for you, could you delegate it to Petr/Honza?

tomesj commented 11 years ago

The default graph is set when the Virtuoso repository is created in the constructor. All operations (counting triples, adding, deleting, etc.) are automatically performed on the currently set default graph. If you want to use a different graph for each type of DPU instance (extractor/transformer/loader), you just call the method setDefaultGraph(String defaultGraph) on the Virtuoso repository instance before working with that type of DPU; each DPU can then use its own graph. How to use the method is described in a comment in the VirtuosoRDFRepo class.

I am delegating this to Petr. He knows where in the code the new instance is created and the exact place where "setDefaultGraph" needs to be called to get the expected result.
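For illustration, here is a minimal in-memory sketch of the behaviour described above: every operation applies to whichever default graph is currently set, so switching the graph before each DPU keeps their data apart. Only `setDefaultGraph(String defaultGraph)` is named in this thread. The class `GraphScopedRepo`, the methods `addTriple` and `getTripleCount`, and the `example.org` graph URIs are hypothetical stand-ins, and the real `VirtuosoRDFRepo` may well differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a graph-scoped repository.
// This is NOT the real VirtuosoRDFRepo; it only mimics the described semantics.
class GraphScopedRepo {
    // Triples (kept as plain strings here) grouped by named graph URI.
    private final Map<String, Set<String>> graphs = new HashMap<>();
    private String defaultGraph;

    // The default graph is fixed at construction time, as described in the comment above.
    GraphScopedRepo(String defaultGraph) {
        this.defaultGraph = defaultGraph;
    }

    // Same name and signature as the method mentioned in the thread.
    void setDefaultGraph(String defaultGraph) {
        this.defaultGraph = defaultGraph;
    }

    // Adds a triple to the currently set default graph only.
    void addTriple(String triple) {
        graphs.computeIfAbsent(defaultGraph, g -> new HashSet<>()).add(triple);
    }

    // Counts triples in the currently set default graph only.
    long getTripleCount() {
        return graphs.getOrDefault(defaultGraph, Collections.<String>emptySet()).size();
    }
}
```

With this sketch, adding a triple while the extractor's graph is set and then switching to the loader's graph leaves the loader's count at zero, which is the isolation each DPU needs.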

skodapetr commented 11 years ago

setDefaultGraph is now called in DataUnitFactory with what I believe is a unique graph name. Jirka, please check the functionality with a Virtuoso server.

tomas-knap commented 11 years ago

So, if I understand correctly, the engine's current use of named graphs for a normal run/debug is as described in https://grips.semantic-web.at/display/LOD2/Pipeline for the defensive debug run? So every output and input is stored in a separate named graph, and the engine ensures that data is copied from the output graph of DPU X (e.g. extractor from file) to the input graph of DPU Y (e.g. loader to file)? And this approach is used for both run and debug?

tomesj commented 11 years ago

I think Petr set a unique name per pipeline, but not for every type of DPU. I tried Virtuoso for running DBPedia, but the intermediate results for the RDF SPARQL extractor and the RDF File loader were not the same. I will try to find where the problem is and fix it.

tomas-knap commented 11 years ago

Jirka, I am creating bug #111 for that.

tomas-knap commented 11 years ago

This may also be associated with bug #110.

tomas-knap commented 11 years ago

I would suggest starting with a suboptimal solution: use a different named graph for each input/output data unit. The engine has to ensure that the output named graph is copied to the corresponding input graph of the next DPU.
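A rough sketch of what this suboptimal approach could look like, assuming a linear pipeline: one named graph per input/output data unit, plus one copy step between consecutive DPUs. The class `DataUnitGraphs`, the graph naming scheme, and the `example.org` base URI are hypothetical, not taken from the engine. The copy is written as a SPARQL 1.1 Update `COPY` operation; whether the deployed Virtuoso version accepts it is an assumption that would need checking.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names and URI layout are illustrative only.
class DataUnitGraphs {

    // Builds a graph URI unique per execution, DPU and direction ("input" or "output").
    static String graphName(String base, long executionId, String dpuId, String direction) {
        return base + "/exec/" + executionId + "/dpu/" + dpuId + "/" + direction;
    }

    // SPARQL 1.1 Update: replaces the content of the target graph with a copy of the source graph.
    static String copyUpdate(String sourceGraph, String targetGraph) {
        return "COPY <" + sourceGraph + "> TO <" + targetGraph + ">";
    }

    // For a linear pipeline, plans one copy from each DPU's output graph
    // to the input graph of the DPU that follows it.
    static List<String> planCopies(String base, long executionId, List<String> dpuIds) {
        List<String> updates = new ArrayList<>();
        for (int i = 0; i + 1 < dpuIds.size(); i++) {
            String source = graphName(base, executionId, dpuIds.get(i), "output");
            String target = graphName(base, executionId, dpuIds.get(i + 1), "input");
            updates.add(copyUpdate(source, target));
        }
        return updates;
    }
}
```

Including the execution id in the graph name is what keeps separate executions from sharing one graph, which was the original complaint in this issue.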

Then we can optimize later for the normal run, as sketched in https://grips.semantic-web.at/display/LOD2/Pipeline. We should come up with proof that our optimization strategy works, which is why we should start with the suboptimal but easy one.

What do you think, Jirka, Petr?

tomesj commented 11 years ago

Suboptimal solution completed: each DPU has its own graph for storing intermediate results when using Virtuoso.

Solved by commit 3ea7c92479ff09ff5ed79a2363aa45aad8de36e7

tomesj commented 11 years ago

I used the created graphs in my Virtuoso Conductor - everything works perfectly :-)