mff-uk / odcs

ODCleanStore

RDF Extractor - poor method #356

Closed: tomas-knap closed this issue 10 years ago

tomas-knap commented 11 years ago

rdfDataUnit.extractFromSPARQLEndpoint(endpointURL, defaultGraphsUri, query, hostName, password, RDFFormat.N3, useStatisticHandler, extractFail);

This definitely must be refactored. You can call more than one method instead of that, but please do not use this one. How is RDFFormat.N3 related to the process of extraction from a SPARQL endpoint? I do not know. Also, the extractFail flag is very strange; there is a mechanism of exceptions, right? Please discuss here first.

tomesj commented 11 years ago

Reasons for using it:

1) The boolean value extractFail: if we extract from a SPARQL endpoint and the result has no triples, we can either continue or throw an exception and stop. This functionality is associated with the checkbox "Extraction fails if there is no triple extracted" in the DETAILS of RDF_EXTRACTOR (see the sketch below).

2) Data from the SPARQL endpoint are extracted into a repository. The repository can be understood as a container for RDF data; we do not know the RDF format of the data here and we do not care about it. For this reason it does not matter in which RDF format (I chose N3) we add data from SPARQL to the repository.

If we load data somewhere (a file, a SPARQL endpoint, ...), we choose an RDF data format for that and the process produces the appropriate RDF format.
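A minimal sketch of the behaviour the extractFail checkbox controls, as described in point 1, assuming the data unit is backed by a Sesame (OpenRDF) repository; the class, method name, and exception type are illustrative, not the actual ODCS code:

```java
import org.openrdf.repository.RepositoryConnection;

final class ExtractFailCheck {

    /** Applies the extractFail policy after the endpoint has been queried. */
    static void checkResult(RepositoryConnection conn, boolean extractFail) throws Exception {
        long extracted = conn.size();   // triples now held by the data unit
        if (extracted == 0 && extractFail) {
            // checkbox on: an empty result stops the pipeline
            // (stand-in for the application's own exception type)
            throw new IllegalStateException("SPARQL extraction produced no triples.");
        }
        // checkbox off: continue with an empty data unit
    }
}
```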

tomas-knap commented 11 years ago

Jirka,

1) But you can check that there are some triples with a simple query, ASK {?s ?p ?o} (see the sketch after this comment). Otherwise, you spoil the RDFDataUnit interface with a method nobody except the extractor will use, which is really bad.

2) Well, the data is loaded for our internal purposes, so there is no need for a choice of formats. Moreover, if you put the data into Virtuoso named graphs as a result of extraction, there is no sense in thinking about formats. Btw, how does the SPARQL extractor work when the output data unit is a named graph?
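A minimal sketch of the ASK check suggested in point 1, assuming the extracted data sit in a Sesame (OpenRDF) repository; the class name and exception type are stand-ins:

```java
import org.openrdf.query.BooleanQuery;
import org.openrdf.query.QueryLanguage;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;

final class AskCheck {

    /** Throws if the repository backing the data unit holds no triples at all. */
    static void failIfEmpty(Repository repository) throws Exception {
        RepositoryConnection conn = repository.getConnection();
        try {
            BooleanQuery ask = conn.prepareBooleanQuery(
                    QueryLanguage.SPARQL, "ASK { ?s ?p ?o }");
            if (!ask.evaluate()) {
                // stand-in for the application's own exception type
                throw new IllegalStateException("No triples were extracted.");
            }
        } finally {
            conn.close();
        }
    }
}
```

With such a check available to any caller, the extractFail parameter would no longer need to live on the extraction method itself.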

tomesj commented 11 years ago

1) I have a method that finds out whether there are any triples or not. But I need something that says: ignore it and continue, or throw an exception. There is no other way to provide that; the extract method needs this parameter.

2) I do not understand exactly what you are asking.

skodapetr commented 11 years ago

1) Can I ask how many triples there are? Can I find out how many triples have been added by extractFromSPARQLEndpoint? (Maybe you could return that number; a sketch follows this comment.)

If I can, then I can throw the exception myself and I do not need a function to do this for me, do I?

2) As far as I understand it: RDFFormat.N3 says how the data are stored in DataUnits, but that should be internal and not visible to the user (and for Virtuoso it makes no difference, as it represents the data in its own way).
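A minimal sketch of the return-a-count idea from point 1, assuming a Sesame RepositoryConnection; the method names and exception type are hypothetical, not the ODCS API:

```java
import org.openrdf.repository.RepositoryConnection;

final class ExtractionSketch {

    /** Hypothetical extract variant that reports how many triples it added. */
    static long extractAndCount(RepositoryConnection conn /*, endpoint parameters */)
            throws Exception {
        long before = conn.size();      // statements already in the data unit
        // ... perform the actual extraction from the SPARQL endpoint here ...
        return conn.size() - before;    // triples added by this call
    }

    /** Caller-side policy that replaces the extractFail flag. */
    static void extractOrFail(RepositoryConnection conn) throws Exception {
        if (extractAndCount(conn) == 0) {
            throw new IllegalStateException("Extraction returned no triples.");
        }
    }
}
```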

tomesj commented 11 years ago

1) Yes, you can find out how many triples from the SPARQL endpoint were added to the repository, and then throw an exception if you want to.

2) RDFFormat.N3 only says in which format the data travel between the repository and the SPARQL endpoint. The DataUnit keeps its data in an RDF repository; the internal representation of these data is not visible.
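A small sketch of what point 2 describes: a format only matters when statements are serialized for transfer, while the repository stores them in its own internal representation (Sesame Rio API, illustrative only):

```java
import java.io.StringWriter;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.rio.RDFFormat;
import org.openrdf.rio.Rio;

final class FormatOnTheWire {

    /** Serializes the repository content; only here does an RDF format matter. */
    static String serialize(RepositoryConnection conn, RDFFormat format) throws Exception {
        StringWriter out = new StringWriter();
        // export() streams all statements to the writer; the repository itself
        // has no notion of N3 versus RDF/XML versus Turtle.
        conn.export(Rio.createWriter(format, out));
        return out.toString();
    }
}
```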

tomas-knap commented 11 years ago

Ad 2) Regarding the N3 format in the method, the problem is that the method IS in RDFDataUnitHelper, which means that every DPU developer can see such a strange method, "extract from SPARQL to SPARQL", which needs an N3 argument. This is really strange. I think that such a method, if needed, should be hidden from the DPU developer. Since the format (N3) is internal to the application, does not need to be exposed, and must not be exposed outside, I would hide it. It is unnecessary for the user to be able to select the format.
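One way to hide the format, sketched under the assumption that the extraction code moves into the extractor module; the class and constant are illustrative, not the actual refactoring:

```java
import org.openrdf.rio.RDFFormat;

/**
 * Internal helper living in the SPARQL extractor module, not on the public
 * RDFDataUnit interface, so DPU developers never see a format parameter.
 */
final class SparqlEndpointExtractor {

    /** Transfer format fixed internally; callers cannot and need not choose it. */
    private static final RDFFormat TRANSFER_FORMAT = RDFFormat.N3;

    void extract(String endpointUrl, String defaultGraphUri, String query) {
        // ... fetch data from the endpoint using TRANSFER_FORMAT and add it to
        //     the target data unit; no format argument leaks into the public API ...
    }
}
```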

tomas-knap commented 10 years ago

Move the extractFromSPARQL methods to the SPARQL extractor module (similarly as for the loader). Move the tests. Then add support for cancel to that DPU.

tomesj commented 10 years ago

Created a separate SPARQL extractor class in the SPARQL extractor module.

Ready to close :-)

tomas-knap commented 10 years ago

Check

ghost commented 10 years ago

tomas-knap please close ...

tomas-knap commented 10 years ago

Done by the RDF data unit refactoring.