SPARQL CONSTRUCT chunked inconsistent with SPARQL CONSTRUCT

linkedpipes / etl

LinkedPipes ETL is an RDF based, lightweight ETL tool

https://etl.linkedpipes.com

SPARQL CONSTRUCT chunked inconsistent with SPARQL CONSTRUCT #343

Open jindrichmynarz opened 7 years ago

jindrichmynarz commented 7 years ago

SPARQL CONSTRUCT chunked is inconsistent with SPARQL CONSTRUCT when provided with no input. Given a query that doesn't require input, SPARQL CONSTRUCT produces output, but SPARQL CONSTRUCT chunked produces no output. A pipeline to replicate the issue shows the difference. SPARQL CONSTRUCT chunked should preferably behave the same as SPARQL CONSTRUCT.

Tested using the develop version 2cdf1becdb95be5bc78782beab943c81f3763b19.

skodapetr commented 7 years ago

The chunked component loads each chunk and transforms it. So if there is no input, no transformation is done.

The regular component transforms a graph in the input RDF repository. The graph is always created by the backend, as the "content" is not the graph itself but rather the triples inside it. The given query is therefore executed in every case.
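The two execution models described above can be sketched roughly as follows. This is only an illustration in plain Python, not the LinkedPipes ETL implementation: `construct`, `construct_chunked`, and `generate` are hypothetical names, and a query is modelled as a function from a set of triples to a set of triples.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical stand-ins, not LinkedPipes ETL types.
Triple = Tuple[str, str, str]
Graph = Set[Triple]
Query = Callable[[Graph], Graph]


def generate(_graph: Graph) -> Graph:
    """Stand-in for a CONSTRUCT query that needs no input data."""
    return {("ex:s", "ex:p", "generated")}


def construct(query: Query, triples: Iterable[Triple]) -> Graph:
    """Non-chunked model: a graph always exists, so the query always runs once."""
    graph = set(triples)  # may be empty, but it is still a graph
    return query(graph)


def construct_chunked(query: Query, chunks: Iterable[Graph]) -> List[Graph]:
    """Chunked model: the query runs once per chunk, so zero chunks means zero runs."""
    return [query(chunk) for chunk in chunks]


regular_output = construct(generate, [])          # query runs on an empty graph
chunked_output = construct_chunked(generate, [])  # no chunk, so the query never runs
```

Under these assumptions, the regular call yields the generated triple while the chunked call yields nothing, which mirrors the reported inconsistency. Passing a single empty chunk (`[set()]`) to `construct_chunked` would produce output again.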

The difference between the representations of the "empty" state is the cause of this issue. This difference comes into play for every chunked/non-chunked component. Thus it may be better to adopt a system-wide solution.

The question is how to represent an "empty" state for chunked and non-chunked components and how it should be handled. Or we can consider this behaviour just as a side effect (it should be documented) of the current design.

jindrichmynarz commented 7 years ago

There are (at least) two options for representing empty state:

Nothing
Empty RDF data unit

Either one is fine with me, but the empty state should be the same for both chunked and non-chunked SPARQL CONSTRUCT.

skodapetr commented 7 years ago

So you basically propose to introduce an isEmpty check and call it before every component, and if the input is empty, the component should do nothing? Then it would no longer be possible to run SPARQL CONSTRUCT on empty input.

If the input is empty, should the component do nothing or fail? Also, as the check would need to be integrated for all components (to remain consistent), this would mean extracting this functionality outside the components as a QA service. However, this may also cause issues, as in some cases empty input can be OK, for example in error processing.

However, rethinking the issue: it's clear that the described behaviour (inconsistency) can be easily reproduced by a single component without input. But can you provide an example of a bigger pipeline, where the SPARQL CONSTRUCT components are connected to something, to show another scenario? Or is this issue only about the single special case where there is no input to the SPARQL CONSTRUCT components?

jindrichmynarz commented 7 years ago

No, I don't propose any such check. I think the implementation details are irrelevant at this point. What the issue is about is a consistent interface.

Re-thinking this a bit more, both components should produce output even if provided with empty input. There are SPARQL CONSTRUCT queries that don't require input data, such as those for generating data.

An example of a bigger pipeline may be a test for the components that generates data either via SPARQL CONSTRUCT (to test components operating on non-chunked input) or SPARQL CONSTRUCT chunked (to test components operating on chunked input).

jakubklimek commented 7 years ago

I see no point in making the two ways consistent. The regular component takes RDF data (and an empty graph is OK) and can be used to generate data. The chunked component takes chunks. An empty chunk should be OK. But no chunks at all makes no sense and should fail.

jindrichmynarz commented 7 years ago

The regular SPARQL CONSTRUCT takes no data too.

I think SPARQL CONSTRUCT and SPARQL CONSTRUCT chunked should be interchangeable, provided their input has the correct type, so that users can optimize a pipeline by simply replacing the component without worrying about inconsistencies.

jakubklimek commented 7 years ago

@skodapetr Could we run the query on an artificially created empty chunk if there is no chunk provided on the input? This would solve the issue.
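A minimal sketch of that fallback, assuming a query can be modelled as a function from a set of triples to a set of triples. The names `construct_chunked_with_fallback` and `generate` are hypothetical and do not come from the LinkedPipes ETL code base.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical stand-ins, not LinkedPipes ETL types.
Triple = Tuple[str, str, str]
Graph = Set[Triple]
Query = Callable[[Graph], Graph]


def generate(_graph: Graph) -> Graph:
    """Stand-in for a CONSTRUCT query that needs no input data."""
    return {("ex:s", "ex:p", "generated")}


def construct_chunked_with_fallback(query: Query, chunks: Iterable[Graph]) -> List[Graph]:
    """Run the query once per chunk; if no chunk arrives, run it on one empty chunk."""
    results: List[Graph] = []
    saw_chunk = False
    for chunk in chunks:
        saw_chunk = True
        results.append(query(chunk))
    if not saw_chunk:
        # Artificially created empty chunk, as proposed above.
        results.append(query(set()))
    return results


no_input = construct_chunked_with_fallback(generate, [])
two_chunks = construct_chunked_with_fallback(generate, [set(), set()])
```

With this fallback, a data-generating query produces one output chunk even when nothing is connected to the input, while pipelines that do supply chunks behave as before.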