kg-construct / rml-core

RML-Core: Main features for RDF generation with RML
https://w3id.org/rml/core/spec
Creative Commons Attribution 4.0 International

RML-Core test cases are too dependent on RML-IO #87

Open DylanVanAssche opened 7 months ago

DylanVanAssche commented 7 months ago

Problem

Engines implementing RML-Core should not have to bother with all the different Source descriptions (CSV, XML, JSON, RDB, SPARQL, etc.) to be RML-Core compliant. However, the current test-cases exist for each of the different Source descriptions. Thus, if an engine covers RML-Core but does not support a certain format, its coverage drops drastically, even though it may have perfect RML-Core support. Moreover, support for different sources is out of scope for RML-Core, as it is part of RML-IO.
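To make the coverage problem concrete, here is a minimal sketch. The test-case ids and counts in `TEST_CASES` are made up for illustration and are not the real RML-Core suite; the `coverage` helper is hypothetical too.

```python
# Hypothetical test-case inventory: (test id, source format).
# Ids and counts are invented for illustration, not taken from RML-Core.
TEST_CASES = [
    ("tc-0001", "CSV"), ("tc-0001", "JSON"), ("tc-0001", "XML"), ("tc-0001", "RDB"),
    ("tc-0002", "CSV"), ("tc-0002", "JSON"), ("tc-0002", "XML"), ("tc-0002", "RDB"),
]


def coverage(supported_formats):
    """Fraction of test-cases an engine can even attempt to run."""
    runnable = [tc for tc in TEST_CASES if tc[1] in supported_formats]
    return len(runnable) / len(TEST_CASES)


# An engine with flawless core behaviour but CSV-only input support
# can run only a quarter of this hypothetical suite.
csv_only = coverage({"CSV"})
all_formats = coverage({"CSV", "JSON", "XML", "RDB"})
print(csv_only, all_formats)
```

With one variant per format for every test-case, an engine's score mostly reflects how many formats it reads rather than how well it implements the core mapping features.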

Proposal

Drop all source-specific test-cases in RML-Core and add the different sources to RML-IO. RML-IO currently focuses on RML Logical Target tests; Logical Source tests are missing there because they are covered by RML-Core. Keep the CSV variant of all test-cases in RML-Core because we cannot test anything without input data. CSV is the easiest to support (no iterator) and can be loaded easily into an RDB. For RDB support, loading the CSV and updating the Logical Source suffices. Special features, like datatype extraction from RDBs and possibly from other formats (e.g. integers and floats in JSON), could be added as specific test-cases in RML-IO.
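A rough sketch of the "load the CSV into an RDB" step, assuming SQLite as a stand-in database. The table name `student`, the sample data, and the `load_csv_into_table` helper are hypothetical, and the Logical Source update itself is not shown.

```python
import csv
import io
import sqlite3

# Hypothetical CSV input for a test-case; not copied from the real suite.
CSV_DATA = "ID,Name\n10,Venus\n"


def load_csv_into_table(conn, table, csv_text):
    """Create a table from the CSV header and insert every data row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # Every column is declared TEXT: plain CSV carries no datatype
    # information, so nothing can be extracted from the schema here.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_csv_into_table(conn, "student", CSV_DATA)
rows = conn.execute('SELECT "ID", "Name" FROM "student"').fetchall()
print(rows)
```

Because all columns end up as text in this sketch, a table loaded this way would not exercise datatype extraction from RDBs, which is one reason to treat that as a separate, format-specific test-case.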

Discussion

Let's discuss this properly! This is not a blocker for the KGCW Challenge as it does not involve a specification change, only a move and refactoring of the test-cases. Engines supporting both RML-Core and RML-IO should still have the same coverage as they do now.

chrdebru commented 6 months ago

I mostly agree. That said, we should have some JSON or XML cases for multi-valued expression maps. That is part of core, right?

bjdmeest commented 6 months ago

Yes, I agree too, but I would also keep some JSON or XML cases like @chrdebru proposed. I'm also thinking about, e.g., default datatypes when the data source has data types defined (e.g. JSON boolean). I'm guessing JSON alone is probably enough, just to keep the minimum as small as possible.
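To illustrate what a JSON datatype test-case would exercise, here is a minimal sketch. The mapping from JSON types to XSD datatypes below is an assumption made for illustration and should be checked against the Core spec; `natural_datatype` and the sample record are hypothetical.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"


def natural_datatype(value):
    """Return an assumed XSD datatype IRI for a parsed JSON value, or None."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return XSD + "boolean"
    if isinstance(value, int):
        return XSD + "integer"
    if isinstance(value, float):
        return XSD + "double"
    return None  # strings (and anything else) get no datatype in this sketch


# Hypothetical input record; CSV could not carry these type distinctions.
record = json.loads('{"name": "Venus", "active": true, "age": 23, "score": 8.5}')
datatypes = {key: natural_datatype(val) for key, val in record.items()}
print(datatypes)
```

The same values read from CSV would all arrive as strings, which is why a typed format like JSON is needed to cover this behaviour.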

DylanVanAssche commented 6 months ago

+1 to keep these things because the Core spec mentions the data type extraction stuff. So keep CSV test-cases + add/keep a few with CSV-XML-JSON-RDB for data type extraction?