Open tirrolo opened 3 years ago
Hi @tirrolo ,
That's an interesting problem.
According to R2RML, section 7.3, if an argument of a template is null, the template value becomes null, whether it is an IRI template or something else.
Do you know if the Direct Mapping specification suggests something for this situation?
One first idea from my side would be to use a COALESCE in the source part of the mapping assertion so as to replace the null value by a default value (e.g. the 'null' string). This may be imperfect since multiple rows could produce the same bnode, which I don't know if it is tolerated by the Direct Mapping specification or not.
Best, Benjamin
Hi Benjamin, Davide
The Direct Mapping specification (https://www.w3.org/TR/rdb-direct-mapping/#no-pk) does not prescribe any way of creating blank nodes (no template), it merely says that each row of the table should get a fresh blank node assigned to it. This makes perfect sense, but it means that we need some sort of surrogate ID in that case. However, all attributes, as Davide highlights, cannot be used as such an ID - a row number is required instead, but that makes it difficult to support it in general. DBMSs use various ways of producing row IDs, and keeping those IDs consistent for references (https://www.w3.org/TR/rdb-direct-mapping/#ref-no-pk) may even be impossible.
A practical solution could be using COALESCE(TYPE2TXT(column), 'NULL'), but that will be close to the weird treatment of NULLs in early MS SQL Server, where all NULLs in any given column were "equal" (so, a UNIQUE nullable column could contain at most one NULL).
Best Roman
Hi,
Thanks Roman for the explanation. We encounter here the same difficulties as with the request for supporting anonymous blank nodes in the mapping (#372, not part of R2RML).
Unfortunately, the practical solution based on COALESCE we are having in mind does not comply with the Direct Mapping specification. The approach of row ids seems to be a better way, but indeed hard to support as DBMSs handle them in very different ways, when they provide this feature.
Best, Benjamin
Following our online discussion this morning, the Ontop bootstrapper does and will not align itself with the Direct Mapping spec, but with the default mapping mechanism of R2RML (https://www.w3.org/TR/r2rml/#default-mappings). We should made explicit that duplicate rows are not preserved.
In that setting, it should be OK to replace nulls by default values using coalesce. That will be the responsibility of the bootstrapper, not of the R2RML mapping processor. The choice of default values should be made configurable.
One idea of having one single default value to specify: casting all arguments of the template to string before applying the coalesce.
That's precisely what I wrote above, with COALESCE(TYPE2TXT(column), 'NULL'). I mentioned it's weird, but actually it's not that bad: the template will contain all columns, so, the only effect of this COALESCE is indeed elimination of duplicate rows.
To properly deal with duplicate rows, we'll need to extend the SQLParser to understand the "pseudo-columns" like ROWID (which will make it more dependant on the SQL dialect, which is probably inevitable at some stage anyway).
Yes guys, I can handle this in the way you suggest (i.e., ignore rowids) in the new bootstrapper branch, because I have a configuration file there. However I am not sure at all we will eventually want to merge that into Ontop: that bootstrapper is getting quite complex and for such reason I was in fact thinking of taking it out of the Ontop codebase. For sure, before considering any merging, it requires a substantial refactoring effort (I have spent the latest months in rushing adding new features, and as a result the code looks ugly).
Description
Whenever a table without primary key is encountered, the Ontop bootstrapper will generate a bnode template comprising all attributes of the table. However, if such table contain null values, then this means that no output rdf triple will be generated by ontop materialize with respect to rows containing nulls (this seems to follow the mechanics adopted for URIs, that cannot be constructed out of null values... however bnodes templates should be treated differently than URI templates, since bnodes are not URIs).
Steps to Reproduce
1) Create the DB "bootstrapper-tests-bnode" (already on obdalin) with the following definition:
Bootstrap the ontology and mappings. For your convenience, here they are:
Ontology:
Mapping:
3) Run Ontop materialize:
./ontop materialize -p onto/bnodes/bnodes.properties -m onto/bnodes/bnodes.obda -t onto/bnodes/bnodes.owl -o bnodes.ttl
Expected behavior: Some non-empty output
Actual behavior: Empty RDF file.
Reproduces how often: Always
Attached material
Everything is provided above.
Versions
Ontop CLI 4.0.3
Additional Information
Any additional information, configuration or data that might be necessary to reproduce the issue.