ontop / ontop

Ontop is a platform to query relational databases as Virtual RDF Knowledge Graphs using SPARQL
https://ontop-vkg.org
Apache License 2.0
652 stars 163 forks source link

R2RML & Oracle DB: boolean result types always false #290

Open yoadey opened 5 years ago

yoadey commented 5 years ago

Description

Oracle does not have a standard Boolean column, which returns the value true or false. So often columns with char Y/N or bumber 1/0 are used. However, these values are always mapped to false, as Boolean value is parsed with Boolean.parseBoolean(columnContent), which only checks for value.equalsIgnoreCase("true"); I'm aware that this is an Oracle specific setup and there is no standard. However, it is currently impossible to manipulate the mapping process to use custom/own mappings (which would be my preferred solution).

Steps to Reproduce

  1. Create a table with an ID and a number column with values 1 and 0 (or 'Y' and 'N')
  2. Create an r2rml file which converts the column to boolean
  3. Run the r2rml file and write the results to a turtle

Expected behavior: Columns with value 0 would produce a triple with false and 1 would be converted to true Actual behavior: All triples are false

Reproduces how often: always

Versions

Current head of ontop.

bcogrel commented 5 years ago

Hello,

Thanks a lot for the suggestion. This Oracle specific case is indeed not covered by the R2RML standard, however I think it makes sense to add an extra normalization for the case where the boolean value is actually stored in a NUMBER column.

Actually your issue comes at the right time as we are currently refactoring the Oracle "connector" in an experimental branch (https://github.com/ontop/ontop/tree/feature/native-node). Boolean values stored in a NUMBER column should now be normalized into true/false.

The encoding of boolean into string like y/n is much more problematic as it is language-specific (o/n in French, j/n in German, s/n in Italian or Spanish, and so on).

My suggestion would be to use a CASE in the mapping assertion that would convert the string y/n into a true/false string.

SELECT CASE WHEN b = 'n' THEN 'false' ELSE 'true' END AS v
FROM myTable
WHERE b IS NOT NULL

Note that is essential to filter out null values of b as otherwise NULLs would be interpreted as true values.

At the moment CASE expressions are currently not recognized by the Ontop SQL parser, causing the SQL query to be handled as a "black-box" view. While semantically correct, such a view prevents many optimizations to be applied and may therefore penalize the performance.

Best, Benjamin

yoadey commented 5 years ago

Thank you very much, with the CASE statement this would solve the issues. However, it would also be useful if a custom mapper could be applied somehow, to exchange the JDBC2ConstantConverter, perhaps already in the configuration in the beginning with a statement like:

OntopStandaloneSQLConfiguration.defaultBuilder() //
                .enableDefaultDatatypeInference(true)//
                .r2rmlMappingGraph(graph) //
                .mappingConverter(new MyCustomConverter()) //
                ...

Then each user could handle special cases themselve, would this be possible?

bcogrel commented 4 years ago

Hi,

CASE expressions are now supported by the SQL parser, it would be interested to test how Ontop now behaves with such a mapping.

Regarding your proposal, in terms of design, we would prefer to stay as declarative as possible and avoid custom code. We now recommend users to use the Docker image or the CLI rather than the Java API. Using the CASE expression to the mapping also brings interoperability, as it relies on standards rather than on Ontop-specific features.

Best, Benjamin