openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com
Other
867 stars 210 forks source link

Missing IS NOT NULL condition #977

Open galgonek opened 3 years ago

galgonek commented 3 years ago

While working on a study comparing different approaches to query the neXtProt dataset, I have observed that Virtuoso returns wrong results in some cases.

If I create the tables —

create table nextprot.annotation_bases
(
    id                  varchar not null,
    term                varchar,
    primary key(id)
);

create table nextprot.isoform_cellular_components
(
    isoform     varchar not null,
    annotation  varchar not null,
    primary key(isoform, annotation)
);

insert into nextprot.annotation_bases values ('NX_Q96RL1-4-AN_Q96RL1_004429', NULL);
insert into nextprot.isoform_cellular_components values ('NX_Q96RL1-4', 'NX_Q96RL1-4-AN_Q96RL1_004429');

— and define the mapping —

xml_set_ns_decl('', 'http://nextprot.org/rdf#', 2);
xml_set_ns_decl('iri', 'http://bioinfo.iocb.cz/rdf/quad-storage/linked-data-view/iri-class/nextprot#', 2);

sparql create iri class iri:isoform "http://nextprot.org/rdf/isoform/%U"(in id varchar not null) option (bijection).;
sparql create iri class iri:annotation "http://nextprot.org/rdf/annotation/%U"(in id varchar not null) option (bijection).;
sparql create iri class iri:terminology "http://nextprot.org/rdf/terminology/%U"(in id varchar not null) option (bijection).;

sparql create quad storage virtrdf:NeXtProtQuadStorage
    from DB.nextprot.isoform_cellular_components as isoform_cellular_components
    from DB.nextprot.annotation_bases as annotation_bases
{
  create virtrdf:nextprot as graph iri ("http://nextprot.org/rdf")
  {
    iri:annotation(annotation_bases.id)
      :term iri:terminology(annotation_bases.term).

    iri:isoform(isoform_cellular_components.isoform)
      :cellularComponent iri:annotation(isoform_cellular_components.annotation).
  }
};

— then the SPARQL query —

sparql
define input:storage virtrdf:NeXtProtQuadStorage
select ?locterm where {
    <http://nextprot.org/rdf/isoform/NX_Q96RL1-4> :cellularComponent ?loc .
    ?loc :term ?locterm.
};

— returns the following result —

locterm
VARCHAR
_______________________________________________________________________________

http://nextprot.org/rdf/terminology/%28NULL%29

However, this result is wrong, because an empty result should be returned.

This is due to the fact that the SPARQL query is translated on the SQL query —

SELECT __spfi ( 'http://nextprot.org/rdf/terminology/%U' ,  "s_7_4_t1"."3fd143dc~term") AS "locterm"
FROM (SELECT  "s_7_4_t0-int~isoform_cellular_components"."isoform" AS "77cda1f4~isoform",  "s_7_4_t0-int~isoform_cellular_components"."annotation" AS "696bb07d~annotation" FROM DB.nextprot.isoform_cellular_components AS "s_7_4_t0-int~isoform_cellular_components"
     WHERE 
    "s_7_4_t0-int~isoform_cellular_components"."isoform" = 'NX_Q96RL1-4' ) AS "s_7_4_t0"
  INNER JOIN (SELECT  "s_7_4_t1-int~annotation_bases"."id" AS "ee3bb1~id",  "s_7_4_t1-int~annotation_bases"."term" AS "3fd143dc~term" FROM DB.nextprot.annotation_bases AS "s_7_4_t1-int~annotation_bases") AS "s_7_4_t1"
  ON (
    "s_7_4_t0"."696bb07d~annotation" = "s_7_4_t1"."ee3bb1~id")
OPTION (QUIETCAST);

That is incorrect, because the following condition is missing —

WHERE "s_7_4_t1-int~annotation_bases"."term" IS NOT NULL

The correct translation should look like the SQL query —

SELECT __spfi ( 'http://nextprot.org/rdf/terminology/%U' ,  "s_7_4_t1"."3fd143dc~term") AS "locterm"
FROM (SELECT  "s_7_4_t0-int~isoform_cellular_components"."isoform" AS "77cda1f4~isoform",  "s_7_4_t0-int~isoform_cellular_components"."annotation" AS "696bb07d~annotation" FROM DB.nextprot.isoform_cellular_components AS "s_7_4_t0-int~isoform_cellular_components"
     WHERE 
    "s_7_4_t0-int~isoform_cellular_components"."isoform" = 'NX_Q96RL1-4' ) AS "s_7_4_t0"
  INNER JOIN (SELECT  "s_7_4_t1-int~annotation_bases"."id" AS "ee3bb1~id",  "s_7_4_t1-int~annotation_bases"."term" AS "3fd143dc~term" FROM DB.nextprot.annotation_bases AS "s_7_4_t1-int~annotation_bases" WHERE "s_7_4_t1-int~annotation_bases"."term" IS NOT NULL) AS "s_7_4_t1"
  ON (
    "s_7_4_t0"."696bb07d~annotation" = "s_7_4_t1"."ee3bb1~id")
OPTION (QUIETCAST);
openlink commented 3 years ago

Our development team will review this issue and report back as soon as posible.