openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/
Other
849 stars 214 forks source link

Incorrect "TOP 1" optimization #978

Open galgonek opened 3 years ago

galgonek commented 3 years ago

While working on a study comparing different approaches to query the neXtProt dataset, I have observed that Virtuoso returns wrong results in some cases.

If I create the tables —

create table nextprot.isoform_cellular_components
(
    isoform     varchar not null,
    annotation  varchar not null,
    primary key(isoform, annotation)
);

create table nextprot.annotation_negative_evidences
(
    annotation  varchar not null,
    evidence    varchar not null,
    primary key(annotation, evidence)
);

insert into nextprot.isoform_cellular_components values ('NX_P09958-1', 'NX_P09958-1-AN_P09958_0504');

insert into nextprot.annotation_negative_evidences values ('NX_O00429-2-AN_O00429_001737', '119669768');
insert into nextprot.annotation_negative_evidences values ('NX_P09958-1-AN_P09958_0504', '91773408');

— and define the mapping —

xml_set_ns_decl('', 'http://nextprot.org/rdf#', 2);
xml_set_ns_decl('iri', 'http://bioinfo.iocb.cz/rdf/quad-storage/linked-data-view/iri-class/nextprot#', 2);

sparql create iri class iri:isoform "http://nextprot.org/rdf/isoform/%U"(in id varchar not null) option (bijection).;
sparql create iri class iri:annotation "http://nextprot.org/rdf/annotation/%U"(in id varchar not null) option (bijection).;
sparql create iri class iri:evidence "http://nextprot.org/rdf/evidence/%U"(in id varchar not null) option (bijection).;

sparql create quad storage virtrdf:NeXtProtQuadStorage
    from DB.nextprot.isoform_cellular_components as isoform_cellular_components
    from DB.nextprot.annotation_negative_evidences as annotation_negative_evidences
{
  create virtrdf:nextprot as graph iri ("http://nextprot.org/rdf")
  {
    iri:isoform(isoform_cellular_components.isoform)
      :cellularComponent iri:annotation(isoform_cellular_components.annotation).

    iri:annotation(annotation_negative_evidences.annotation)
      :negativeEvidence iri:evidence(annotation_negative_evidences.evidence).
  }
};

— then the query —

sparql
define input:storage virtrdf:NeXtProtQuadStorage
select ?iso where {
  ?iso :cellularComponent ?loc .
  minus {?loc :negativeEvidence ?negev}
};

— returns the result —

iso
VARCHAR
_______________________________________________________________________________

http://nextprot.org/rdf/isoform/NX_P09958-1

However, this result is wrong, because an empty result should be returned.

This is due to the fact that the SPARQL query is translated on the SQL query —

SELECT __spfi ( 'http://nextprot.org/rdf/isoform/%U' ,  "s_6_6_t1"."77cda1f4~isoform") AS "iso"
FROM (SELECT  "s_6_6_t1-int~isoform_cellular_components"."isoform" AS "77cda1f4~isoform",  "s_6_6_t1-int~isoform_cellular_components"."annotation" AS "696bb07d~annotation" FROM DB.nextprot.isoform_cellular_components AS "s_6_6_t1-int~isoform_cellular_components") AS "s_6_6_t1"
WHERE
  not ( EXISTS ( (
     SELECT TOP 1 1 AS __ask_retval
      FROM (SELECT  TOP 1 "s_6_4_t0-int~annotation_negative_evidences"."annotation" AS "1e998704~annotation",  "s_6_4_t0-int~annotation_negative_evidences"."evidence" AS "3bc5066f~evidence" FROM DB.nextprot.annotation_negative_evidences AS "s_6_4_t0-int~annotation_negative_evidences") AS "s_6_4_t0"
      WHERE
        "s_6_4_t0"."1e998704~annotation" = "s_6_6_t1"."696bb07d~annotation"
OPTION (QUIETCAST)
     )))
OPTION (QUIETCAST);

That is incorrect, because it contains "TOP 1" for the table "annotation_negative_evidences".

The correct translation should look like the following SQL query —

SELECT __spfi ( 'http://nextprot.org/rdf/isoform/%U' ,  "s_6_6_t1"."77cda1f4~isoform") AS "iso"
FROM (SELECT  "s_6_6_t1-int~isoform_cellular_components"."isoform" AS "77cda1f4~isoform",  "s_6_6_t1-int~isoform_cellular_components"."annotation" AS "696bb07d~annotation" FROM DB.nextprot.isoform_cellular_components AS "s_6_6_t1-int~isoform_cellular_components") AS "s_6_6_t1"
WHERE
  not ( EXISTS ( (
     SELECT TOP 1 1 AS __ask_retval
      FROM (SELECT "s_6_4_t0-int~annotation_negative_evidences"."annotation" AS "1e998704~annotation",  "s_6_4_t0-int~annotation_negative_evidences"."evidence" AS "3bc5066f~evidence" FROM DB.nextprot.annotation_negative_evidences AS "s_6_4_t0-int~annotation_negative_evidences") AS "s_6_4_t0"
      WHERE
        "s_6_4_t0"."1e998704~annotation" = "s_6_6_t1"."696bb07d~annotation"
OPTION (QUIETCAST)
     )))
OPTION (QUIETCAST);
openlink commented 3 years ago

Our development team will review this issue and report back as soon as posible.