Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
While working on a study comparing different approaches to query the neXtProt dataset, I have observed that Virtuoso returns wrong results in some cases.
If I create the tables —
create table nextprot.annotation_bases
(
id varchar not null,
term varchar,
primary key(id)
);
create table nextprot.isoform_cellular_components
(
isoform varchar not null,
annotation varchar not null,
primary key(isoform, annotation)
);
insert into nextprot.annotation_bases values ('NX_Q96RL1-4-AN_Q96RL1_004429', NULL);
insert into nextprot.isoform_cellular_components values ('NX_Q96RL1-4', 'NX_Q96RL1-4-AN_Q96RL1_004429');
— and define the mapping —
xml_set_ns_decl('', 'http://nextprot.org/rdf#', 2);
xml_set_ns_decl('iri', 'http://bioinfo.iocb.cz/rdf/quad-storage/linked-data-view/iri-class/nextprot#', 2);
sparql create iri class iri:isoform "http://nextprot.org/rdf/isoform/%U"(in id varchar not null) option (bijection).;
sparql create iri class iri:annotation "http://nextprot.org/rdf/annotation/%U"(in id varchar not null) option (bijection).;
sparql create iri class iri:terminology "http://nextprot.org/rdf/terminology/%U"(in id varchar not null) option (bijection).;
sparql create quad storage virtrdf:NeXtProtQuadStorage
from DB.nextprot.isoform_cellular_components as isoform_cellular_components
from DB.nextprot.annotation_bases as annotation_bases
{
create virtrdf:nextprot as graph iri ("http://nextprot.org/rdf")
{
iri:annotation(annotation_bases.id)
:term iri:terminology(annotation_bases.term).
iri:isoform(isoform_cellular_components.isoform)
:cellularComponent iri:annotation(isoform_cellular_components.annotation).
}
};
However, this result is wrong, because an empty result should be returned.
This is due to the fact that the SPARQL query is translated on the SQL query —
SELECT __spfi ( 'http://nextprot.org/rdf/terminology/%U' , "s_7_4_t1"."3fd143dc~term") AS "locterm"
FROM (SELECT "s_7_4_t0-int~isoform_cellular_components"."isoform" AS "77cda1f4~isoform", "s_7_4_t0-int~isoform_cellular_components"."annotation" AS "696bb07d~annotation" FROM DB.nextprot.isoform_cellular_components AS "s_7_4_t0-int~isoform_cellular_components"
WHERE
"s_7_4_t0-int~isoform_cellular_components"."isoform" = 'NX_Q96RL1-4' ) AS "s_7_4_t0"
INNER JOIN (SELECT "s_7_4_t1-int~annotation_bases"."id" AS "ee3bb1~id", "s_7_4_t1-int~annotation_bases"."term" AS "3fd143dc~term" FROM DB.nextprot.annotation_bases AS "s_7_4_t1-int~annotation_bases") AS "s_7_4_t1"
ON (
"s_7_4_t0"."696bb07d~annotation" = "s_7_4_t1"."ee3bb1~id")
OPTION (QUIETCAST);
That is incorrect, because the following condition is missing —
WHERE "s_7_4_t1-int~annotation_bases"."term" IS NOT NULL
The correct translation should look like the SQL query —
SELECT __spfi ( 'http://nextprot.org/rdf/terminology/%U' , "s_7_4_t1"."3fd143dc~term") AS "locterm"
FROM (SELECT "s_7_4_t0-int~isoform_cellular_components"."isoform" AS "77cda1f4~isoform", "s_7_4_t0-int~isoform_cellular_components"."annotation" AS "696bb07d~annotation" FROM DB.nextprot.isoform_cellular_components AS "s_7_4_t0-int~isoform_cellular_components"
WHERE
"s_7_4_t0-int~isoform_cellular_components"."isoform" = 'NX_Q96RL1-4' ) AS "s_7_4_t0"
INNER JOIN (SELECT "s_7_4_t1-int~annotation_bases"."id" AS "ee3bb1~id", "s_7_4_t1-int~annotation_bases"."term" AS "3fd143dc~term" FROM DB.nextprot.annotation_bases AS "s_7_4_t1-int~annotation_bases" WHERE "s_7_4_t1-int~annotation_bases"."term" IS NOT NULL) AS "s_7_4_t1"
ON (
"s_7_4_t0"."696bb07d~annotation" = "s_7_4_t1"."ee3bb1~id")
OPTION (QUIETCAST);
While working on a study comparing different approaches to query the neXtProt dataset, I have observed that Virtuoso returns wrong results in some cases.
If I create the tables —
— and define the mapping —
— then the SPARQL query —
— returns the following result —
However, this result is wrong, because an empty result should be returned.
This is due to the fact that the SPARQL query is translated on the SQL query —
That is incorrect, because the following condition is missing —
The correct translation should look like the SQL query —