openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Wrong behaviour of SPARQL DISTINCT #979

Open galgonek opened 3 years ago

galgonek commented 3 years ago

While working on a study comparing different approaches to query the neXtProt dataset, I have observed that Virtuoso returns wrong results in some cases.

If I create the table —

create table nextprot.entry_bases
(
    id     varchar not null,
    primary key(id)
);

insert into nextprot.entry_bases values ('NX_Q15365');

— and define the mapping —

xml_set_ns_decl('', 'http://nextprot.org/rdf#', 2);
xml_set_ns_decl('iri', 'http://bioinfo.iocb.cz/rdf/quad-storage/linked-data-view/iri-class/nextprot#', 2);

sparql create iri class iri:entry "http://nextprot.org/rdf/entry/%U"(in id varchar not null) option (bijection).;

sparql create quad storage virtrdf:NeXtProtQuadStorage
    from DB.nextprot.entry_bases as entry_bases
{
  create virtrdf:nextprot as graph iri ("http://nextprot.org/rdf")
  {
    iri:entry(entry_bases.id)
      rdf:type :Entry.
  }
};

— then the SPARQL query —

sparql
define input:storage virtrdf:NeXtProtQuadStorage
select distinct ?entry where {
  ?entry rdf:type :Entry.
};

returns only the fragment 'NX_Q15365' instead of the full IRI.

As a workaround, it is possible to use the following query —

sparql
define input:storage virtrdf:NeXtProtQuadStorage
select * where {{
  select distinct ?entry where {
    ?entry rdf:type :Entry.
  }
}};

— that returns http://nextprot.org/rdf/entry/NX_Q15365 as expected.
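
For reference, the mapping I expect from the iri:entry class can be sketched outside Virtuoso. The Python below is only a rough model, not Virtuoso code: the helper names are hypothetical, and using urllib.parse.quote to stand in for the %U substitution is an assumption that may differ from Virtuoso's escaping for special characters. For this id no escaping is needed either way.

```python
from urllib.parse import quote, unquote

# Prefix taken from the format string of the iri:entry class above.
ENTRY_PREFIX = "http://nextprot.org/rdf/entry/"


def entry_iri(entry_id: str) -> str:
    """Model of the forward mapping: percent-encode the id (assumed to
    approximate %U) and append it to the prefix."""
    return ENTRY_PREFIX + quote(entry_id, safe="")


def entry_id_from_iri(iri: str) -> str:
    """Model of the inverse mapping that `option (bijection)` declares to exist."""
    if not iri.startswith(ENTRY_PREFIX):
        raise ValueError(f"not an entry IRI: {iri!r}")
    return unquote(iri[len(ENTRY_PREFIX):])


if __name__ == "__main__":
    # The value I expect in ?entry for the single row in entry_bases.
    print(entry_iri("NX_Q15365"))
```

With select distinct, the value returned in ?entry looks like the input of this mapping (the raw id column) rather than its output.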

openlink commented 3 years ago

Our development team will review this issue and report back as soon as possible.