openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Matching strings without xsd:string #728

Open saleem-muhammad opened 6 years ago

saleem-muhammad commented 6 years ago

I am not sure whether this problem has been reported before. I am running Virtuoso 07.20.3217 for Linux (as of Dec 15 2017) and encountered the following problem.

The following query returns the expected results:

SELECT ?T where {?T bb:hasName "Department6Study_Track0"^^xsd:string}

However, removing the ^^xsd:string from the query gives me empty results. In RDF 1.1, a string without xsd:string and the same string with xsd:string are the same RDF term, so it should not matter whether you write ^^xsd:string or not. Is there a fix for this problem?

HughWilliams commented 6 years ago

The following query shows that a literal string with no type does default to xsd:string:

SQL> SPARQL SELECT ?literal ( datatype(?literal) AS ?type ) 
WHERE { VALUES ?literal { "simple" "typed"^^xsd:string } };
literal          type
LONG VARCHAR     LONG VARCHAR
____________     _______________________________________

simple           http://www.w3.org/2001/XMLSchema#string
typed            http://www.w3.org/2001/XMLSchema#string

2 Rows. -- 2 msec.
SQL> 

So please provide steps to reproduce the problem you are experiencing ...

saleem-muhammad commented 6 years ago

OK, consider querying a dataset that contains the following single triple --

<http://www.owl-ontologies.com/Ontology1324312315.owl#Semester0>  <http://www.owl-ontologies.com/Ontology1324312315.owl#hasName>  "Semester0"^^<http://www.w3.org/2001/XMLSchema#string> .

The query --

select * where  { ?s <http://www.owl-ontologies.com/Ontology1324312315.owl#hasName> "Semester0"^^xsd:string}

-- gives 1 result. While the query --

select * where  { ?s <http://www.owl-ontologies.com/Ontology1324312315.owl#hasName> "Semester0"}

-- gives zero results.

HughWilliams commented 6 years ago

I can see what you are reporting with that test case (please always use isql to show exactly what you are doing):

SQL> SPARQL INSERT INTO GRAPH <http://example.org> 
{ <http://www.owl-ontologies.com/Ontology1324312315.owl#Semester0> 
<http://www.owl-ontologies.com/Ontology1324312315.owl#hasName>  
"Semester0"^^xsd:string };

Done. -- 84 msec.

SQL> sparql select * 
where { ?s <http://www.owl-ontologies.com/Ontology1324312315.owl#hasName> 
"Semester0"^^xsd:string};
s
LONG VARCHAR
______________________________________________________________

http://www.owl-ontologies.com/Ontology1324312315.owl#Semester0

1 Rows. -- 84 msec.

SQL> sparql select * 
where { ?s <http://www.owl-ontologies.com/Ontology1324312315.owl#hasName> 
"Semester0"};
s
LONG VARCHAR
____________

0 Rows. -- 6 msec.

SQL> sparql select ?o datatype(?o) 
from <http://example.org> 
where {?s ?p ?o};
o                callret-1
LONG VARCHAR     LONG VARCHAR
____________     ___________________________________________________________________

Semester0        http://www.w3.org/2001/XMLSchema#string

1 Rows. -- 1 msec.
SQL>

But then if you insert a triple with a literal string with no datatype specified, its datatype is actually xsd:string by default; it is just not stored with xsd:string physically in the database:

SQL> SPARQL INSERT INTO GRAPH <http://example1.org> 
{ <http://www.owl-ontologies.com/Ontology1324312315.owl#Semester0> 
<http://www.owl-ontologies.com/Ontology1324312315.owl#hasName>  
"Semester0" };

Done. -- 9 msec.

SQL> sparql select ?o ( datatype(?o) as ?datatype ) 
from <http://example1.org> 
where {?s ?p ?o};
o                datatype
LONG VARCHAR     LONG VARCHAR
____________     _______________________________________

Semester0        http://www.w3.org/2001/XMLSchema#string

1 Rows. -- 4 msec.
SQL> 

This being how Virtuoso works ...

saleem-muhammad commented 6 years ago

Now I see the problem. Explicitly specifying xsd:string on literals in RDF datasets causes problems in Virtuoso, although RDF allows it. Many RDF datasets explicitly declare the string type on literals. Maybe this can be fixed in the next release?

Polymathronic commented 6 years ago

+1. This is a deviation from the RDF standard, and can be a really nasty one for application developers, especially when you don't have full control over your data.

bertvannuffelen commented 6 years ago

+1. It would be great if Virtuoso were agnostic to xsd:string. RDF 1.1 defines plain literals as syntactic sugar for typed literals with type xsd:string. Options that can be taken are:

HughWilliams commented 6 years ago

I have logged an internal ticket for this, such that development can look into it ...

bertvannuffelen commented 6 years ago

thanks Hugh, looking forward to it.

IvanMikhailov commented 6 years ago

So far, Virtuoso is RDF 1.0, and "6.5.1 Literal Equality" of RDF 1.0 states that --

Two literals are equal if and only if all of the following hold:

  • The strings of the two lexical forms compare equal, character by character.
  • Either both or neither have language tags.
  • The language tags, if any, compare equal.
  • Either both or neither have datatype URIs.
  • The two datatype URIs, if any, compare equal, character by character.
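
As a rough illustration only (not Virtuoso's actual implementation), the five conditions above can be sketched in Python next to an RDF 1.1 style comparison. The `Literal` class and both function names are made up for this sketch, and language tags are compared as exact strings for simplicity.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # datatype IRI, or None for a plain literal
    lang: Optional[str] = None      # language tag, or None


def equal_rdf10(a: Literal, b: Literal) -> bool:
    """Literal equality following the five RDF 1.0 conditions quoted above."""
    return (
        a.lexical == b.lexical
        and a.lang == b.lang          # covers "both or neither" and "tags equal"
        and a.datatype == b.datatype  # covers "both or neither" and "URIs equal"
    )


def equal_rdf11(a: Literal, b: Literal) -> bool:
    """RDF 1.1 style: a plain literal is read as xsd:string before comparing."""
    def normalize(lit: Literal) -> Literal:
        if lit.datatype is None and lit.lang is None:
            return Literal(lit.lexical, XSD_STRING, None)
        return lit
    return equal_rdf10(normalize(a), normalize(b))


plain = Literal("Semester0")
typed = Literal("Semester0", XSD_STRING)
print(equal_rdf10(plain, typed))  # False
print(equal_rdf11(plain, typed))  # True
```

Under the RDF 1.0 rule the two forms of "Semester0" differ because only one of them has a datatype URI, which matches the zero-row result reported earlier in this thread.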

RDF 1.0 also permits an implicit cast of XML text-only fragments to xsd:strings:

RDF applications may use additional equivalence relations, such as that which relates an xsd:string with an rdf:XMLLiteral corresponding to a single text node of the same string.

-- but Virtuoso does not support that because it supports generic entities as XML resources (i.e., XMLs with more than one top-level element that are not valid if used as standalone resources but may be valid if included into other resources via DTD).

Your request for migration to RDF 1.1 is the first of its kind, and it is still the only one after 3 months. Technically, it's not a big deal to add a configuration parameter so that "abc"^^xsd:string is treated as "abc" or, alternatively, "abc" is always treated as "abc"^^xsd:string; the first option looks more practical. In addition, it's possible to extend the built-in DATATYPE(?x) function with a second argument that gives the value to return if a plain literal is passed as the value of ?x.
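
To make the two options and the extended DATATYPE idea concrete, here is a minimal Python sketch. It is not Virtuoso code: the tuple model of a literal and the names `strip_xsd_string`, `add_xsd_string`, and `datatype_or_default` are all hypothetical, and language-tagged literals are left out.

```python
from typing import Optional, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# A literal is modelled as (lexical form, datatype IRI or None).
Literal = Tuple[str, Optional[str]]


def strip_xsd_string(lit: Literal) -> Literal:
    """First option: treat "abc"^^xsd:string as the plain literal "abc"."""
    lexical, datatype = lit
    return (lexical, None) if datatype == XSD_STRING else lit


def add_xsd_string(lit: Literal) -> Literal:
    """Second option: treat the plain literal "abc" as "abc"^^xsd:string."""
    lexical, datatype = lit
    return (lexical, XSD_STRING) if datatype is None else lit


def datatype_or_default(lit: Literal, default: Optional[str] = None) -> Optional[str]:
    """Hypothetical two-argument DATATYPE: `default` is returned for plain literals."""
    _, datatype = lit
    return default if datatype is None else datatype


print(strip_xsd_string(("abc", XSD_STRING)))            # ('abc', None)
print(datatype_or_default(("abc", None), XSD_STRING))   # the xsd:string IRI
```

With either normalization applied to both the stored literal and the query constant, the two forms compare equal; the options differ only in which form is kept as canonical.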

bertvannuffelen commented 6 years ago

Hi Ivan,

Thanks for taking care of this request. More and more tools and libraries in the RDF ecosystem apply RDF 1.1, and we now face difficulties when combining RDF 1.0 and RDF 1.1 tools. It is good that we are getting more alignment on this topic.

Is there anything in your suggestion that we can do ourselves?

yoadey commented 4 years ago

+1

nam-vuhoang commented 3 years ago

+1

IS4Code commented 3 years ago

Would love to see a solution to this.

datamusee commented 3 years ago

Hello,

In my work, I need to use SERVICE <http://fr.dbpedia.org/sparql> to find data related to strings in my dataset. My SPARQL query is processed with Fuseki. For now, I can't find a way to compare my strings to the results in DBpedia. My target is, for example:

<http://fr.dbpedia.org/resource/La_Rochelle> <http://dbpedia.org/ontology/postalCode> "17000"^^<http://www.w3.org/2001/XMLSchema#string>

I suspect that Fuseki produces a value without an explicit type and gives me no means to add an xsd:string type, so Virtuoso/DBpedia is unable to find a match. Here is a sample query:

SELECT distinct ?scode  ?sd 
where{
    bind("17300"^^<http://www.w3.org/2001/XMLSchema#string> as ?scode)
    SERVICE <http://fr.dbpedia.org/sparql> {
      select  ?sd ?scode where {
        ?sd <http://dbpedia.org/ontology/inseeCode> ?scode .
      }    
  }
}

If I try a similar query with values of integer type, the result is as expected.

SELECT distinct ?scode  ?sd 
where{
    bind(17300 as ?scode)
    SERVICE <http://fr.dbpedia.org/sparql> {
      select  ?sd ?scode where {
        ?sd <http://fr.dbpedia.org/property/insee> ?scode .
      }    
  }
}

TallTed commented 3 years ago

Note that the way your queries are constructed, the SERVICE clause is the "innermost" subquery, and that gets evaluated before the "outer" subqueries. This is because SPARQL is evaluated "from the inside out" (sometimes confusingly called "from the bottom up", leading people to think evaluation starts at the lexical bottom of the query).

Also note that putting the DISTINCT on the "outer" query means you may pull a lot more data over the wire than necessary. Therefore, I've moved the DISTINCT to the inner query.

See what happens if you run this --

SELECT ?scode  ?sd 
WHERE 
  {
    SERVICE <http://fr.dbpedia.org/sparql> 
      {
        SELECT DISTINCT ?sd ?scode
        WHERE 
          {
            ?sd  <http://fr.dbpedia.org/property/insee>  ?scode .
            BIND ( 17300 AS ?scode )
          }    
      }
  }

-- or this --

SELECT ?scode  ?sd 
WHERE 
  {
    SERVICE <http://fr.dbpedia.org/sparql> 
      {
        SELECT DISTINCT ?sd ?scode
        WHERE 
          {
            ?sd  <http://dbpedia.org/ontology/inseeCode>  ?scode .
            BIND ( "17300"^^<http://www.w3.org/2001/XMLSchema#string> AS ?scode )
          }    
      }
  }

I think that this will not be quite sufficient to reach your actual goal, as I think you've come to us with an "XY Problem". If I'm right, perhaps you can provide us with something of the bigger picture?

datamusee commented 3 years ago

The BIND is outside the SERVICE clause because it is just sample code that sets ?scode from the "local" dataset. The real query is more complex: it sets ?scode by querying the local dataset, then goes to the DBpedia service to get some complementary data.

TallTed commented 3 years ago

I believe I understand what you're trying to do.

The "local" subquery that gives the partial results that are then meant to be used against DBpedia must be executed before the remote SERVICE subquery.

This means that the "local" subquery must be "lower" or more "inner" than the remote SERVICE subquery.

If you provide your actual query, we can provide a suggested SPARQL rewrite.

Alternatively, you could use whatever tooling you're using outside the SPARQL to execute two queries -- one to get the "local" values, which are then used in building the second, which gets the "remote" data.
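
A minimal sketch of that two-query approach, assuming Python as the outside tooling: the second query is built from the values of the first, and each value is listed both as a plain literal and as an explicit xsd:string, so that a store using RDF 1.0 literal equality can match either stored form. The function names are hypothetical, the property IRI is taken from the sample query earlier in this thread, and actually sending the query to the endpoint is left out.

```python
from typing import Iterable

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_remote_query(codes: Iterable[str]) -> str:
    """Build the second query, listing each code as plain and as xsd:string."""
    terms = []
    for code in codes:
        quoted = f'"{escape_literal(code)}"'
        terms.append(quoted)                       # plain literal form
        terms.append(f"{quoted}^^<{XSD_STRING}>")  # explicitly typed form
    values = " ".join(terms)
    return (
        "SELECT DISTINCT ?sd ?scode\n"
        "WHERE {\n"
        f"  VALUES ?scode {{ {values} }}\n"
        "  ?sd <http://dbpedia.org/ontology/inseeCode> ?scode .\n"
        "}"
    )


# Codes that the first ("local") query is assumed to have returned.
print(build_remote_query(["17300"]))
```

Listing both forms costs one extra VALUES entry per string but avoids depending on which form the remote dataset happens to store.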

JJ-Author commented 9 months ago

I developed a potential workaround to set up Virtuoso so that it can match simple literal constants with or without ^^xsd:string in a triple pattern of a SPARQL query, making Virtuoso a little more compatible with RDF 1.1. This topic is becoming relevant again given that the Jena 5 SPARQL API seems to remove the RDF 1.0 compatibility mode.

I looked at execution plans and SQL functions and experimented a little as a simple proof of concept. I found that DB.DBA.RDF_TWOBYTE_OF_DATATYPE is responsible for determining the internal datatype in the rdf_box for a datatype IRI. So I did something hacky in isql:

1) restart Virtuoso
2) create function via isql

   CREATE FUNCTION ChangeStringDatatype()
   {
     DECLARE str_dt_2byte INT;

     -- Remember the twobyte ID currently assigned to xsd:string.
     str_dt_2byte := DB.DBA.RDF_TWOBYTE_OF_DATATYPE(DB.DBA.RDF_MAKE_IID_OF_QNAME('http://www.w3.org/2001/XMLSchema#string'));

     -- Map xsd:string to twobyte ID 257.
     UPDATE DB.DBA.RDF_DATATYPE
        SET RDT_TWOBYTE = 257
      WHERE RDT_IID = iri_to_id('http://www.w3.org/2001/XMLSchema#string');

     -- Give the previous xsd:string twobyte ID to xsd:stringSurrogate.
     UPDATE DB.DBA.RDF_DATATYPE
        SET RDT_TWOBYTE = str_dt_2byte
      WHERE RDT_TWOBYTE = DB.DBA.RDF_TWOBYTE_OF_DATATYPE(DB.DBA.RDF_MAKE_IID_OF_QNAME('http://www.w3.org/2001/XMLSchema#stringSurrogate'));
   };

3) run ChangeStringDatatype() via isql (only one time!)
4) restart Virtuoso
5) delete the function

The function ties the xsd:string datatype to the internal default twobyte datatype identifier (value 257) that is used for "simple literals" (that is, literals without a datatype). As a consequence, I can query all "simple literals" in Virtuoso either with xsd:string or without a datatype.

New triples with xsd:string literals are automatically inserted with that "simple literal" Virtuoso type and can therefore be queried both ways, too.

However, every triple that was explicitly typed and loaded as xsd:string before that "patch" can no longer be queried via xsd:string, only via the xsd:stringSurrogate type that was created in the function above.

So these triples need to be converted in order to query them in a more meaningful way and not break existing SPARQL queries. In fact, they need to be converted to simple literals or, better, deleted and then loaded again. In general, it's probably better to apply the hack to an empty database and then load the data from scratch; on (re)loading, every xsd:string-typed literal should be automatically converted into a simple literal.
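
The effect described above can be pictured with a toy Python model. This is only an illustration of the reasoning, not Virtuoso's storage code: the dictionary stands in for DB.DBA.RDF_DATATYPE, 257 is the value quoted above, and the identifier 300 for the original xsd:string entry is made up.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_STRING_SURROGATE = "http://www.w3.org/2001/XMLSchema#stringSurrogate"
SIMPLE_LITERAL_ID = 257   # identifier for simple literals, as stated above
OLD_STRING_ID = 300       # made-up stand-in for xsd:string's original identifier


def apply_patch(datatype_ids: dict) -> dict:
    """Mimic the two UPDATE statements on a copy of the datatype table."""
    patched = dict(datatype_ids)
    old_id = patched[XSD_STRING]
    patched[XSD_STRING] = SIMPLE_LITERAL_ID
    patched[XSD_STRING_SURROGATE] = old_id
    return patched


def match(rows, datatype_ids, datatype=None):
    """Return lexical forms whose stored identifier matches the queried datatype."""
    wanted = SIMPLE_LITERAL_ID if datatype is None else datatype_ids[datatype]
    return [lexical for lexical, type_id in rows if type_id == wanted]


before = {XSD_STRING: OLD_STRING_ID}
rows = [("typed, loaded before patch", OLD_STRING_ID),
        ("plain", SIMPLE_LITERAL_ID)]
after = apply_patch(before)
rows.append(("typed, loaded after patch", after[XSD_STRING]))

print(match(rows, after, XSD_STRING))            # plain + typed-after-patch rows
print(match(rows, after, XSD_STRING_SURROGATE))  # only the typed-before-patch row
```

In this model the row loaded before the patch keeps its old identifier, which is why it is reachable only through the surrogate type until it is converted or reloaded.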

CAUTION: I don't know whether this has unintended side effects (sort order, etc.) other than that "casting" to xsd:string now yields a (simple) literal as the datatype in, e.g., JSON/XML result sets (type field). I would refrain from doing this on production systems or without first backing up the database.

Nevertheless, I share it here with the intention that people can comment on side-effects, or maybe even improve this.

Aklakan commented 9 months ago

[...] given that Jena 5 SPARQL API seems to remove the RDF1.0 compatibility mode.

Here's the reference:

https://github.com/apache/jena/issues/2020

Remove partial, incomplete RDF 1.0 support