Open saleem-muhammad opened 6 years ago
The following query shows that a literal string with no type does default to xsd:string:
SQL> SPARQL SELECT ?literal ( datatype(?literal) AS ?type )
WHERE { VALUES ?literal { "simple" "typed"^^xsd:string } };
literal type
LONG VARCHAR LONG VARCHAR
____________ _______________________________________
simple http://www.w3.org/2001/XMLSchema#string
typed http://www.w3.org/2001/XMLSchema#string
2 Rows. -- 2 msec.
SQL>
Thus please provide steps to reproduce the problem you are experiencing ...
OK, consider querying a dataset that contains the following single triple --
<http://www.owl-ontologies.com/Ontology1324312315.owl#Semester0> <http://www.owl-ontologies.com/Ontology1324312315.owl#hasName> "Semester0"^^<http://www.w3.org/2001/XMLSchema#string> .
The query --
select * where { ?s <http://www.owl-ontologies.com/Ontology1324312315.owl#hasName> "Semester0"^^xsd:string}
-- gives 1 result. While the query --
select * where { ?s <http://www.owl-ontologies.com/Ontology1324312315.owl#hasName> "Semester0"}
-- gives zero results.
I can see what you are reporting with that test case (please always use isql to show exactly what you are doing):
SQL> SPARQL INSERT INTO GRAPH <http://example.org>
{ <http://www.owl-ontologies.com/Ontology1324312315.owl#Semester0>
<http://www.owl-ontologies.com/Ontology1324312315.owl#hasName>
"Semester0"^^xsd:string };
Done. -- 84 msec.
SQL> sparql select *
where { ?s <http://www.owl-ontologies.com/Ontology1324312315.owl#hasName>
"Semester0"^^xsd:string};
s
LONG VARCHAR
______________________________________________________________
http://www.owl-ontologies.com/Ontology1324312315.owl#Semester0
1 Rows. -- 84 msec.
SQL> sparql select *
where { ?s <http://www.owl-ontologies.com/Ontology1324312315.owl#hasName>
"Semester0"};
s
LONG VARCHAR
____________
0 Rows. -- 6 msec.
SQL> sparql select ?o datatype(?o)
from <http://example.org>
where {?s ?p ?o};
o callret-1
LONG VARCHAR LONG VARCHAR
____________ ___________________________________________________________________
Semester0 http://www.w3.org/2001/XMLSchema#string
1 Rows. -- 1 msec.
SQL>
But then if you insert a triple with a literal string with no datatype specified, its datatype is actually xsd:string by default; it is just not stored with xsd:string physically in the database:
SQL> SPARQL INSERT INTO GRAPH <http://example1.org>
{ <http://www.owl-ontologies.com/Ontology1324312315.owl#Semester0>
<http://www.owl-ontologies.com/Ontology1324312315.owl#hasName>
"Semester0" };
Done. -- 9 msec.
SQL> sparql select ?o ( datatype(?o) as ?datatype )
from <http://example1.org>
where {?s ?p ?o};
o datatype
LONG VARCHAR LONG VARCHAR
____________ _______________________________________
Semester0 http://www.w3.org/2001/XMLSchema#string
1 Rows. -- 4 msec.
SQL>
This being how Virtuoso works ...
Now I see the problem. Explicitly specifying xsd:string with literals in RDF datasets causes problems in Virtuoso, although RDF allows it. There are many RDF datasets that explicitly mention the string type with literals. Maybe it can be fixed in the next release?
+1. This is a deviation from the RDF standard, and can be a really nasty one for application developers, especially when you don't have full control over your data.
+1. It would be great if Virtuoso were agnostic to xsd:string. RDF 1.1 defines plain literals as syntactic sugar for typed literals with type xsd:string. Options that can be taken are:
- [...] xsd:string for plain literals (internally in the DB all are explicitly typed).
- [...] xsd:string to plain literals in the query
I have logged an internal ticket for this, such that development can look into it ...
thanks Hugh, looking forward to it.
So far, Virtuoso is RDF 1.0, and "6.5.1 Literal Equality" of RDF 1.0 states that --
Two literals are equal if and only if all of the following hold:
- The strings of the two lexical forms compare equal, character by character.
- Either both or neither have language tags.
- The language tags, if any, compare equal.
- Either both or neither have datatype URIs.
- The two datatype URIs, if any, compare equal, character by character.
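As an illustration only, the five conditions above can be sketched in Python. The `Literal` type and the `rdf10_equal` function are hypothetical names made up for this sketch, not part of Virtuoso or of any RDF library, and language tags are compared as-is.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Hypothetical minimal model of an RDF literal (illustration only)."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def rdf10_equal(a: Literal, b: Literal) -> bool:
    """Apply the RDF 1.0 literal equality conditions quoted above."""
    return (
        a.lexical == b.lexical                             # same lexical form
        and (a.lang is None) == (b.lang is None)           # both or neither tagged
        and a.lang == b.lang                               # tags, if any, equal
        and (a.datatype is None) == (b.datatype is None)   # both or neither typed
        and a.datatype == b.datatype                       # datatype URIs equal
    )


plain = Literal("Semester0")
typed = Literal("Semester0", datatype=XSD_STRING)

print(rdf10_equal(plain, typed))  # the fourth condition fails
print(rdf10_equal(typed, typed))
```

Under these rules the plain and the explicitly typed "Semester0" are different terms, which is consistent with the zero-row result shown earlier in this thread.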
RDF 1.0 also permits an implicit cast of XML text-only fragments to xsd:strings:
RDF applications may use additional equivalence relations, such as that which relates an xsd:string with an rdf:XMLLiteral corresponding to a single text node of the same string.
-- but Virtuoso does not support that because it supports generic entities as XML resources (i.e., XMLs with more than one top-level element that are not valid if used as standalone resources but may be valid if included into other resources via DTD).
Your request for migration to RDF 1.1 is the first of its kind, and it is still the only one after 3 months. Technically, it's not a big deal to add a configuration parameter so that "abc"^^xsd:string will be treated as "abc" or, alternatively, so that "abc" is always "abc"^^xsd:string; the first one looks more practical. In addition, it's possible to extend the built-in DATATYPE(?x) function with a second argument that is the value to return if a plain literal is passed as the value of ?x.
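To make the two ideas concrete, here is a rough Python sketch of what such a normalization and a two-argument DATATYPE could look like. Everything in it (`Literal`, `strip_xsd_string`, `datatype_or`) is a hypothetical name for illustration; it does not describe an existing Virtuoso parameter or function.

```python
from dataclasses import dataclass, replace
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical minimal model of an RDF literal (illustration only)."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def strip_xsd_string(lit: Literal) -> Literal:
    """First option: treat "abc"^^xsd:string as the plain literal "abc"."""
    if lit.datatype == XSD_STRING:
        return replace(lit, datatype=None)
    return lit


def datatype_or(lit: Literal, default: Optional[str] = None) -> Optional[str]:
    """Hypothetical two-argument DATATYPE: `default` is returned for plain literals."""
    if lit.datatype is None and lit.lang is None:
        return default
    # Typed literals report their datatype; language-tagged ones yield None here.
    return lit.datatype


typed = Literal("abc", datatype=XSD_STRING)
plain = Literal("abc")

print(strip_xsd_string(typed) == plain)
print(datatype_or(plain, XSD_STRING))
```

If the same normalization were applied both when loading data and when parsing query constants, both spellings would end up as one term, which is the behaviour requested in this issue.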
Hi Ivan,
Thanks for taking care of this request. More and more tools and libraries in the RDF ecosystem apply RDF 1.1. We now face difficulties when combining RDF 1.0 and RDF 1.1 tools. It would be good to get more alignment on this topic.
Is there anything in your suggestion that we can do ourselves?
+1
+1
Would love to see a solution to this.
Hello
In my work, I need to use SERVICE <http://fr.dbpedia.org/sparql> to find data related to strings in my dataset.
My SPARQL query is processed with Fuseki. For now, I can't find a way to compare my strings to the results in DBpedia.
My target is, for example:
<http://fr.dbpedia.org/resource/La_Rochelle> <http://dbpedia.org/ontology/postalCode> "17000"^^<http://www.w3.org/2001/XMLSchema#string>
I suspect that Fuseki produces a value without an explicit type and gives me no means to add an xsd:string type, so Virtuoso/DBpedia is unable to find a match.
Here is a sample query:
SELECT distinct ?scode ?sd
where{
bind("17300"^^<http://www.w3.org/2001/XMLSchema#string> as ?scode)
SERVICE <http://fr.dbpedia.org/sparql> {
select ?sd ?scode where {
?sd <http://dbpedia.org/ontology/inseeCode> ?scode .
}
}
}
If I try a similar query with values of integer type, the result is as expected.
SELECT distinct ?scode ?sd
where{
bind(17300 as ?scode)
SERVICE <http://fr.dbpedia.org/sparql> {
select ?sd ?scode where {
?sd <http://fr.dbpedia.org/property/insee> ?scode .
}
}
}
Note that the way your queries are constructed, the SERVICE clause is the "innermost" subquery, and that gets evaluated before the "outer" subqueries. This is because SPARQL is evaluated "from the inside out" (sometimes confusingly called "from the bottom up", leading people to think evaluation starts at the lexical bottom of the query).
Also note that putting the DISTINCT on the "outer" query means you may pull a lot more data over the wire than necessary. Therefore, I've moved the DISTINCT to the inner query.
See what happens if you run this --
SELECT ?scode ?sd
WHERE
{
SERVICE <http://fr.dbpedia.org/sparql>
{
SELECT DISTINCT ?sd ?scode
WHERE
{
?sd <http://fr.dbpedia.org/property/insee> ?scode .
BIND ( 17300 AS ?scode )
}
}
}
-- or this --
SELECT ?scode ?sd
WHERE
{
SERVICE <http://fr.dbpedia.org/sparql>
{
SELECT DISTINCT ?sd ?scode
WHERE
{
?sd <http://dbpedia.org/ontology/inseeCode> ?scode .
BIND ( "17300"^^<http://www.w3.org/2001/XMLSchema#string> AS ?scode )
}
}
}
I think that this will not be quite sufficient to reach your actual goal, as I think you've come to us with an "XY Problem". If I'm right, perhaps you can provide us with something of the bigger picture?
The bind is outside the service because it is here just as sample code to set ?scode from the "local" dataset. The real pattern/query is more complex: it sets ?scode by querying the local dataset, then goes to the DBpedia service to get some complementary data.
I believe I understand what you're trying to do.
The "local" subquery that gives the partial results that are then meant to be used against DBpedia must be executed before the remote SERVICE subquery.
This means that the "local" subquery must be "lower" or more "inner" than the remote SERVICE subquery.
If you provide your actual query, we can provide a suggested SPARQL rewrite.
Alternatively, you could use whatever tooling you're using outside the SPARQL to execute two queries -- one to get the "local" values, which are then used in building the second, which gets the "remote" data.
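A rough sketch of that two-step approach in Python, assuming the "local" values have already been fetched into a list. The helper names (`escape_literal`, `build_remote_query`) are made up for this example, and whether the remote literals are stored with an explicit xsd:string type has to be checked against the actual data.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and line breaks for a SPARQL string."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_remote_query(codes, predicate="http://dbpedia.org/ontology/inseeCode"):
    """Build the second (remote) query from values returned by the first one."""
    # Each value is written with an explicit ^^xsd:string so that it can match
    # literals that the remote store keeps as explicitly typed strings.
    values = " ".join(f'"{escape_literal(c)}"^^<{XSD_STRING}>' for c in codes)
    return (
        "SELECT DISTINCT ?sd ?scode\n"
        "WHERE {\n"
        f"  VALUES ?scode {{ {values} }}\n"
        f"  ?sd <{predicate}> ?scode .\n"
        "}"
    )


local_codes = ["17300"]  # stand-in for the result of the "local" query
print(build_remote_query(local_codes))
```

The generated text would then be sent to the remote endpoint by whatever client is already in use; because the query is assembled as text, the explicit datatype survives even if the local engine would otherwise drop it.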
I developed a potential workaround to set up Virtuoso such that it is able to match simple literal constants with or without ^^xsd:string in a triple pattern of a SPARQL query, thus making Virtuoso a little more compatible with RDF 1.1. This topic is getting interesting again given that the Jena 5 SPARQL API seems to remove the RDF 1.0 compatibility mode.
I had a look at execution plans and SQL functions and also played a little bit around as a simple proof of concept.
I found out that DB.DBA.RDF_TWOBYTE_OF_DATATYPE is responsible for determining the internal datatype in the rdf_box for a datatype IRI.
So I did something hacky in isql:
1) restart Virtuoso
2) create function via isql
CREATE function ChangeStringDatatype()
{
DECLARE str_dt_2byte INT;
str_dt_2byte := DB.DBA.RDF_TWOBYTE_OF_DATATYPE(DB.DBA.RDF_MAKE_IID_OF_QNAME('http://www.w3.org/2001/XMLSchema#string'));
update
DB.DBA.RDF_DATATYPE
SET RDT_TWOBYTE=257
Where RDT_IID=iri_to_id('http://www.w3.org/2001/XMLSchema#string');
update
DB.DBA.RDF_DATATYPE
SET RDT_TWOBYTE=str_dt_2byte
Where RDT_TWOBYTE=DB.DBA.RDF_TWOBYTE_OF_DATATYPE(DB.DBA.RDF_MAKE_IID_OF_QNAME('http://www.w3.org/2001/XMLSchema#stringSurrogate'));
};
3) run ChangeStringDatatype() via isql (only one time!)
4) restart Virtuoso
5) delete the function
The function ties the xsd:string datatype to the internal default twobyte datatype identifier (value 257) that is used for "simple literals" (i.e., literals without a datatype); as a consequence, I can query all "simple literals" in Virtuoso either with xsd:string or without a datatype.
New triples of type xsd:string will automatically be inserted as that "simple literal" Virtuoso type and can, as a result, be queried in both ways, too.
As a consequence, however, every triple that was explicitly typed and loaded as xsd:string before that "patch" can no longer be queried via xsd:string, but only with the xsd:stringSurrogate type that was created in the function above.
So these triples need to be converted in order to query them in a more meaningful way and not break existing SPARQL queries. In fact, these triples need to be converted to simple literals, or better, deleted and then loaded again. In general, it's probably better to apply the hack to an empty database and then load the data from scratch; on (re)loading, every xsd:string-typed literal should be automatically converted into a simple literal.
CAUTION: I don't know whether this has unintentional side effects (sort order, etc.) beyond the fact that "casting" to xsd:string now yields a (simple) literal as the datatype in, e.g., JSON/XML result sets (type field). I would refrain from doing this on production systems or without first backing up the database.
Nevertheless, I share it here with the intention that people can comment on side-effects, or maybe even improve this.
[...] given that Jena 5 SPARQL API seems to remove the RDF1.0 compatibility mode.
Here's the reference:
https://github.com/apache/jena/issues/2020 ("Remove partial, incomplete RDF 1.0 support")
I am not sure whether this problem has been reported before. I am running Virtuoso 07.20.3217 for Linux as of Dec 15 2017, and encountered the following problem.
I am able to correctly get the results for the given query.
SELECT ?T where {?T bb:hasName "Department6Study_Track0"^^xsd:string}
However, removing the xsd:string from the query gives me empty results. In RDF 1.1, strings without xsd:string and strings with xsd:string are the same RDF term. It does not matter whether you write the ^^xsd:string or not. Is there any fix for the problem?