jakubklimek closed 4 months ago
@jakubklimek -- Note that you can drop the `?test a dcat:Dataset .` pattern and run the query against both of your listed endpoints, if you either un-tick the box for "Strict checking of void variables" on the SPARQL query form (as noted in the comments on #231) or insert `define sql:signal-void-variables 0` before `CONSTRUCT` in your query. (The `define` option also works through saved URLs, as on data.gov.cz (query, results) or dev.nkod.opendata.cz (query, results).)
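For anyone scripting this rather than using the query form, here is a minimal Python sketch of a GET URL that carries the pragma. The helper name `build_query_url` and the sample query are illustrative only; `query` is the standard SPARQL Protocol parameter name, and the pragma is the Virtuoso-specific one quoted above. The sketch only builds the URL and does not contact the endpoint.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Virtuoso-specific pragma from the comment above; it goes before the query form.
PRAGMA = "define sql:signal-void-variables 0"


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL that sends `query`, prefixed with the pragma (hypothetical helper)."""
    full_query = f"{PRAGMA}\n{query}"
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes.
    return f"{endpoint}?{urlencode({'query': full_query})}"


# Sample query: the BIND from the report, without the extra triple pattern.
QUERY = 'SELECT ?c WHERE { BIND(CONCAT("https://c/é/", ENCODE_FOR_URI("Á")) AS ?c) }'

url = build_query_url("https://data.gov.cz/sparql", QUERY)
print(url)

# Round trip: what the server would see after decoding the parameter.
sent = parse_qs(urlsplit(url).query)["query"][0]
print(sent)
```

Building the parameter with `urlencode` keeps the `é` and `Á` in the query text as well-formed UTF-8 on the wire, so any damage seen in the results is not introduced by the client.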
Is there a reason you're using a `CONSTRUCT` query to test, instead of a `SELECT`? (At a quick glance, the encoding issue appears to happen in both; I just want to be sure I'm not missing something.)
@smalinin @pkleef @iv-an-ru -- Please take a look at this.
@TallTed thanks, I knew there was a workaround for this somewhere.
I used `CONSTRUCT` just because that is how I discovered the bug and went on minimizing the example, no other reason.
I ran into this issue even without `ENCODE_FOR_URI`. It therefore seems to be confined to `CONCAT`. Whenever a Unicode character is used in `CONCAT`, the result is badly encoded:
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?changed WHERE {
?dataset a dcat:Dataset .
BIND(CONCAT("ě", ?dataset) AS ?changed)
}
LIMIT 1
— produces ěhttps://data.gov.cz/zdroj/datové-sady/https---isdv.upv.cz-opendata-upv-package_show-id-vz20210307diff
while —
PREFIX dcat: <http://www.w3.org/ns/dcat#>
SELECT ?changed WHERE {
?dataset a dcat:Dataset .
BIND(CONCAT("e", ?dataset) AS ?changed)
}
LIMIT 1
— produces ehttps://data.gov.cz/zdroj/datové-sady/https---isdv.upv.cz-opendata-upv-package_show-id-vz20210307diff
(note the first character and then datové-sady vs datové-sady)
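When comparing results like these, it can help to flag damaged strings programmatically. The following Python sketch assumes the damage is the common "UTF-8 bytes read as Latin-1" kind; that is an assumption about the failure mode, not something confirmed for Virtuoso here. The helper names and the damaged sample string are made up for illustration.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes read as Latin-1', if that is what happened."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or not damaged at all): leave unchanged.
        return text


def is_mojibake(text: str) -> bool:
    """True if the string changes when the assumed damage is reversed."""
    return repair_mojibake(text) != text


# Hypothetical damaged value: "é" (UTF-8 bytes C3 A9) shown as U+00C3 U+00A9.
damaged = "datov\u00c3\u00a9-sady"
print(is_mojibake(damaged), repair_mojibake(damaged))
print(is_mojibake("datové-sady"))
```

Correctly encoded strings pass through unchanged, because their Latin-1 bytes are not valid UTF-8 or because they contain characters outside Latin-1, such as `ě`.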
Still happening in a7b01eced76532f1fa36fdf665f9f836531bdae0
@smalinin @pkleef @iv-an-ru @hughwilliams @openlink -- Any estimate of when this will be investigated, if not resolved? It seems likely to be causing trouble if not blocking a good number of deployments where Unicode is in broader use.
@pkleef any chance of looking into this when you are dealing with Unicode-related issues? :)
Still happening in 99e4f122c5. @HughWilliams any chance of fixing this? It is a really annoying issue.
We fixed this problem in commits 06ac26454d060339de4fab69a8ef3e27a4abc946 and c7f420a8dc6b1a437a1ef9a37f44ca8192f9786c. Please check out the latest develop/7 branch.
@pkleef Thanks, seems to work fine now.
There is an issue with the handling of Unicode characters when the SPARQL `CONCAT` and `ENCODE_FOR_URI` functions are combined.
When used like this: `BIND(CONCAT("https://c/é/", ENCODE_FOR_URI("Á")) as ?c)`, the resulting literal is `https://c/\u00E9/\u00C3\u0081%00` which, when decoded, is `https://c/é/Ã`, which is wrong.
When used like this (note that in the first string in `CONCAT`, I replace `é` with `e`): `BIND(CONCAT("https://d/e/", ENCODE_FOR_URI("Á")) as ?d)`, the resulting literal is `https://d/e/\u00C1%00`, which, when decoded, is `https://d/e/Á`, which is correct. Not sure whether the problem is in the `CONCAT` or the `ENCODE_FOR_URI` function.
This query can be run on https://data.gov.cz/sparql or https://dev.nkod.opendata.cz/sparql:
Note that `?test a dcat:Dataset .` is not necessary and can be replaced by anything which matches something in the graph. It could be omitted, but that triggers this 6.5-year-old issue: https://github.com/openlink/virtuoso-opensource/issues/231 when run directly on the Virtuoso SPARQL Endpoint. When run in Yasgui (https://api.triplydb.com/s/8E0WDV550), it works even without this.
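The `\u00C3\u0081` sequence in the first result above is exactly what appears when the UTF-8 bytes of `Á` are read back as Latin-1 characters. The Python sketch below reproduces that symptom outside Virtuoso. It illustrates the symptom only and is not a statement about where in the server the conversion happens. For reference, it also prints the percent-encoding of the UTF-8 bytes of `Á`, which is how XPath `fn:encode-for-uri` (the function `ENCODE_FOR_URI` corresponds to) is specified.

```python
from urllib.parse import quote

utf8_bytes = "Á".encode("utf-8")        # two bytes: C3 81
misread = utf8_bytes.decode("latin-1")  # each byte becomes one character

# Code points of the misread string, for comparison with the reported literal.
print([f"U+{ord(ch):04X}" for ch in misread])

# Percent-encoding of the UTF-8 bytes, with no characters treated as safe.
expected = quote("Á", safe="")
print(expected)
```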