joernhees opened this issue 9 years ago
Digging up an old issue...
I am wondering how the encoding actually works. I have a value in which a single ö is turned into %FFC3%FFB6.
The command I used is:
isql 1111 csv=on exec='SPARQL SELECT ?s ?o FROM <some-graph> WHERE { ?s rdfs:label ?o }'
Not using csv=on produces the ö just fine:
isql 1111 exec='SPARQL SELECT ?s ?o FROM <some-graph> WHERE { ?s rdfs:label ?o }'
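For what it's worth, the escapes look like the UTF-8 bytes of the character with an FF prefix: ö is C3 B6 in UTF-8. The Python sketch below is a model inferred only from the output in this thread (ASCII bytes passed through, each byte ≥ 0x80 written as %FF plus two hex digits). It is not taken from isql.c, and the function name isql_csv_escape is hypothetical.

```python
def isql_csv_escape(text: str, encoding: str = "utf-8") -> str:
    """Hypothetical model of the escaping observed in isql CSV output.

    ASCII bytes are copied unchanged; every other byte becomes "%FF"
    followed by the byte value as two upper-case hex digits.
    """
    parts = []
    for byte in text.encode(encoding):
        if byte < 0x80:
            parts.append(chr(byte))          # plain ASCII: left as-is
        else:
            parts.append("%%FF%02X" % byte)  # e.g. 0xC3 -> "%FFC3"
    return "".join(parts)


print(isql_csv_escape("ö"))      # %FFC3%FFB6
print(isql_csv_escape("jörn"))   # j%FFC3%FFB6rn
```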
The "isql" command SET CSV_RFC4180 ON; is the correct way to enable CSV results output:
SQL> SET CSV_RFC4180 ON;
SQL> SPARQL SELECT * WHERE {?s ?p ?o} LIMIT 2;
"s","p","o"
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type","http://www.w3.org/1999/02/22-rdf-syntax-ns#type","http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type","http://www.w3.org/1999/02/22-rdf-syntax-ns#type","http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
2 Rows. -- 81 msec.
SQL>
That doesn't seem to affect the encoding:
/opt/sparqling-genomics/bin/isql 1111 verbose=off csv_rfc4180=on csv_rfc4180_field_separator=, exec='SPARQL SELECT ?s ?o FROM <some-graph> WHERE { ?s rdfs:label ?o }'
This produces the same %FFC3%FFB6.
We shall look into this encoding issue in "isql" ...
Thanks a lot!
Perhaps an easy way to avoid breaking anything is to introduce a no_encode option in binsrc/tests/isql.c and use field_print_normal in field_print_csv_rfc4180 when no_encode is set to on.
If you'd like I could prepare a patch that implements this.
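To make the no_encode idea concrete, this Python sketch (not C, and not the actual code in isql.c) shows what an unescaped RFC 4180 field could look like: the value is wrapped in double quotes, embedded double quotes are doubled, and every other character, ö included, is written as-is. The helper names rfc4180_field and rfc4180_row are hypothetical.

```python
def rfc4180_field(value: str) -> str:
    """Quote one CSV field per RFC 4180, with no %-escaping (hypothetical helper)."""
    return '"' + value.replace('"', '""') + '"'


def rfc4180_row(values) -> str:
    """Join quoted fields with commas; RFC 4180 ends each record with CRLF."""
    return ",".join(rfc4180_field(v) for v in values) + "\r\n"


print(rfc4180_row(["foo", 'hallo "jörn"']), end="")
# "foo","hallo ""jörn"""
```

Since RFC 4180 already defines how quotes are escaped, non-ASCII text needs no extra escaping as long as the file's character encoding (e.g. UTF-8) is known.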
@roelj: patch contributions are always welcomed ...
Just letting you know that my initial idea does not seem to have any effect: https://github.com/roelj/virtuoso-opensource/commit/4908939dab82bde9bb4f60592d38a047f7220a2a. So I'm working on a second version of the patch.
Hi
I too am bringing up this very old issue again, because today (in 2024) I'm facing the same problem with an isql SELECT on the command line:
/usr/local/virtuoso-opensource-7.2.9/bin/isql 1111 dba dba exec='CSV_RFC4180=ON exec='SPARQL SELECT (…long query!...) ;' > result.csv
CSV_RFC4180 gives a properly formatted CSV file but causes this kind of bad encoding:
Universit%FFC3%FFA9 de Bourgogne
Development of electrochemical biosensor based on CNT%FFE2%FF80%FF93Fe3O4 nanocomposite …
This makes the output unusable. Without CSV_RFC4180, the encoding is correct, but the file is no longer in CSV format... Version used: Virtuoso 07.20.3217 from February 2017 (Open-Source edition).
Has this problem been fixed since then? If so, what is the correct way to proceed? Thank you.
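Until there is a fix, one possible stop-gap is to post-process the export. The Python sketch below assumes, based only on the samples above, that every %FFxx token is one raw byte of UTF-8 text; it rebuilds those bytes and decodes them. Because isql does not escape a literal %, a value that really contains text such as %FFC3 would be rewritten as well, so treat this as a workaround rather than a fix. The helper names decode_isql_escapes and fix_file are hypothetical.

```python
import re

# One or more consecutive %FFxx tokens (xx = two hex digits).
TOKEN_RUN = re.compile(r"(?:%FF[0-9A-Fa-f]{2})+")


def decode_isql_escapes(text: str) -> str:
    """Turn runs of %FFxx tokens back into the UTF-8 characters they encode."""
    def replace(match: re.Match) -> str:
        hex_pairs = match.group(0).split("%FF")[1:]   # e.g. ["C3", "A9"]
        raw = bytes(int(pair, 16) for pair in hex_pairs)
        return raw.decode("utf-8", errors="replace")  # invalid bytes -> U+FFFD
    return TOKEN_RUN.sub(replace, text)


def fix_file(src: str, dst: str) -> None:
    """Write a copy of an isql CSV export with the %FFxx escapes decoded."""
    with open(src, encoding="utf-8", newline="") as f_in:
        text = f_in.read()
    with open(dst, "w", encoding="utf-8", newline="") as f_out:
        f_out.write(decode_isql_escapes(text))
```

Usage would be something like fix_file("result.csv", "result-fixed.csv"), where both file names are placeholders.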
@mjeulin: I am not immediately aware of any change specifically related to your reported issue in the seven years since shipment of the version(s?) you're running. Nonetheless, I would strongly advise updating to a current build, because there have been hundreds of code changes in that time, including various bug fixes, performance boosts, and feature enhancements, from which all users benefit.
I note that you reported using Virtuoso 07.20.3217 from February 2017 (which was branded as virtuoso-opensource-7.2.4.2), but your installation directory shows the much younger (though still rather old, as software ages!) virtuoso-opensource-7.2.9, which dates from February 2023. I wonder whether you have a mix of components from these (and possibly other) distributions; such mixes of component versions are untested and could therefore lead to any number of odd experiences.
If the issue described here persists in your testing with current components, you can help us deliver a resolution by providing step-by-step instructions for our own local reproduction. It probably makes sense to make such an updated report in a fresh issue, to avoid any confusion with details from the other deployments discussed here in #421.
Thank you for your answer. There is indeed a dual installation on this server, so my requests may actually only be calling version 7.2.4.2... The issue may be resolved in the near future with a cleaner re-installation. If not, I will get back to you (in a new issue).
@TallTed, maybe close this then, either as fixed or as won't fix? (It should be easy enough for you to check/reproduce on a current version.)
Looking at this, and knowing the various encoding screw-ups that happen in the many systems with the "ö" in my name, I guess the CSV is returning URL-encoded Unicode or UTF-8 code points 🤷♂️?
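Checking that guess: real URL (percent) encoding of ö is %C3%B6, with no FF, and it would also escape ; and %. One plausible but unconfirmed explanation for the FF prefix is that each byte is treated as a signed char and printed as a 16-bit hex value, so 0xC3 (-61 as a signed byte) comes out as FFC3. The sketch uses Python's standard urllib.parse.quote; sign_extended_hex is a hypothetical helper that only illustrates this hypothesis.

```python
from urllib.parse import quote


def sign_extended_hex(byte: int) -> str:
    """Hypothesis: print a byte as a signed char widened to 16 bits."""
    signed = byte - 256 if byte >= 0x80 else byte   # e.g. 0xC3 -> -61
    return "%04X" % (signed & 0xFFFF)              # -61 -> "FFC3"


raw = "ö".encode("utf-8")                                  # b'\xc3\xb6'
print(quote("ö"))                                          # %C3%B6 (percent-encoding)
print("".join("%" + sign_extended_hex(b) for b in raw))    # %FFC3%FFB6 (as seen in isql)
print(quote("; %"))                                        # %3B%20%25
```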
@joernhees: Something like this issue persists. The % comes out as desired, but the ö becomes the very strange %FFF6.
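Note that 0xF6 is ö in ISO-8859-1 (Latin-1), whereas UTF-8 needs the two bytes C3 B6, so %FFF6 looks like the same escaping applied to a Latin-1 byte. That suggests, without confirming it, that this literal is handled as a narrow Latin-1 string. A decoder would therefore have to guess the byte encoding; the sketch below tries UTF-8 first and falls back to Latin-1 (decode_bytes_guess is a hypothetical helper). The fallback can guess wrong, because some Latin-1 byte sequences are also valid UTF-8.

```python
def decode_bytes_guess(raw: bytes) -> str:
    """Decode escaped bytes as UTF-8 if valid, otherwise as Latin-1 (an assumption)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")   # Latin-1 maps every byte to a character


print("ö".encode("utf-8"))                      # b'\xc3\xb6'  -> shown as %FFC3%FFB6
print("ö".encode("latin-1"))                    # b'\xf6'      -> shown as %FFF6
print(decode_bytes_guess(bytes([0xF6])))        # ö
print(decode_bytes_guess(bytes([0xC3, 0xB6])))  # ö
```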
I've put a fresh install of VOS 7.2.13 (latest as of today; that is, Version 07.20.3240-pthreads for Mac OS 11 (Intel x86_64) as of Jun 10 2024 (a1fd8195b)) on macOS 10.14.6 (18G9323).
Using a simplified query (so the output is easier to parse at a glance) shows your issue with the CSV output of the SPARQL query, but I think my second query shows that the issue lies deeper:
SQL> SET CSV_RFC4180 ON;
SQL> sparql SELECT ("hallo you; % jörn" AS ?foo) WHERE {?s ?p ?o} LIMIT 1;
"foo"
"hallo you; % j%FFF6rn"
1 Rows. -- 2 msec.
SQL> SET CSV_RFC4180 OFF;
SQL> sparql SELECT ("hallo you; % jörn" AS ?foo) WHERE {?s ?p ?o} LIMIT 1;
foo
LONG VARCHAR
_______________________________________________________________________________
hallo you; % j?rn
1 Rows. -- 3 msec.
SQL> quit
Is it just me, or is the isql CSV encoding weird? I guess that this is some kind of ASCII-%-URI-like encoding, but it's not very parseable, especially the " % " being equal to "%3B %25 " and the "ö" being "%FFF6".
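The parseability concern can be made concrete with the same kind of model: if non-ASCII bytes become %FFxx but a literal % is written unchanged, two different values can produce identical output, so no decoder can tell them apart. The Python sketch below is a model inferred from the output above, not isql's code (isql_like_escape and unambiguous_escape are hypothetical). A scheme that also escapes % itself, as URL encoding does with %25, removes the ambiguity.

```python
def isql_like_escape(text: str, encoding: str = "utf-8") -> str:
    """Model: ASCII (including '%') kept as-is, other bytes as %FF + two hex digits."""
    return "".join(
        chr(b) if b < 0x80 else "%%FF%02X" % b for b in text.encode(encoding)
    )


def unambiguous_escape(text: str, encoding: str = "utf-8") -> str:
    """Same idea, but '%' itself is escaped too, so the mapping is reversible."""
    return "".join(
        "%25" if b == 0x25 else chr(b) if b < 0x80 else "%%%02X" % b
        for b in text.encode(encoding)
    )


value_a, value_b = "jörn", "j%FFC3%FFB6rn"                   # two different values
print(isql_like_escape(value_a) == isql_like_escape(value_b))  # True: indistinguishable
print(unambiguous_escape(value_a))                             # j%C3%B6rn
print(unambiguous_escape(value_b))                             # j%25FFC3%25FFB6rn
```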