alexandergantikow closed this 6 years ago
The server is complaining that part of the string isn't valid UTF-8. The sparql-client gem doesn't modify these strings, so it may be that the source you're getting them from isn't valid UTF-8.
From your code, it seems you're trying to insert the string "a german umlaut: ä", but the error lists a different string: "!§$%&/()=?*áé". What's shown here certainly appears to be valid UTF-8.
The only other possibility I could see is that the server is not treating the insert body as UTF-8, which could be due to a server configuration (albeit an odd one). This could be overridden by adding `charset=utf-8` to the Content-Type header, which would require a patch.
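For illustration, this is roughly what such a request looks like when built directly with Ruby's standard `Net::HTTP`, bypassing sparql-client. It is only a sketch: the endpoint URL (a Fuseki dataset named `ds`), the example triple and the helper name `build_update_request` are assumptions. The media type `application/sparql-update` is the one the SPARQL 1.1 Protocol defines for sending an update directly as the POST body.

```ruby
require 'net/http'
require 'uri'

# Hypothetical Fuseki update endpoint (dataset name "ds" is an assumption).
ENDPOINT = URI('http://localhost:3030/ds/update')

# Build, but do not send, a SPARQL 1.1 "update via POST directly" request
# whose Content-Type explicitly declares the body as UTF-8.
def build_update_request(uri, update)
  req = Net::HTTP::Post.new(uri)
  req['Content-Type'] = 'application/sparql-update; charset=utf-8'
  req.body = update.encode('UTF-8') # make the bytes match the declared charset
  req
end

update = <<~SPARQL
  INSERT DATA {
    <http://example.org/book/1> <http://purl.org/dc/terms/title> "a german umlaut: ä" .
  }
SPARQL

request = build_update_request(ENDPOINT, update)
puts request['Content-Type'] # => application/sparql-update; charset=utf-8

# Sending it (requires a running server):
# Net::HTTP.start(ENDPOINT.host, ENDPOINT.port) { |http| puts http.request(request).code }
```

If the server accepts this request but rejects the same body sent without the charset parameter, that would point to the server-side interpretation rather than the client.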
It would be worth finding out why the reported error is inconsistent with the data you're posting; perhaps it comes from a different request.
Dear Mr. Kellogg,
Thank you for your reply.
You are right: in my Ruby code I'm using "a german umlaut: ä", while the error displays the string "!§$%&/()=?*áé". That string comes from testing which characters are accepted, and including it was a mistake in my issue post. Sorry! Nonetheless, the error type is the same for both strings.
You suggested that the source I'm getting my content from may not be valid UTF-8. I thought so too, so I experimented with "hand-typed" strings like "a german umlaut: ä". When I asked for their `.encoding`, Ruby returned UTF-8 as their internal representation.
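Note that `String#encoding` only reports the label Ruby has attached to a string; `String#valid_encoding?` checks whether the bytes actually fit that label. A minimal sketch of checking both (the helper `utf8_report` is hypothetical, and the mislabelled string simulates ISO-8859-1 input):

```ruby
# Hypothetical helper: report a string's encoding label and whether its bytes match it.
def utf8_report(str)
  {
    encoding: str.encoding.name,                            # the label Ruby attaches
    valid: str.valid_encoding?,                             # do the bytes fit that label?
    bytes: str.bytes.map { |b| format('%02X', b) }.join(' ') # raw bytes in hex
  }
end

typed = "a german umlaut: ä" # literal in a UTF-8 source file (Ruby's default since 2.0)
# ISO-8859-1 byte for "ä" (0xE4) that has merely been labelled UTF-8
mislabelled = "a german umlaut: \xE4".b.force_encoding('UTF-8')

p utf8_report(typed)       # label UTF-8, valid, bytes end in C3 A4
p utf8_report(mislabelled) # label UTF-8, NOT valid, bytes end in E4
```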
I also thought about the POST header and tried to change it through the SPARQL client. I couldn't find a way to do this for the insert_data method, so I tried passing options as a parameter when initializing the client, as described here. That didn't work, and it doesn't seem to be the right approach. By "patch", do you mean modifying the gem itself? You also mentioned the server configuration as a possible source of the error. Would that mean my Fuseki server isn't configured correctly?
I think I found the origin of my UTF-8 encoding problem.
I followed the client's processing to its `InsertData` class in update.rb. There, the `to_s` method appends the statements of a graph to a string. The string returned by `RDF::NTriples::Writer.buffer` is in a different encoding (in my case CP850), and once it is appended, the whole query text is in that encoding. So it seems that the problem comes from writer.rb in the RDF gem (?).
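This follows from Ruby's encoding-compatibility rule for concatenation: joining an ASCII-only string with one that contains non-ASCII characters gives a result in the encoding of the non-ASCII one. A minimal reproduction without RDF.rb, assuming the writer output is CP850 as observed above (the fragment is encoded to CP850 by hand; where the writer actually picks up CP850 isn't established in this thread):

```ruby
# ASCII-only prefix, like the start of InsertData#to_s
query_text = "INSERT DATA {\n"
puts query_text.encoding # => UTF-8

# Stand-in for the writer's output: an N-Triples line labelled CP850
fragment = "<http://example.org/s> <http://purl.org/dc/terms/title> \"a german umlaut: ä\" .\n".encode('CP850')

query_text += fragment   # ASCII-only UTF-8 + non-ASCII CP850 => CP850
query_text += "}\n"
puts query_text.encoding # => CP850

# Transcoding (not just relabelling) gives back valid UTF-8
fixed = query_text.encode('UTF-8')
puts fixed.encoding      # => UTF-8
```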
As a quick-and-dirty workaround, I added `query_text.encode!('UTF-8')` to the client's update.rb:
```ruby
class InsertData < Operation
  # ...

  def to_s
    query_text = 'INSERT DATA {'
    query_text += ' GRAPH ' + SPARQL::Client.serialize_uri(self.options[:graph]) + ' {' if self.options[:graph]
    query_text += "\n"
    # puts query_text.encoding # ==> UTF-8
    query_text += RDF::NTriples::Writer.buffer { |writer| @data.each { |d| writer << d } }
    # puts query_text.encoding # ==> CP850
    query_text += '}' if self.options[:graph]
    query_text += "}\n"
    # Temporary fix of encoding problem: transcode the whole query back to UTF-8
    # (encode! returns the string itself, so it is still the method's return value)
    query_text.encode!('UTF-8')
  end
end
```
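`encode!` transcodes *from* the encoding the string is labelled with, so it only works when that label is correct and every character exists in the target encoding. The thread doesn't pin down what caused the occasional errors mentioned later; this sketch just shows two general ways transcoding between UTF-8 and CP850 can go wrong (the sample strings are made up):

```ruby
# 1. Wrong label: the UTF-8 bytes of "ä" (C3 A4) mistakenly labelled CP850.
#    Transcoding "succeeds" but silently produces mojibake instead of raising.
mislabelled = "ä".b.force_encoding('CP850')
mojibake = mislabelled.encode('UTF-8')
puts mojibake # two unrelated characters instead of "ä"

# 2. Missing character: CP850 has no euro sign, so converting into it raises.
euro_error =
  begin
    "Price: 5 €".encode('CP850')
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end
puts euro_error.class if euro_error # Encoding::UndefinedConversionError
```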
The writer is probably taking its encoding from the input data, since it isn't set explicitly. The logic for setting the writer's encoding goes back a ways, but pretty much everyone expects the default encoding to be UTF-8. This could be specified by passing `encoding: Encoding::UTF_8` to `Writer.buffer`, but I think the time has come to simply make this the default in `RDF::Writer.buffer` and elsewhere.
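To illustrate the difference between inferring the output encoding and defaulting it, here is a hypothetical helper in the spirit of `Writer.buffer`. It is not RDF.rb's implementation; the name `buffer_with_encoding` and its array-based interface are made up. Every fragment is transcoded to the requested encoding, which defaults to UTF-8, so the result no longer depends on whatever encoding the input happened to carry.

```ruby
# Hypothetical helper, not RDF.rb code: collect fragments and return them as
# one string in an explicit encoding that defaults to UTF-8.
def buffer_with_encoding(encoding: Encoding::UTF_8)
  fragments = []
  yield fragments                                   # caller appends fragments with <<
  out = String.new(encoding: encoding)              # empty string, already labelled
  fragments.each { |f| out << f.encode(encoding) }  # transcode, don't just relabel
  out
end

cp850_literal = "\"a german umlaut: ä\" .\n".encode('CP850')
text = buffer_with_encoding do |out|
  out << '<http://example.org/s> <http://purl.org/dc/terms/title> '
  out << cp850_literal
end
puts text.encoding # => UTF-8
```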
@alexandergantikow The RDF.rb repo was updated to change the way in which the default encoding was found. It could be that your environment affected the way that it was set, but I couldn't reproduce it. Now, it uses `Writer#encoding` as the default, which was otherwise `nil`. Please give it a try to see if it solves this issue, and I'll release an update to the RDF.rb gem.
@gkellogg I went to the repo and updated my installed gem with the files you committed in August. Is that how you intended it to be tested? Unfortunately, it didn't solve my issue.
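One way to try an unreleased fix without copying files into the installed gem is to point Bundler at the repository in the `Gemfile`; this also produces a `Gemfile.lock` that pins the exact revision. A sketch, assuming the change lives on RDF.rb's `develop` branch (check the repository for the actual branch or commit):

```ruby
# Gemfile (sketch): use RDF.rb straight from GitHub instead of the released gem.
source 'https://rubygems.org'

gem 'rdf', git: 'https://github.com/ruby-rdf/rdf.git', branch: 'develop' # branch name is an assumption
gem 'sparql-client'
```

After `bundle install`, running the script with `bundle exec ruby your_script.rb` (file name hypothetical) makes sure the Git version of the gem is the one that gets loaded.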
I also followed your tip from 7 August: in the writer documentation I found the example "Serializing RDF statements into an NTriples string with escaped UTF-8". Since I started using that example, my issue is gone.
This is only a temporary solution as well, but it works better than the "# Temporary fix of encoding problem" I proposed above: `query_text.encode!('UTF-8')` sometimes produced unpredictable errors, too.
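Escaping presumably helps because the request body then contains only ASCII bytes, which read the same in any ASCII-compatible encoding (UTF-8, CP850, ISO-8859-1). N-Triples allows any character in a literal to be written as a `\uXXXX` or `\UXXXXXXXX` escape, and SPARQL 1.1 accepts the same codepoint escapes. A minimal pure-Ruby sketch of that escaping for a literal's lexical form (the helper name is hypothetical; RDF.rb's writer does its own escaping):

```ruby
# Hypothetical helper: escape a literal's lexical form for N-Triples so the
# result is pure ASCII (quotes/backslashes/line breaks as ECHAR, rest as UCHAR).
def escape_ntriples_literal(str)
  str.encode('UTF-8').each_char.map do |ch|
    case ch
    when '\\' then '\\\\'
    when '"'  then '\\"'
    when "\n" then '\\n'
    when "\r" then '\\r'
    else
      cp = ch.ord
      if cp < 0x80 then ch                          # plain ASCII passes through
      elsif cp <= 0xFFFF then format('\\u%04X', cp) # Basic Multilingual Plane
      else format('\\U%08X', cp)                    # supplementary planes
      end
    end
  end.join
end

title = escape_ntriples_literal("a german umlaut: ä")
puts title             # prints: a german umlaut: \u00E4
puts title.ascii_only? # prints: true
triple = %(<http://example.org/book/1> <http://purl.org/dc/terms/title> "#{title}" .)
puts triple
```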
If you’d like me to look into it further, please give me a script and Gemfile.lock which reproduces the problem.
Dear developers,
I'm trying to upload some triples with a SPARQL insert. For example, I tried to insert a triple describing a 'title'. If it contains special characters or a German umlaut, the following error is returned by the SPARQL client:
Furthermore, my Fuseki (3.8.0) server returns this error:
... ...
Here is the Ruby code I'm using:
As soon as I use a title without special characters, everything works fine with the SPARQL client. Furthermore, if I use the Fuseki web GUI, the umlaut title is accepted. So it seems that the character encoding is causing some trouble. Because I'm not an expert when it comes to programming, I can't say whether this error comes from the sparql-client, Fuseki, or the Jetty server. My Google research didn't get me any further, either. So feel free to comment if this error does not come from the client.
I'm using the following software:
Thank you,
Alexander