Brief exploration:
1. The attached file is indeed utf-8 encoded and correctly marked as such in
the header
2. On the command line, parsing and re-serializing it with "any23 -f rdfxml"
produces a correctly utf-8 encoded file, no encoding problems
3. I uploaded a copy of the file here:
http://richard.cyganiak.de/2011/test/Soldering_iron_test.rdf
4. Parsing and re-serializing this uploaded file with any23.org produces a
correctly utf-8 encoded response, no encoding problems:
http://any23.org/any23/?format=rdfxml&uri=http%3A%2F%2Frichard.cyganiak.de%2F2011%2Ftest%2FSoldering_iron_test.rdf
5. Copy-pasting the file's contents into the textarea on any23.org produces a
broken response in which the text is double utf-8 encoded, as the reporter described
So the problem seems to be related to the processing of a submitted textarea.
Hypothesis, without having looked at the any23 servlet's code: the textarea's
content is correctly submitted and sent over the wire as utf-8, but the servlet
decodes it with the wrong character encoding before passing it to the any23 parser.
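The hypothesis can be reproduced without the servlet. Below is a minimal Java sketch, assuming the servlet decodes the submitted utf-8 bytes as ISO-8859-1 and the resulting string is later written back out as utf-8. The class and method names are hypothetical and not taken from any23.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    // Simulates one request/response cycle: the browser submits utf-8 bytes,
    // the server decodes them with 'assumedByServer', and the response is
    // serialized as utf-8 again.
    static byte[] roundTrip(String original, Charset assumedByServer) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);   // browser sends utf-8
        String decoded = new String(wire, assumedByServer);        // server-side decoding step
        return decoded.getBytes(StandardCharsets.UTF_8);           // response written as utf-8
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00E9"; // e with acute accent, a single non-ASCII character
        System.out.println("correct: " + hex(roundTrip(text, StandardCharsets.UTF_8)));
        // prints "correct: C3 A9"
        System.out.println("broken:  " + hex(roundTrip(text, StandardCharsets.ISO_8859_1)));
        // prints "broken:  C3 83 C2 A9", i.e. the utf-8 bytes encoded a second time
    }
}
```

Pure ASCII input survives both paths unchanged, which would explain why only non-ASCII characters in the pasted file come back garbled.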
This seems relevant:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
It states that by default, POST bodies are assumed to be ISO-8859-1. The default
can be overridden by a charset parameter in the request's Content-Type header,
but most browsers don't send one when submitting forms, so we can't rely on
that. The solution proposed there is to put a filter in front of the servlet
that sets the correct encoding on the request. Apparently, ready-made code for
doing that could be lifted from Tomcat.
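The filter approach comes down to telling the container which charset to use before any form parameter is read. In the Servlet API that is a call to `request.setCharacterEncoding("UTF-8")` from the filter's `doFilter` method, before `getParameter` is first called. Running that needs a servlet container, so the plain-Java sketch below only illustrates the effect of that choice: it decodes the same urlencoded form body once as ISO-8859-1 (the default described in the FAQ) and once as UTF-8. The parser and the field names `format` and `text` are hypothetical, not any23's actual code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodyDecoding {
    // Parses an application/x-www-form-urlencoded body, turning the
    // percent-escaped bytes back into characters with the given charset.
    static Map<String, String> parseForm(String body, String charset) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(decode(name, charset), decode(value, charset));
        }
        return params;
    }

    private static String decode(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // What a browser might POST for a textarea containing U+00E9 on a utf-8 page
        // (hypothetical field names).
        String body = "format=rdfxml&text=%C3%A9";
        String asLatin1 = parseForm(body, "ISO-8859-1").get("text");
        String asUtf8 = parseForm(body, "UTF-8").get("text");
        System.out.println(asLatin1.length()); // prints 2: U+00C3 U+00A9, mojibake
        System.out.println(asUtf8.length());   // prints 1: U+00E9, as typed
    }
}
```

The bytes on the wire are identical in both cases; only the charset used to turn them back into characters differs, which is why the fix belongs on the server side.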
Original comment by richard....@gmail.com
on 11 Mar 2011 at 7:29
Original issue reported on code.google.com by
danielcz...@gmail.com
on 11 Mar 2011 at 5:21