Closed GoogleCodeExporter closed 9 years ago
Can you expand on what you mean by "bypassing the build-in php function
http_build_query"?
As far as I'm aware, http_build_query doesn't care about the encoding of the
data you pass through it. It just percent-encodes every byte outside the
alphanumeric range, apart from a few special characters such as "-", "_" and ".".
So if the query given to the search function is UTF-8, it should be fine.
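To illustrate the point about byte-level encoding, here is a minimal sketch. It assumes the source file is saved as UTF-8 and uses made-up parameter values; it is not taken from the library's code.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below is UTF-8 bytes.
$params = array(
    'wt'    => 'json',
    'q'     => 'δοκιμή',
    'start' => 0,
    'rows'  => 10,
);

// Pass '&' explicitly so the result does not depend on arg_separator.output.
$queryString = http_build_query($params, '', '&');

echo $queryString, "\n";
// wt=json&q=%CE%B4%CE%BF%CE%BA%CE%B9%CE%BC%CE%AE&start=0&rows=10
```

Each Greek letter is two bytes in UTF-8, so each one becomes two percent-encoded triplets. The function never inspects or converts the character set.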
Are you sure the entered data reached the PHP script as UTF-8? You can check it
with functions like mb_check_encoding.
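A check along those lines might look like the sketch below. The helper name isValidUtf8 is hypothetical, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper, not part of any library: true if $value is valid UTF-8.
function isValidUtf8($value)
{
    return mb_check_encoding($value, 'UTF-8');
}

// "δοκιμή" written as explicit UTF-8 bytes, independent of the file encoding.
$utf8 = "\xCE\xB4\xCE\xBF\xCE\xBA\xCE\xB9\xCE\xBC\xCE\xAE";

// "café" as ISO-8859-1 bytes: the trailing 0xE9 is not valid UTF-8.
$latin1 = "caf\xE9";

var_dump(isValidUtf8($utf8));   // bool(true)
var_dump(isValidUtf8($latin1)); // bool(false)
```

Note that a passing check only says the bytes form valid UTF-8. Plain ASCII passes too, so it cannot prove which encoding the sender intended.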
Additionally, if you're sure the data is arriving correctly, you can try using the
POST method on search (it's a parameter). This sends along charset=utf-8 in the
Content-Type request header. You have to be sure your data is actually UTF-8, though.
Original comment by donovan....@gmail.com
on 27 May 2012 at 8:34
By 'bypassing' I mean that I used my own code to build the request query.
To answer your question: the string passed to the search method is UTF-8
encoded
(var_dump(mb_check_encoding($query, 'UTF-8')) prints true).
The query string that is sent to Tomcat is:
wt=json&json.nl=map&q=%CE%B2%CE%B1%CF%81%CE%B9%CE%AD%CE%BC%CE%B1%CE%B9&start=0&rows=10000
The word δοκιμή (test) is the URL-encoded string
%CE%B2%CE%B1%CF%81%CE%B9%CE%AD%CE%BC%CE%B1%CE%B9.
With this query, Solr replies that no documents were found.
But with the following, wt=json&json.nl=map&q=δοκιμή&start=0&rows=10000,
results are returned as they were supposed to be.
I hope this describes the issue more clearly.
Keep up the good work.
Original comment by andreas....@gmail.com
on 28 May 2012 at 10:35
First, I wanted to know whether it was intentional that the URL-encoded example decodes
to βαριέμαι and not δοκιμή (as your email says it should).
I just ran this to quickly see that:
php -r "echo urldecode('%CE%B2%CE%B1%CF%81%CE%B9%CE%AD%CE%BC%CE%B1%CE%B9');"
If it was just a copy-and-paste mistake from another test, that's fine. I just
wanted to make sure we aren't trying to compare two different things.
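The same comparison can be made in a script. This is only a sketch, and it assumes the file is saved as UTF-8 so that the Greek literal holds UTF-8 bytes.

```php
<?php
// The encoded value from the reported query string.
$encoded = '%CE%B2%CE%B1%CF%81%CE%B9%CE%AD%CE%BC%CE%B1%CE%B9';
$decoded = urldecode($encoded);

// The word the report says was being searched for.
$expected = 'δοκιμή';

// The two differ, so the encoded string belongs to another word.
var_dump($decoded === $expected); // bool(false)

// What the expected word looks like when percent-encoded as UTF-8.
$expectedEncoded = rawurlencode($expected);
echo $expectedEncoded, "\n";
// %CE%B4%CE%BF%CE%BA%CE%B9%CE%BC%CE%AE
```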
Lastly, I finally got around to fully checking this out from an encoding
perspective and verified what I expected:
* If I use UTF-8 search queries, but the servlet container for Solr is using the default URI encoding (ISO-8859-1, i.e. Latin-1), then I need to submit my query using the POST method so it's interpreted correctly. NOTE: if you use the POST method, it is expected that you are ALWAYS using UTF-8 data - there is currently no way to specify another encoding for the Content-Type header that's sent.
* If I use UTF-8 search queries, and my servlet container for Solr is using UTF-8 as the URI encoding (in Tomcat this can be set in server.xml via the URIEncoding attribute on the Connector element), then everything works fine with both the GET and POST HTTP methods.
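The first case above can be reproduced outside the container. The sketch below only simulates what an ISO-8859-1 decoder would make of the UTF-8 bytes; it does not call Tomcat or Solr, and it assumes the mbstring extension is loaded.

```php
<?php
// Percent-encoded UTF-8 bytes for the word "δοκιμή".
$encoded = '%CE%B4%CE%BF%CE%BA%CE%B9%CE%BC%CE%AE';
$bytes   = urldecode($encoded); // 12 raw bytes

// Read as UTF-8, the bytes form 6 characters.
$utf8Length = mb_strlen($bytes, 'UTF-8');

// Read as ISO-8859-1, every byte is its own character, giving 12
// unrelated characters. Re-encode them as UTF-8 so they can be inspected.
$misread       = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
$misreadLength = mb_strlen($misread, 'UTF-8');

echo $utf8Length, "\n";    // 6
echo $misreadLength, "\n"; // 12
var_dump($misread === $bytes); // bool(false)
```

The misread string no longer matches any indexed Greek term, which is consistent with Solr returning no documents for the GET request.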
Hope you found the answer to your issues in the meantime.
Original comment by donovan....@gmail.com
on 1 Jun 2012 at 6:23
Currently works as expected.
Original comment by donovan....@gmail.com
on 28 Aug 2012 at 2:25
Original issue reported on code.google.com by
andreas....@gmail.com
on 27 May 2012 at 3:25