leschweitzer / solrpy

Automatically exported from code.google.com/p/solrpy

replace xml response processing with json #22

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I think it would be nice to replace the somewhat complicated SAX-driven 
ResponseContentHandler with a parser for the JSON response format from Solr. 
There would be fewer moving parts at least, and it could even be a bit faster.

Original issue reported on code.google.com by ed.summers on 28 Apr 2010 at 9:18

GoogleCodeExporter commented 9 years ago

Original comment by ed.summers on 28 Apr 2010 at 9:18

GoogleCodeExporter commented 9 years ago
The JSON response from Solr does not contain data type information. The only 
place I am aware of where this information is used is date fields: in the XML 
response, Solr returns a date as <date></date> instead of <string></string>, 
whereas in JSON a date is indistinguishable from any other string.

Original comment by benliles on 28 Apr 2010 at 1:58

GoogleCodeExporter commented 9 years ago
For large query results, using either wt=json or wt=python sped up parsing 
significantly (5-10x). As benliles suggests, however, it is non-trivial to 
convert the result to native types unless you know your schema ahead of time. 
To deal with this, we moved to encoding field types in the field name (e.g. 
d_indexed, i_rating, f_score) and use json/simplejson as the parser.

Original comment by br...@echonest.com on 30 Apr 2010 at 1:50
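
The prefix convention described above could be applied along the following lines. This is only a minimal sketch, not the commenter's code: the `CONVERTERS` table and `coerce_doc` helper are hypothetical names, the sample document is hand-written, and dates are assumed to arrive in the `YYYY-MM-DDTHH:MM:SSZ` form without fractional seconds.

```python
import json
from datetime import datetime, timezone


def parse_solr_date(text):
    # Assumption: second-precision UTC timestamps such as 2010-04-30T13:50:00Z.
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Hypothetical prefix-to-converter table (d_ = date, i_ = integer, f_ = float).
CONVERTERS = {"d_": parse_solr_date, "i_": int, "f_": float}


def coerce_doc(doc):
    """Return a copy of doc with values converted according to the field-name prefix."""
    out = {}
    for name, value in doc.items():
        convert = CONVERTERS.get(name[:2])
        if convert is None:
            out[name] = value  # unknown prefix: leave the JSON value untouched
        elif isinstance(value, list):
            out[name] = [convert(v) for v in value]  # multi-valued field
        else:
            out[name] = convert(value)
    return out


# Hand-written sample in the shape of a wt=json select response.
raw = (
    '{"response": {"numFound": 1, "docs": ['
    '{"id": "a1", "d_indexed": "2010-04-30T13:50:00Z", "i_rating": 4, "f_score": 0.75}]}}'
)
docs = [coerce_doc(d) for d in json.loads(raw)["response"]["docs"]]
```

The trade-off is that type information lives in the field names rather than in the response, so the client and the schema have to agree on the prefixes.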

GoogleCodeExporter commented 9 years ago
Would we get native types with wt=python?

Original comment by ed.summers on 30 Apr 2010 at 3:25

GoogleCodeExporter commented 9 years ago
I have a patch for this in http://code.google.com/r/joelnothman-solrpy-json, 
wherein one can specify rules to translate certain fields of the JSON using 
arbitrary callbacks. In theory, one could parse a given schema.xml to produce 
the translators needed to replicate the output of the XML parser. These 
translation rules are applied generically to the whole response, so they may 
also be used to transform responses from handlers other than Select.

For integration, a SearchHandler is constructed with a parse_response argument. 
The parse_response object must have a wt attribute, which is used to set the 
corresponding query parameter, and it must be callable, akin to 
parse_query_response (now renamed parse_xml_response).

Original comment by joel.nothman@gmail.com on 25 Oct 2012 at 1:31
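
To illustrate the contract described above (an object that carries a `wt` attribute and is callable), here is a minimal sketch. It is not the code from the patch: the class name `TranslatingJSONParser`, the `build_params` helper, the single-argument call signature, and the `timestamp` field are all assumptions made for illustration.

```python
import json
from datetime import datetime


class TranslatingJSONParser:
    """Hypothetical parse_response object: exposes wt and is callable."""

    wt = "json"  # value the handler would send as the wt query parameter

    def __init__(self, translators=None):
        # Maps a key name to a callback applied to that key's value,
        # wherever the key occurs in the response.
        self.translators = dict(translators or {})

    def __call__(self, data):
        return self._walk(json.loads(data))

    def _walk(self, node):
        if isinstance(node, dict):
            return {
                key: self.translators[key](value)
                if key in self.translators
                else self._walk(value)
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [self._walk(item) for item in node]
        return node


def build_params(query, parse_response):
    # Stand-in for what a handler might do with the wt attribute.
    return {"q": query, "wt": parse_response.wt}


parse_response = TranslatingJSONParser(
    {"timestamp": lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ")}
)
body = '{"response": {"numFound": 1, "docs": [{"id": "a1", "timestamp": "2012-10-25T01:31:00Z"}]}}'
parsed = parse_response(body)
params = build_params("*:*", parse_response)
```

Because the rules are keyed by name and applied while walking the whole structure, the same object works on responses that are not shaped like a select result.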

GoogleCodeExporter commented 9 years ago
I have added to that clone a TermVectorHandler which queries and parses the 
response from tvrh. It is an example where the XML output far outsizes the JSON 
equivalent (10x or 11x).

Original comment by joel.nothman@gmail.com on 26 Oct 2012 at 1:25

GoogleCodeExporter commented 9 years ago
And a final note to Ed: no. wt=python returns a response almost identical to 
wt=json. If it *had* included DateTimes etc., it could no longer be 
deserialised with ast.literal_eval, and evaluating it any other way would 
introduce security vulnerabilities.

Original comment by joel.nothman@gmail.com on 28 Oct 2012 at 2:59
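
The point about `ast.literal_eval` can be seen in a small sketch. The response body below is hand-written in the general shape of a wt=python response, not captured from a server.

```python
import ast

# Hand-written sample: only literals (dicts, lists, strings, numbers).
body = (
    "{'responseHeader': {'status': 0, 'QTime': 1}, "
    "'response': {'numFound': 1, 'start': 0, "
    "'docs': [{'id': 'a1', 'timestamp': '2012-10-28T02:59:00Z'}]}}"
)

# literal_eval accepts Python literals only, so no code in the body is executed.
result = ast.literal_eval(body)

# A constructor call is not a literal, so a response containing one is rejected.
rejected = False
try:
    ast.literal_eval("{'timestamp': datetime.datetime(2012, 10, 28)}")
except ValueError:
    rejected = True
```

Supporting the second form would require something like `eval`, which would run whatever expression the server (or anything impersonating it) sends back.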

GoogleCodeExporter commented 9 years ago
Correction: the clone URL above should be 
http://code.google.com/r/joelnothman-solrpy. I'm also putting in a few changes 
to handle what in XML is a <lst> with non-unique keys, and to make 
'import json' happen only if needed.

Original comment by joel.nothman@gmail.com on 30 Oct 2012 at 4:55
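
One way non-unique keys might be preserved when decoding JSON is the standard `object_pairs_hook` argument of `json.loads`. This is a sketch of that general technique, not the change made in the clone; the `pairs_hook` name is hypothetical, and it assumes the server serialises such a list as a JSON object with repeated keys.

```python
import json


def pairs_hook(pairs):
    """Build a dict when keys are unique, otherwise keep the ordered pairs."""
    keys = [key for key, _ in pairs]
    if len(set(keys)) == len(keys):
        return dict(pairs)
    # A plain dict would silently keep only the last value for a repeated key.
    return list(pairs)


unique = json.loads('{"a": 1, "b": 2}', object_pairs_hook=pairs_hook)
repeated = json.loads('{"a": 1, "a": 2}', object_pairs_hook=pairs_hook)
default = json.loads('{"a": 1, "a": 2}')  # default decoding loses the first value
```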