ropensci-archive / solrium

:warning: ARCHIVED :warning: A general purpose R interface to Solr

Rare behaviour with param qt for Solr_group #20

Closed jonasanso closed 10 years ago

jonasanso commented 10 years ago

I cannot reproduce this against the PLOS Solr endpoint.

A normal query with three fields in the field list (fl) returns all three, but I get a warning:

> solr_group(q='*:*', group.field='accoid', group.limit=1, group.sort='price asc', sort='price asc', fl="touroperator, rating, price", fq="transport:VL", url = url)
                             groupValue numFound start rating touroperator
1  137a9c30-ee49-11df-a13b-0050569335f3    12545     0      3           JI
2  103fe3f0-8f5c-11df-a2df-001c42000009     6702     0      3           JI
3  79760f30-5b14-11e2-bb05-000c297659d3    19983     0      1           CH
4  50e1b6f0-5fe7-11e2-bb05-000c297659d3    39773     0      2           JI
5  fda70780-9b3c-11e0-9153-005056930057     1659     0      4           JI
6  8fdcaba0-bc9c-11e2-a109-000c297659d3    69484     0      2           JI
7  10d5bb50-8f5c-11df-a2df-001c42000009     5235     0      4           JI
8  0e3769c0-8f5c-11df-a2df-001c42000009    51906     0      4           JI
9  108fd8b0-8f5c-11df-a2df-001c42000009     2584     0      3           JI
10 0c880c10-8f5c-11df-a2df-001c42000009    57270     0      2           JI
   price
1  12700
2  13500
3  13700
4  14017
5  14700
6  14833
7  15166
8  15208
9  15225
10 15233
Warning message:
In if (names(input) == "response") { :
  the condition has length > 1 and only the first element will be used
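
For readers hitting the same warning: it arises whenever `if()` receives a logical vector with more than one element. The sketch below reproduces the situation with a made-up `input` list (an assumption; the package's real object is not shown in this thread) and shows one length-one alternative. In R 4.2.0 and later the same condition is an error rather than a warning.

```r
# Hypothetical stand-in for a parsed grouped response: two top-level names.
input <- list(responseHeader = list(status = 0), grouped = list())

# This comparison is vectorised, so it yields one logical per name (length 2),
# which is what if() complains about.
cmp <- names(input) == "response"

# A length-one test that is safe to use inside if():
has_response <- "response" %in% names(input)
kind <- if (has_response) "plain response" else "grouped response"
print(kind)
```

Whether `%in%`, `any()`, or `identical()` is the right replacement depends on what the original condition was meant to check, which the thread does not say.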

The same query with raw=TRUE:

> solr_group(q='*:*', group.field='accoid', group.limit=1, group.sort='price asc', sort='price asc', fl="touroperator, rating, price", fq="transport:VL", url = url, raw=TRUE)
[1] "{\"responseHeader\":{\"status\":0,\"QTime\":1472},\"grouped\":{\"accoid\":{\"matches\":34553291,\"groups\":[{\"groupValue\":\"137a9c30-ee49-11df-a13b-0050569335f3\",\"doclist\":{\"numFound\":12545,\"start\":0,\"docs\":[{\"rating\":3,\"touroperator\":\"JI\",\"price\":12700}]}},{\"groupValue\":\"103fe3f0-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":6702,\"start\":0,\"docs\":[{\"rating\":3,\"touroperator\":\"JI\",\"price\":13500}]}},{\"groupValue\":\"79760f30-5b14-11e2-bb05-000c297659d3\",\"doclist\":{\"numFound\":19983,\"start\":0,\"docs\":[{\"rating\":1,\"touroperator\":\"CH\",\"price\":13700}]}},{\"groupValue\":\"50e1b6f0-5fe7-11e2-bb05-000c297659d3\",\"doclist\":{\"numFound\":39773,\"start\":0,\"docs\":[{\"rating\":2,\"touroperator\":\"JI\",\"price\":14017}]}},{\"groupValue\":\"fda70780-9b3c-11e0-9153-005056930057\",\"doclist\":{\"numFound\":1659,\"start\":0,\"docs\":[{\"rating\":4,\"touroperator\":\"JI\",\"price\":14700}]}},{\"groupValue\":\"8fdcaba0-bc9c-11e2-a109-000c297659d3\",\"doclist\":{\"numFound\":69484,\"start\":0,\"docs\":[{\"rating\":2,\"touroperator\":\"JI\",\"price\":14833}]}},{\"groupValue\":\"10d5bb50-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":5235,\"start\":0,\"docs\":[{\"rating\":4,\"touroperator\":\"JI\",\"price\":15166}]}},{\"groupValue\":\"0e3769c0-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":51906,\"start\":0,\"docs\":[{\"rating\":4,\"touroperator\":\"JI\",\"price\":15208}]}},{\"groupValue\":\"108fd8b0-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":2584,\"start\":0,\"docs\":[{\"rating\":3,\"touroperator\":\"JI\",\"price\":15225}]}},{\"groupValue\":\"0c880c10-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":57270,\"start\":0,\"docs\":[{\"rating\":2,\"touroperator\":\"JI\",\"price\":15233}]}}]}}}\n"
attr(,"class")
[1] "sr_group"
attr(,"wt")
[1] "json"

When I add qt='distributedSearch', the last two fields (rating and price) are missing from the response:

> solr_group(q='*:*', group.field='accoid', group.limit=1, group.sort='price asc', sort='price asc', fl="touroperator, rating, price", fq="transport:VL", url = url, qt='distributedSearch', raw=FALSE)
                             groupValue numFound start touroperator
1  accaa2a0-fb51-11e2-a109-000c297659d3    17750     0           CH
2  77f8e0f0-9c42-11e2-a109-000c297659d3     4084     0           JI
3  53432a60-c7df-11e0-aa1b-005056930057     6636     0           JI
4  edefdd00-8f5b-11df-a2df-001c42000009    23974     0           JI
5  137a9c30-ee49-11df-a13b-0050569335f3    12545     0           JI
6  10438d70-8f5c-11df-a2df-001c42000009    13220     0           CH
7  110c34a0-8f5c-11df-a2df-001c42000009    13384     0           CH
8  10427c00-8f5c-11df-a2df-001c42000009     8898     0           JI
9  c69d6fb0-9c41-11e2-a109-000c297659d3     4104     0           JI
10 6f885e80-9336-11e0-9153-005056930057    13065     0           CH
Warning message:
In if (names(input) == "response") { :
  the condition has length > 1 and only the first element will be used

They are also missing from the raw response:

> solr_group(q='*:*', group.field='accoid', group.limit=1, group.sort='price asc', sort='price asc', fl="touroperator, rating, price", fq="transport:VL", url = url, qt='distributedSearch', raw=TRUE) 
[1] "{\"responseHeader\":{\"status\":0,\"QTime\":1774},\"grouped\":{\"accoid\":{\"matches\":141800873,\"groups\":[{\"groupValue\":\"accaa2a0-fb51-11e2-a109-000c297659d3\",\"doclist\":{\"numFound\":17750,\"start\":0,\"docs\":[{\"touroperator\":\"CH\"}]}},{\"groupValue\":\"77f8e0f0-9c42-11e2-a109-000c297659d3\",\"doclist\":{\"numFound\":4084,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"53432a60-c7df-11e0-aa1b-005056930057\",\"doclist\":{\"numFound\":6636,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"edefdd00-8f5b-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":23974,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"137a9c30-ee49-11df-a13b-0050569335f3\",\"doclist\":{\"numFound\":12545,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"10438d70-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":13220,\"start\":0,\"docs\":[{\"touroperator\":\"CH\"}]}},{\"groupValue\":\"110c34a0-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":13384,\"start\":0,\"docs\":[{\"touroperator\":\"CH\"}]}},{\"groupValue\":\"c69d6fb0-9c41-11e2-a109-000c297659d3\",\"doclist\":{\"numFound\":4104,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"10427c00-8f5c-11df-a2df-001c42000009\",\"doclist\":{\"numFound\":8898,\"start\":0,\"docs\":[{\"touroperator\":\"JI\"}]}},{\"groupValue\":\"6f885e80-9336-11e0-9153-005056930057\",\"doclist\":{\"numFound\":13065,\"start\":0,\"docs\":[{\"touroperator\":\"CH\"}]}}]}}}\n"
attr(,"class")
[1] "sr_group"
attr(,"wt")
[1] "json"

I don't know how to see the URL that is sent to Solr, so I cannot check whether it was built correctly.

sckott commented 10 years ago

@jonasanso I will look into this issue. We could print out the API call url when each function call is made - I have done that in other packages. Good idea?

sckott commented 10 years ago

What is the URL you were trying with? Maybe I can see what might be going on, and debug that error you're getting.

I haven't played with the qt parameter much, so I don't know much about what I should expect to see.

I am about to push a change so that the URL prints out for the call so you can see what url is generated.

Also, this seems to work for me; at least it doesn't drop the requested fields when qt is used:

solr_group(q='*:*', group.field='journal', group.limit=1, group.sort='alm_twitterCount desc', sort='score asc', fl="id, score, alm_twitterCount, article_type, accepted_date", fq='alm_twitterCount:[5 TO 10]', qt='distributedSearch', url = url, key=key)
                        groupValue numFound start                                      id alm_twitterCount        accepted_date     article_type score
1                             none     4172     0            10.1371/journal.pone.0043099               10 2012-07-19T00:00:00Z Research Article     1
2                         plos one    19009     0            10.1371/journal.pone.0082853               10 2013-10-29T00:00:00Z Research Article     1
3                    plos genetics     1276     0 10.1371/journal.pgen.1003911/references               10 2013-09-08T00:00:00Z Research Article     1
4                     plos biology     1139     0            10.1371/journal.pbio.1001765               10 2013-12-03T00:00:00Z Research Article     1
5                   plos pathogens     1155     0            10.1371/journal.ppat.1003741               10                 <NA>           Pearls     1
6 plos neglected tropical diseases      929     0            10.1371/journal.pntd.0001132               10                 <NA>        Editorial     1
7       plos computational biology      991     0            10.1371/journal.pcbi.1002860               10 2012-11-09T00:00:00Z Research Article     1
8                    plos medicine      776     0            10.1371/journal.pmed.1001261               10                 <NA> Health in Action     1

sckott commented 10 years ago

Also note that the URL is decoded so it is easier to read, but should still work when you paste into the browser, or do a curl call
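
As a small illustration of the decoded versus encoded forms, here is a sketch using base R's `utils::URLencode()` and `utils::URLdecode()`. The host and port are placeholders, not the server discussed in this issue.

```r
# Placeholder URL in its readable (decoded) form; note the space in the sort value.
readable <- "http://localhost:8983/solr/select?q=*:*&sort=price asc&wt=json"

# Percent-encode unsafe characters such as the space; URL delimiters are kept.
encoded <- utils::URLencode(readable)

# Decoding gives back the readable form.
decoded <- utils::URLdecode(encoded)

print(encoded)
print(decoded)
```

Browsers and curl generally accept the readable form too, but the encoded form is what actually travels over the wire.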

jonasanso commented 10 years ago

Thanks Scott

I will check it tomorrow.


jonasanso commented 10 years ago

Found the cause of the strange behaviour.

http://solr-prod:7070/solr/select?group.field=accoid&q=*:*&start=0&sort=price asc&fq=transport:VL&fl=touroperator&fl= rating&fl= price&wt=json&group.limit=1&group.sort=price asc&group.sort=price asc&group=true&qt=distributedSearch

The generated Solr URL contains several fl parameters, and the request handler we are using does not support that.

I could change the handler to support it, but the definition of the fl parameter does not indicate that it can be repeated. The parameter should contain the field names separated by commas or spaces: http://wiki.apache.org/solr/CommonQueryParameters#fl

The request works with a single fl parameter: http://solr-prod:7070/solr/select? ... &fl=touroperator, rating, price ... &qt=distributedSearch

If there is no special reason to split the fl param into multiple URL parameters, I would propose sending a single fl parameter containing the user-specified fields.
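
A minimal sketch of that idea in base R, assuming the user may pass `fl` either as one comma-separated string or as a character vector. The helper name `collapse_fl` is hypothetical and is not taken from the package.

```r
# Hypothetical helper: normalise `fl` to a single comma-separated value
# so the query carries exactly one fl parameter.
collapse_fl <- function(fl) {
  # Split any comma-separated entries, then trim the surrounding whitespace.
  fields <- trimws(unlist(strsplit(fl, ",", fixed = TRUE)))
  # Drop empty entries and join everything back with commas.
  paste(fields[nzchar(fields)], collapse = ",")
}

from_string <- collapse_fl("touroperator, rating, price")
from_vector <- collapse_fl(c("touroperator", "rating", "price"))
fl_param <- paste0("fl=", from_string)
print(fl_param)
```

Both input styles end up as the same single value, so a handler that reads only one fl parameter still sees every requested field.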

I found printing the Solr URL very useful for debugging this issue. Maybe we can leave it in until we have tested more, or we can add a new param SHOW_URL=false to the functions so that the URL is shown only when SHOW_URL is true.
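
One way such an opt-in flag could look, as a rough sketch only: the function name `build_url`, its arguments, and the lower-case `show_url` spelling are all assumptions for illustration, and the host is a placeholder.

```r
# Hypothetical helper: assemble a query URL and print it only on request.
build_url <- function(base, args, show_url = FALSE) {
  # Each argument is expected to be a single value; join as name=value pairs.
  values <- vapply(args, as.character, character(1))
  query <- paste(names(args), values, sep = "=", collapse = "&")
  url <- paste0(base, "?", query)
  # message() writes to stderr, so the URL does not mix with returned data.
  if (show_url) message(url)
  url
}

quiet <- build_url("http://localhost:8983/solr/select",
                   list(q = "*:*", wt = "json"))
shown <- build_url("http://localhost:8983/solr/select",
                   list(q = "*:*", wt = "json"), show_url = TRUE)
```

Defaulting the flag to off keeps normal output unchanged while still letting a user inspect the exact request when debugging.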

sckott commented 10 years ago

Let me know if that fixes your problem with a new install

jonasanso commented 10 years ago

Thanks. It is working.