rdhyee opened 6 months ago
@dannymandel See https://github.com/rdhyee/isamples-examples/blob/exploratory/basic/record_counts.ipynb
You'll see how I monkeypatch pysolr with my_select but have effectively turned off POST requests by setting a high threshold. You can lower the threshold to 1024 or less to force the monkeypatched pysolr to switch to POST.
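For anyone following along, here is roughly what the threshold controls. As far as I can tell from pysolr's source, `Solr._select` URL-encodes the parameters and, if the encoded string is shorter than 1024 characters, sends a GET with everything in the query string. Otherwise it POSTs the same encoded string as an `application/x-www-form-urlencoded` body. The sketch below reproduces that decision in plain Python so you can see what lowering the threshold does. `build_select_request` and its `threshold` argument are hypothetical names for illustration, not pysolr API.

```python
from urllib.parse import urlencode


def build_select_request(params, threshold=1024, handler="select"):
    """Hypothetical helper mirroring the GET/POST switch described for pysolr.

    Returns (method, path, headers, body).
    """
    params = dict(params, wt="json")  # ask Solr for JSON results
    # doseq=True expands tuple/list values into repeated keys: fl=a&fl=b
    encoded = urlencode(params, doseq=True)
    if len(encoded) < threshold:
        # Short query: all parameters travel in the URL, no body.
        return "GET", f"{handler}/?{encoded}", {}, None
    # Long query: the same encoded string, sent as a form body instead.
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    return "POST", f"{handler}/", headers, encoded.encode("utf-8")


if __name__ == "__main__":
    params = {"q": "*:*", "fl": ("id", "label"), "rows": 10}
    print(build_select_request(params)[0])               # GET
    print(build_select_request(params, threshold=1)[0])  # POST
```

The key point for the server side is that the POST body uses the same `key=value&key=value` encoding as a query string, just carried in the request body.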
This should be available on https://central.isample.xyz now. @rdhyee, please verify.
Flying today. Will verify tomorrow or the next day. Thanks, @dannymandel
@dannymandel Accessing /thing/solr using pysolr (which switches requests from GET to POST when the query gets long) still doesn't work for me. I've distilled the differences into https://github.com/rdhyee/isamples-examples/blob/f6bfc8257744d60aa46f42ede1ad6c82fb4143ae/basic/isamples_get_post.py so you can compare the two. Your code doesn't properly recognize the way pysolr encodes the POST request.
I do see different responses depending on whether I use GET or POST. So the POST is working, but it appears to ignore the parameters in @rdhyee's example. Digging in to find out why…
This is what the response looks like:
```json
{
  "responseHeader": {
    "zkConnected": true,
    "status": 0,
    "QTime": 0,
    "params": {"q": "*:*", "fl": "id", "start": "0", "rows": "10", "wt": "json"}
  },
  "response": {
    "numFound": 6387537, "start": 0, "numFoundExact": true,
    "docs": [
      {"id": "IGSN:IESER000J"},
      {"id": "IGSN:IESER000K"},
      {"id": "IGSN:IESER000L"},
      {"id": "IGSN:IELL10002"},
      {"id": "IGSN:IENWU0PBP"},
      {"id": "IGSN:IENWU0SDP"},
      {"id": "IGSN:IESER0009"},
      {"id": "IGSN:IESER0008"},
      {"id": "IGSN:IESER0006"},
      {"id": "IGSN:IESER000B"}
    ]
  }
}
```
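A quick way to confirm that the POSTed parameters were dropped is to compare what was sent with what comes back in `responseHeader.params`. Here the echo shows only generic values (`q=*:*`, `fl=id`, `rows=10`) instead of the request's own. Below is a small sketch of that check. It assumes the response has the shape shown above. `missing_params` is a hypothetical helper, and it can only compare parameters that Solr actually echoes, which depends on the `echoParams` setting.

```python
import json


def missing_params(sent, response_bytes):
    """Hypothetical helper: return {name: (sent_value, echoed_value)} for
    every sent parameter that Solr did not echo back unchanged.

    Solr echoes single values as strings and repeated values as lists,
    so both sides are normalised to lists of strings before comparing.
    """
    echoed = json.loads(response_bytes)["responseHeader"]["params"]

    def as_list(value):
        if isinstance(value, (list, tuple)):
            return [str(v) for v in value]
        return [str(value)]

    mismatches = {}
    for name, value in sent.items():
        got = echoed.get(name)
        if got is None or as_list(got) != as_list(value):
            mismatches[name] = (value, got)
    return mismatches


if __name__ == "__main__":
    # responseHeader trimmed from the response above
    resp = b'{"responseHeader":{"status":0,"params":{"q":"*:*","fl":"id","start":"0","rows":"10","wt":"json"}}}'
    sent = {"q": "*:*", "fl": ("id", "label"), "rows": 100}
    print(missing_params(sent, resp))
    # {'fl': (('id', 'label'), 'id'), 'rows': (100, '10')}
```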
Ah, I see: we are reading the parameters directly off the query string:

```python
for k, v in request.query_params.multi_items():
```

We probably just need to change that to also read the request body when it's a POST.
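A framework-agnostic sketch of that change follows. It assumes the handler has access to the HTTP method, the raw query string, the `Content-Type` header and the raw body. It accepts both the form-urlencoded body pysolr sends for long queries and a flat JSON object body like the one in the example later in this thread. `extract_params` and `_to_solr_str` are hypothetical names, not the actual iSamples code.

```python
import json
from urllib.parse import parse_qsl


def _to_solr_str(value):
    # JSON true/false should reach Solr as "true"/"false", not "True"/"False"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def extract_params(method, query_string, content_type, body):
    """Hypothetical helper: return Solr params as a list of (key, value) pairs.

    Query-string params are always included; for POST, params from a
    form-urlencoded or JSON-object body are appended as well.
    """
    pairs = parse_qsl(query_string, keep_blank_values=True)
    if method.upper() != "POST" or not body:
        return pairs
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/x-www-form-urlencoded":
        # pysolr's long-query case: query-string encoding, but in the body
        pairs += parse_qsl(body.decode("utf-8"), keep_blank_values=True)
    elif media_type == "application/json":
        # JSON object: list values become repeated keys, like fl=a&fl=b
        for key, value in json.loads(body).items():
            values = value if isinstance(value, list) else [value]
            pairs += [(key, _to_solr_str(v)) for v in values]
    return pairs
```

The returned pairs could stand in for `request.query_params.multi_items()` in the existing loop. In Starlette/FastAPI, the inputs would come from `request.method`, `request.url.query`, `request.headers.get("content-type", "")` and `await request.body()`.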
This has been pushed to https://central.isample.xyz. Example usage:
```python
import json

import httpx

# Solr parameters; tuple/list values are sent as repeated keys (fl=a&fl=b)
params = {
    'q': '*:*',
    'fl': ('searchText',
           'authorizedBy',
           'producedBy_resultTimeRange',
           'hasContextCategory',
           'curation_accessContraints',
           'curation_description_text',
           'curation_label',
           'curation_location',
           'curation_responsibility',
           'description_text',
           'id',
           'informalClassification',
           'keywords',
           'label',
           'hasMaterialCategory',
           'producedBy_description_text',
           'producedBy_hasFeatureOfInterest',
           'producedBy_label',
           'producedBy_responsibility',
           'producedBy_resultTime',
           'producedBy_samplingSite_description_text',
           'producedBy_samplingSite_label',
           'producedBy_samplingSite_location_elevationInMeters',
           'producedBy_samplingSite_location_latitude',
           'producedBy_samplingSite_location_longitude',
           'producedBy_samplingSite_placeName',
           'registrant',
           'samplingPurpose',
           'source',
           'sourceUpdatedTime',
           'producedBy_samplingSite_location_rpt',
           'hasSpecimenCategory'),
    'start': 0,
    'rows': 100,
    'fq': ['producedBy_resultTimeRange:[1800 TO NOW]',
           'source:"OPENCONTEXT"',
           '-relation_target:*'],
    'facet': 'on',
    'facet.field': ('authorizedBy',
                    'hasContextCategory',
                    'hasMaterialCategory',
                    'registrant',
                    'source',
                    'hasSpecimenCategory'),
    'cursorMark': '*',
    'sort': 'id ASC',
    'facet.range': 'producedBy_resultTimeRange',
    'f.producedBy_resultTimeRange.facet.range.gap': '+1YEARS',
    'f.producedBy_resultTimeRange.facet.range.start': '1800-01-01T00:00:00Z',
    'f.producedBy_resultTimeRange.facet.range.end': '2023-01-01T00:00:00Z',
}

ISB_SERVER = "https://central.isample.xyz/isamples_central/"

# GET: parameters are sent in the query string
r = httpx.get(f'{ISB_SERVER}thing/select', params=params)
r.raise_for_status()
print('GET')
print(r.json()['responseHeader']['params'])

# POST: the same parameters sent as a JSON object in the request body
headers = {
    "Content-Type": "application/json; charset=utf-8"
}
params_encoded = json.dumps(params)  # tuples are serialized as JSON arrays
# content= sends the raw string; passing a str via data= is deprecated in httpx
r1 = httpx.post(f'{ISB_SERVER}thing/select', content=params_encoded, headers=headers)
r1.raise_for_status()
print('POST')
print(r1.json()['responseHeader']['params'])
```
As I posted to @dannymandel in a Slack thread (2023.12.12):
Danny replied: