isamplesorg / isamples_inabox

Provides functionality intermediate to a collection and central

allow /thing/select to handle POST requests #338

Open rdhyee opened 6 months ago

rdhyee commented 6 months ago

As I posted in Slack Thread (2023.12.12) to @dannymandel

I’m finding that /thing/select doesn’t handle POST requests, right? pysolr switches to POST when the size of a URL GET line gets over 1024 - https://github.com/django-haystack/pysolr/blob/master/pysolr.py#L479-L491
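A minimal sketch of the length-based switch described above, for readers who want to reproduce it without pysolr. The function name `choose_request` and the constant `GET_LENGTH_THRESHOLD` are hypothetical; this mirrors the behavior described in the comment and is not a copy of pysolr's code.

```python
from urllib.parse import urlencode

# Hypothetical constant mirroring the 1024-character limit mentioned above.
GET_LENGTH_THRESHOLD = 1024


def choose_request(params, threshold=GET_LENGTH_THRESHOLD):
    """Return (method, query_string, body) for a Solr-style select call.

    Short queries travel in the URL as a GET; long ones move into a
    form-encoded POST body. All names here are illustrative.
    """
    # doseq=True expands list/tuple values into repeated keys (fq=a&fq=b).
    encoded = urlencode(params, doseq=True)
    if len(encoded) < threshold:
        return "GET", encoded, None
    return "POST", "", encoded


short = choose_request({"q": "*:*", "rows": 10})
long_ = choose_request({"q": "*:*", "fq": ["source:OPENCONTEXT"] * 100})
print(short[0], long_[0])
```

Lowering `threshold` forces the POST path even for small queries, which is a convenient way to exercise the endpoint's POST handling.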

Danny replied:

would be an easy fix to implement
(https://isamples.slack.com/archives/C01ERDL5SF8/p1702421636090329)
@app.get(f"/{THING_URL_PATH}/select", response_model=typing.Any)
would conceptually just need to be
@app.get(f"/{THING_URL_PATH}/select", response_model=typing.Any)
@app.post(f"/{THING_URL_PATH}/select", response_model=typing.Any)
there is some nuance in how the parameters get packaged, but that's the basic idea

rdhyee commented 6 months ago

@dannymandel See https://github.com/rdhyee/isamples-examples/blob/exploratory/basic/record_counts.ipynb

You’ll see how I monkeypatch pysolr with my_select but have effectively turned off POST requests by setting a high threshold — you can lower the threshold down to 1024 or less to force monkeypatched pysolr to go to POST.

dannymandel commented 6 months ago

This should be available on https://central.isample.xyz now. @rdhyee please verify.

rdhyee commented 6 months ago

Flying today. Will verify tomorrow or next day. Thanks, @dannymandel

rdhyee commented 6 months ago

@dannymandel Accessing /thing/solr using pysolr, in which requests are switched from GET to POST when the query gets long, still doesn't work for me. I've distilled the GET and POST cases into https://github.com/rdhyee/isamples-examples/blob/f6bfc8257744d60aa46f42ede1ad6c82fb4143ae/basic/isamples_get_post.py so that you can see the difference. Your code doesn't properly recognize the way pysolr encodes the POST request.

dannymandel commented 5 months ago

I do see different responses based on GET or POST.

dannymandel commented 5 months ago

So, the POST is working, but it appears to be ignoring the parameters in @rdhyee's example. Digging in as to why that is the case…

dannymandel commented 5 months ago

This is what the response looks like:

b'{\n  "responseHeader":{\n    "zkConnected":true,\n    "status":0,\n    "QTime":0,\n    "params":{\n      "q":"*:*",\n      "fl":"id",\n      "start":"0",\n      "rows":"10",\n      "wt":"json"}},\n  "response":{"numFound":6387537,"start":0,"numFoundExact":true,"docs":[\n      {\n        "id":"IGSN:IESER000J"},\n      {\n        "id":"IGSN:IESER000K"},\n      {\n        "id":"IGSN:IESER000L"},\n      {\n        "id":"IGSN:IELL10002"},\n      {\n        "id":"IGSN:IENWU0PBP"},\n      {\n        "id":"IGSN:IENWU0SDP"},\n      {\n        "id":"IGSN:IESER0009"},\n      {\n        "id":"IGSN:IESER0008"},\n      {\n        "id":"IGSN:IESER0006"},\n      {\n        "id":"IGSN:IESER000B"}]\n  }}\n'

dannymandel commented 5 months ago

Ah, I see, we are reading directly off the query parameters:

for k, v in request.query_params.multi_items():

probably just need to change that to look at the request body if it's a POST.
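A rough, framework-free sketch of that idea: pick the parameter source based on the HTTP method, falling back to the query string for GET. The server's actual handler is not shown in this thread, so the function name `extract_params` and its signature are hypothetical. The form-encoded branch is an assumption about how a client such as pysolr might package a POST body.

```python
import json
from urllib.parse import parse_qsl


def extract_params(method, query_string="", body=b"", content_type=""):
    """Return a list of (key, value) pairs for a select request.

    GET requests (or POSTs with an empty body) read the query string;
    POST requests read the body, decoded according to its content type.
    """
    if method.upper() != "POST" or not body:
        return parse_qsl(query_string, keep_blank_values=True)

    # Drop any "; charset=..." suffix before comparing media types.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        pairs = []
        for key, value in json.loads(body).items():
            # A JSON list becomes repeated keys, like fq=a&fq=b in a URL.
            values = value if isinstance(value, list) else [value]
            pairs.extend((key, str(v)) for v in values)
        return pairs

    # Assumption: anything else is treated as a form-encoded body.
    return parse_qsl(body.decode("utf-8"), keep_blank_values=True)


get_pairs = extract_params("GET", "q=*:*&fl=id")
json_pairs = extract_params(
    "POST",
    body=b'{"q": "*:*", "fq": ["a", "b"], "rows": 10}',
    content_type="application/json; charset=utf-8",
)
form_pairs = extract_params(
    "POST",
    body=b"q=%2A%3A%2A&fq=a&fq=b",
    content_type="application/x-www-form-urlencoded; charset=utf-8",
)
```

Returning a flat list of pairs keeps multi-valued parameters such as `fq` intact, in the same shape that iterating over the query parameters' multi-items produces for a GET.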

dannymandel commented 5 months ago

This has been pushed to https://central.isample.xyz. Example usage:

import httpx
import json

params = {'q': '*:*',
 'fl': ('searchText',
  'authorizedBy',
  'producedBy_resultTimeRange',
  'hasContextCategory',
  'curation_accessContraints',
  'curation_description_text',
  'curation_label',
  'curation_location',
  'curation_responsibility',
  'description_text',
  'id',
  'informalClassification',
  'keywords',
  'label',
  'hasMaterialCategory',
  'producedBy_description_text',
  'producedBy_hasFeatureOfInterest',
  'producedBy_label',
  'producedBy_responsibility',
  'producedBy_resultTime',
  'producedBy_samplingSite_description_text',
  'producedBy_samplingSite_label',
  'producedBy_samplingSite_location_elevationInMeters',
  'producedBy_samplingSite_location_latitude',
  'producedBy_samplingSite_location_longitude',
  'producedBy_samplingSite_placeName',
  'registrant',
  'samplingPurpose',
  'source',
  'sourceUpdatedTime',
  'producedBy_samplingSite_location_rpt',
  'hasSpecimenCategory'),
 'start': 0,
 'rows': 100,
 'fq': ['producedBy_resultTimeRange:[1800 TO NOW]',
  'source:"OPENCONTEXT"',
  '-relation_target:*'],
 'facet': 'on',
 'facet.field': ('authorizedBy',
  'hasContextCategory',
  'hasMaterialCategory',
  'registrant',
  'source',
  'hasSpecimenCategory'),
 'cursorMark': '*',
 'sort': 'id ASC',
 'facet.range': 'producedBy_resultTimeRange',
 'f.producedBy_resultTimeRange.facet.range.gap': '+1YEARS',
 'f.producedBy_resultTimeRange.facet.range.start': '1800-01-01T00:00:00Z',
 'f.producedBy_resultTimeRange.facet.range.end': '2023-01-01T00:00:00Z'}

ISB_SERVER = "https://central.isample.xyz/isamples_central/"

# get
r = httpx.request('GET', f'{ISB_SERVER}thing/select', params=params)
print('GET')
print(r.json()['responseHeader']['params'])

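# post: send the same parameters as a JSON request body
headers = {
    "Content-Type": "application/json; charset=utf-8"
}

params_encoded = json.dumps(params)
# httpx takes raw str/bytes bodies via content=; data= is meant for form fields
r1 = httpx.post(f'{ISB_SERVER}thing/select', content=params_encoded, headers=headers)
print('POST')
print(r1.json()['responseHeader']['params'])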