isamplesorg / isamples_inabox

Provides functionality intermediate to a collection and central

Allow for requests to URL with trailing '/' to accommodate pysolr #334

Closed rdhyee closed 8 months ago

rdhyee commented 10 months ago

pysolr insists on attaching a trailing '/' to the specified server URL, which makes it a bit of a pain to use with the iSamples API at the moment (see the latest beta version).

https://central.isample.xyz/isamples_central/thing/select?q=%2A%3A%2A&wt=json

works ok

but the same query with a trailing slash in the base url

https://central.isample.xyz/isamples_central/thing/select/?q=%2A%3A%2A&wt=json

results in a 504 Gateway time-out error.
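
To make the difference concrete, here is a minimal sketch that takes the two URLs above apart with the standard library's `urllib.parse`. It sends no request; the helper name `describe` and the two constants are hypothetical, introduced only for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# The two URLs quoted in this issue.
WITHOUT_SLASH = "https://central.isample.xyz/isamples_central/thing/select?q=%2A%3A%2A&wt=json"
WITH_SLASH = "https://central.isample.xyz/isamples_central/thing/select/?q=%2A%3A%2A&wt=json"


def describe(url):
    """Return the path and the decoded query parameters of a URL."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


path_a, query_a = describe(WITHOUT_SLASH)
path_b, query_b = describe(WITH_SLASH)

# The decoded query parameters are identical in both requests.
print(query_a == query_b)

# The paths differ only by the trailing '/'.
print(path_b == path_a + "/")
```

Since the query is the same in both cases, the server-side routing of the path with the trailing '/' is the only thing left to differ.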

I can currently work around this problem by monkeypatching pysolr -- but as @dannymandel noted, we don't want API users to have to implement the following workaround:

import pysolr

def my_select(self, params, handler=None):
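    """
    Replacement for pysolr.Solr._select that builds the request path
    without a trailing '/' after the handler.

    :param params: dict of Solr query parameters
    :param handler: defaults to self.search_handler (fallback to 'select')
    :return: the result of self._send_request
    """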
    # Returns json docs unless otherwise specified
    params.setdefault("wt", "json")
    custom_handler = handler or self.search_handler
    handler = "select"
    if custom_handler:
        if self.use_qt_param:
            params["qt"] = custom_handler
        else:
            handler = custom_handler

    params_encoded = pysolr.safe_urlencode(params, True)

    if len(params_encoded) < 1024:
        # Typical case.
        path = "%s?%s" % (handler, params_encoded)
        return self._send_request("get", path)
    else:
        # Handles very long queries by submitting as a POST.
        path = "%s" % handler
        headers = {
            "Content-type": "application/x-www-form-urlencoded; charset=utf-8"
        }
        return self._send_request(
            "post", path, body=params_encoded, headers=headers
        )

pysolr.Solr._select = my_select

dannymandel commented 10 months ago

This is great, @rdhyee! Thanks for writing it up. I think it should be an easy fix. I’ll take a look tomorrow!

dannymandel commented 9 months ago

Solved like this for a similar metrics use case:

def _root(session: Session):
    db_start_time = time.time()
    metrics = PrometheusMetrics()
    metrics.db_counts = things_by_authority_count_dict(session)
    db_end_time = time.time()
    metrics.db_scrape_duration_seconds = db_end_time - db_start_time
    metrics.solr_counts = solr_counts_by_authority()
    metrics.solr_scrape_duration_seconds = time.time() - db_end_time
    return PlainTextResponse(metrics.metrics_string())

# Note that prometheus seemed unhappy with /metrics/ vs. /metrics.  Include both since they should both work.
@router.get("")
def root(session: Session = Depends(get_session)):
    return _root(session)

# Note that prometheus seemed unhappy with /metrics/ vs. /metrics.  Include both since they should both work.
@router.get("/")
def root_with_slash(session: Session = Depends(get_session)):
    return _root(session)

I think something like this should work here as well.
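
Stripped of the framework, the pattern above maps both spellings of a path to one handler. A rough standard-library sketch of that idea follows; `ROUTES`, `register_with_optional_slash`, `select` and `dispatch` are hypothetical names and are not part of isamples_inabox or FastAPI.

```python
# Hypothetical route table: maps a request path to its handler function.
ROUTES = {}


def register_with_optional_slash(path):
    """Decorator: register a handler under both `path` and `path + '/'`."""
    def decorator(func):
        base = path.rstrip("/")
        ROUTES[base] = func
        ROUTES[base + "/"] = func
        return func
    return decorator


@register_with_optional_slash("/thing/select")
def select():
    # Stand-in for the real select endpoint.
    return "select handler"


def dispatch(path):
    """Look up the handler for `path`; return "404" if none is registered."""
    handler = ROUTES.get(path)
    if handler is None:
        return "404"
    return handler()


# Both spellings reach the same handler.
print(dispatch("/thing/select") == dispatch("/thing/select/"))
```

Registering both paths explicitly, rather than redirecting one to the other, means a client that appends the '/' gets the response directly, with no extra round trip.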

dannymandel commented 9 months ago

This should be available on https://central.isample.xyz now. @rdhyee please verify.

dannymandel commented 8 months ago

Going to close this one as this curl is working:

mandeld@Daniels-MacBook-Pro isamples_inabox % curl "https://central.isample.xyz/isamples_central/thing/select/?q=%2A%3A%2A&wt=json"
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*",
      "fl":"id",
      "start":"0",
      "rows":"10",
      "wt":"json"}},
  "response":{"numFound":6387537,"start":0,"numFoundExact":true,"docs":[
      {
        "id":"IGSN:IESER000J"},
      {
        "id":"IGSN:IESER000K"},
      {
        "id":"IGSN:IESER000L"},
      {
        "id":"IGSN:IELL10002"},
      {
        "id":"IGSN:IENWU0PBP"},
      {
        "id":"IGSN:IENWU0SDP"},
      {
        "id":"IGSN:IESER0009"},
      {
        "id":"IGSN:IESER0008"},
      {
        "id":"IGSN:IESER0006"},
      {
        "id":"IGSN:IESER000B"}]
  }}