ireceptor-plus / issues

500 Error on async download #97

Closed bcorrie closed 2 years ago

bcorrie commented 2 years ago

@schristley we are getting a 500 error on an async download. This query works with a facets request:

https://vdj-staging.tacc.utexas.edu/airr/v1/rearrangement

But the follow-on async query, with the same JSON but requesting a TSV, gives us a 500 error:

https://vdj-staging.tacc.utexas.edu/airr/async/v1/rearrangement

The JSON payload is:

{
    "filters": {
        "op": "and",
        "content": [
            {
                "op": "contains",
                "content": {
                    "field": "junction_aa",
                    "value": "CASSYSDTGELFF"
                }
            },
            {
                "op": "in",
                "content": {
                    "field": "repertoire_id",
                    "value": [
                        "5958563025410068971-242ac118-0001-012",
                        "5986394417783116267-242ac118-0001-012",
                        "6018134221805588971-242ac118-0001-012",
                        "5985621319374868971-242ac118-0001-012",
[MANY REPERTOIRE_IDs DELETED]
                        "1458134456151043605-242ac114-0001-012",
                        "1493610886016003605-242ac114-0001-012"
                    ]
                }
            }
        ]
    },
    "format": "tsv"
}
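
For readers reproducing this, here is a minimal Python sketch of how the same filter block can be shared between the two requests. The helper names are hypothetical, and the `facets` field value is an assumption, since the thread does not say which field the facets request used. Only two of the repertoire IDs from the payload above are included.

```python
import json


def build_filters(junction_aa, repertoire_ids):
    """Build the 'and' filter shown above: junction_aa contains X, repertoire_id in [...]."""
    return {
        "op": "and",
        "content": [
            {"op": "contains",
             "content": {"field": "junction_aa", "value": junction_aa}},
            {"op": "in",
             "content": {"field": "repertoire_id", "value": list(repertoire_ids)}},
        ],
    }


def facets_payload(filters, facet_field="repertoire_id"):
    # facet_field is an assumption; the thread does not state which field was faceted.
    return {"filters": filters, "facets": facet_field}


def tsv_payload(filters):
    # Same filters, but asking for a TSV file, as in the failing async request.
    return {"filters": filters, "format": "tsv"}


ids = [
    "5958563025410068971-242ac118-0001-012",
    "5986394417783116267-242ac118-0001-012",
]
filters = build_filters("CASSYSDTGELFF", ids)
print(json.dumps(tsv_payload(filters), indent=4))
```

The point of the sketch is only that both requests carry an identical `filters` object, so any difference in behaviour comes from the endpoint, not from the query itself.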

The Gateway query for this download is:

https://gateway-staging.ireceptor.org/sequences?query_id=10708

schristley commented 2 years ago

@bcorrie thanks, can you attach the complete json query in a file to the issue, or tell me how to cut/paste it?

bcorrie commented 2 years ago

This was a generic Junction search, so it probably uses all the Repertoires in VDJServer. I think I can get this, but it will be a long list 8-)

bcorrie commented 2 years ago

junction-vdjserver.txt

schristley commented 2 years ago

@bcorrie Ok, I found the query in the log. The query is still running. It is a very large result of ~400M records, which is fine but will take a while. I'm not sure what the actual error is. Are you saying the submission itself generated a 500 error, or did you get a 500 error later when checking the status?

This looks like the query ID, and the status shows it's running with no errors yet:

$ curl https://vdj-staging.tacc.utexas.edu/airr/async/v1/status/8757243976749739540-242ac114-0001-012 | jq
{
  "query_id": "8757243976749739540-242ac114-0001-012",
  "endpoint": "rearrangement",
  "status": "SUBMITTED",
  "message": null,
  "created": "2021-11-10T14:48:14.672-06:00",
  "estimated_count": 405476713,
  "final_file": null,
  "download_url": null
}

schristley commented 2 years ago

Hmm, okay, thanks for the actual query. I guess the one I found is not the right query; it's a different one.

bcorrie commented 2 years ago

I think that is a different query - that is my MEGA-DOWNLOAD test (400M rearrangements - Gateway enforces a 500M limit).

@jeromejaglale can comment, but I think we got a 500 error on one that only returned a couple thousand records.

bcorrie commented 2 years ago

On the problem query we got a 500 error on:

https://vdj-staging.tacc.utexas.edu/airr/async/v1/rearrangement

So we never got to checking if the download was started or done...

bcorrie commented 2 years ago

BTW both my 10M and 40M downloads have worked so far, so looking good in general. The 400M download is the largest download use case for the iReceptor Gateway 8-)

schristley commented 2 years ago

Okay, here's the error: the request body is larger than the size limit on the async service, so I need to increase that limit...

{"message":"request entity too large","type":"entity.too.large","statusCode":413,"status":413,"expected":218246,"length":218246,"limit":102400}

schristley commented 2 years ago

@bcorrie Ok, easy fix :-D I bounced the service so the query should work now (worked for me from command line)
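
To make the numbers in that error concrete, here is a small Python sketch that parses the error body quoted above and reports how far over the limit the request was (102400 bytes is 100 KiB). The `payload_size` and `fits` helpers are hypothetical, and they assume compact UTF-8 JSON serialization; the size a real client sends depends on how it serializes the query.

```python
import json

# Error body returned by the async service, copied from the thread.
error_body = (
    '{"message":"request entity too large","type":"entity.too.large",'
    '"statusCode":413,"status":413,"expected":218246,"length":218246,'
    '"limit":102400}'
)


def explain_too_large(body):
    """Summarize a 'request entity too large' error body."""
    err = json.loads(body)
    over = err["length"] - err["limit"]
    ratio = err["length"] / err["limit"]
    return (f"{err['length']} bytes sent, limit {err['limit']} bytes "
            f"({over} bytes over, {ratio:.2f}x the limit)")


def payload_size(payload):
    """Size in bytes of a payload as compact UTF-8 JSON (an assumed serialization)."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def fits(payload, limit=102400):
    """True if the serialized payload is within the given byte limit."""
    return payload_size(payload) <= limit


print(explain_too_large(error_body))
# 218246 bytes sent, limit 102400 bytes (115846 bytes over, 2.13x the limit)
```

A client could run a check like `fits(payload)` before submitting, but the real fix here was on the server side, since a query listing every repertoire is a legitimate request.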

schristley commented 2 years ago

> BTW both my 10M and 40M downloads have worked so far, so looking good in general. The 400M download is the largest download use case for the iReceptor Gateway 8-)

Guess you haven't tried to download the Crowe study yet, a mere 1.1B! ;-D BTW, if you get users wanting to download that whole study, please pass them over to me. We have the study pre-packaged into "bite-sized" downloads, so they won't have to wait for it to be extracted from the database. E.g. I did that for Chaim when he wanted the whole study.

bcorrie commented 2 years ago

I was doing Burton at 400M but it seems to have broken... Actually the Gateway is saying it is finished but the download size is only 18K. So that is wrong. Something may have gone wrong when you bounced the service, but if so the Gateway did not detect it as an error. @schristley can you check to see if the query is still running?

schristley commented 2 years ago

@bcorrie yes, it is still running, I can see the file growing on disk

-rw-r--r-- 1 vdj G-803419   29G Nov 10 16:03 lrq-618c314afd910b0df7082d81.json

Its status hasn't changed either:

$ curl https://vdj-staging.tacc.utexas.edu/airr/async/v1/status/8757243976749739540-242ac114-0001-012 | jq
{
  "query_id": "8757243976749739540-242ac114-0001-012",
  "endpoint": "rearrangement",
  "status": "SUBMITTED",
  "message": null,
  "created": "2021-11-10T14:48:14.672-06:00",
  "estimated_count": 405476713,
  "final_file": null,
  "download_url": null
}

You are polling instead of waiting for a notification, if I understand correctly, so the download shouldn't be treated as failed just because one of the poll requests gets an error. Such an error could be a temporary network outage or, as in this case, a service restart.
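
A minimal Python sketch of that kind of tolerant polling follows. It is not the Gateway's code: `TransientError`, `poll_status`, and `make_fake_fetch` are hypothetical names, and the terminal status values `FINISHED` and `ERROR` are assumptions (the thread only shows `SUBMITTED`). The fetch function is injected so the loop can be exercised without a network.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a failed poll (network blip, 5xx, service restart)."""


def poll_status(fetch_status, interval=30.0, max_consecutive_failures=5,
                terminal=("FINISHED", "ERROR"), sleep=time.sleep):
    """Poll until a terminal status; give up only after repeated consecutive failures."""
    failures = 0
    while True:
        try:
            status = fetch_status()
        except TransientError:
            failures += 1
            if failures >= max_consecutive_failures:
                raise  # persistent problem, not a one-off outage
            sleep(interval)
            continue
        failures = 0  # a successful poll resets the failure counter
        if status["status"] in terminal:
            return status
        sleep(interval)


def make_fake_fetch(items):
    """Return a fetch function that replays canned responses; exceptions are raised."""
    it = iter(items)

    def fetch():
        item = next(it)
        if isinstance(item, Exception):
            raise item
        return item

    return fetch


fetch = make_fake_fetch([
    {"status": "SUBMITTED"},
    TransientError("service restarted"),  # one bad poll, as during the restart
    {"status": "SUBMITTED"},
    {"status": "FINISHED"},
])
print(poll_status(fetch, interval=0, sleep=lambda seconds: None))
```

With this shape, a single failed poll only increments a counter; the job is reported as broken only when the service itself returns a terminal error status or stays unreachable for several polls in a row.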

schristley commented 2 years ago

I see your query went through so closing this.