stac-utils / stac-server

A Node-based STAC API, AWS Serverless, OpenSearch
MIT License

Search limit greater than arbitrary value returns status code 502 #116

Open klsmith-usgs opened 2 years ago

klsmith-usgs commented 2 years ago

Exceeding a certain limit value in a query on a collection returns an unhelpful server error message and leaves the user guessing what is wrong. This appears to be related to the overall response size, as the limit at which queries start to fail differs between STAC collections.

Examples using pystac-client and https://earth-search.aws.element84.com/v0

Succeeds:

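from pystac_client import Client

# Client used by all of the examples below
sentinel2 = Client.open('https://earth-search.aws.element84.com/v0')

search = sentinel2.search(collections=['sentinel-s2-l2a-cogs'],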
                          bbox=(-120.23822859135915, 35.63894025515473, -118.19087145985415, 37.262086717429455),
                          datetime='2013-01-01/2020-12-31',
                          limit=500)

records = search.get_all_items_as_dict()

Fails:

search = sentinel2.search(collections=['sentinel-s2-l2a-cogs'],
                          bbox=(-120.23822859135915, 35.63894025515473, -118.19087145985415, 37.262086717429455),
                          datetime='2013-01-01/2020-12-31',
                          limit=650)

records = search.get_all_items_as_dict()
APIError: {"message": "Internal server error"}

Different collection

Succeeds:

search = sentinel2.search(collections=['sentinel-s2-l2a'],
                          bbox=(-120.23822859135915, 35.63894025515473, -118.19087145985415, 37.262086717429455),
                          datetime='2013-01-01/2020-12-31',
                          limit=750)

records = search.get_all_items_as_dict()

Fails:

search = sentinel2.search(collections=['sentinel-s2-l2a'],
                          bbox=(-120.23822859135915, 35.63894025515473, -118.19087145985415, 37.262086717429455),
                          datetime='2013-01-01/2020-12-31',
                          limit=800)

records = search.get_all_items_as_dict()
APIError: {"message": "Internal server error"}

philvarner commented 2 years ago

Status update on this ticket:

philvarner commented 2 years ago

More updates:

Running with a page size of 325, this is the size of each page:

page  status  size (b)  sum of last 2 pages  page for limit 650
1     200     3176323
2     200     2397581   5573904              1
3     200     2412505
4     200     2417225   4829730              2
5     200     2816015
6     200     3297559   6113574              3
7     200     3203616
8     200     3201786   6405402              4
9     200     3219906
10    200     2102250   5322156              5

Apparently, AWS has a hard limit of 6MB on the response returned from a Lambda, which applies here because the Lambda sits behind API Gateway.
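The arithmetic behind the table can be checked with a short sketch. It sums consecutive 325-item pages to estimate the pages a limit of 650 would produce, then compares each estimate against a threshold. Treating "6MB" as 6 * 1024 * 1024 bytes is an assumption here, as are the helper names; the exact byte value AWS enforces may differ slightly.

```python
# Page sizes in bytes for a page size of 325, copied from the table above.
PAGE_SIZES_325 = [
    3176323, 2397581, 2412505, 2417225, 2816015,
    3297559, 3203616, 3201786, 3219906, 2102250,
]

# Assumption: treat "6MB" as 6 MiB; the exact enforced value may differ.
ASSUMED_LIMIT_BYTES = 6 * 1024 * 1024


def combined_page_sizes(sizes, pages_per_group=2):
    """Sum consecutive pages to estimate the size of a larger page."""
    return [
        sum(sizes[i:i + pages_per_group])
        for i in range(0, len(sizes), pages_per_group)
    ]


def oversized_pages(sizes, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Return 1-based page numbers whose estimated size exceeds the limit."""
    return [n for n, size in enumerate(sizes, start=1) if size > limit_bytes]


estimated_650 = combined_page_sizes(PAGE_SIZES_325)
print(estimated_650)                   # the "sum of last 2 pages" column
print(oversized_pages(estimated_650))  # estimated pages over the threshold
```

These sums are only estimates of the real page sizes at limit 650, but one oversized page is enough to make a client that walks through all pages fail.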

philvarner commented 2 years ago

I believe the right approach here is that if the response body is going to be > 6MB, we return a 400 with

{
    "code": "0001",
    "description": "The response body that resulted from this query was too large to be returned by API Gateway. Try a smaller limit."
}
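A size check along those lines could look roughly like the sketch below. It is written in Python only to match the client examples in this thread; it is not stac-server code. The function name `guard_response_size`, the 6 MiB threshold, and the choice to measure the UTF-8 encoded JSON body are all assumptions made for illustration.

```python
import json

# Assumption: 6 MiB stands in for the Lambda response limit discussed above.
MAX_RESPONSE_BYTES = 6 * 1024 * 1024


def guard_response_size(body, max_bytes=MAX_RESPONSE_BYTES):
    """Build a Lambda proxy-style response, or a 400 if the body is too big.

    `body` is any JSON-serializable object. This helper is hypothetical.
    """
    encoded = json.dumps(body).encode("utf-8")
    if len(encoded) > max_bytes:
        error = {
            "code": "0001",
            "description": (
                "The response body that resulted from this query was too "
                "large to be returned by API Gateway. Try a smaller limit."
            ),
        }
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(error),
        }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": encoded.decode("utf-8"),
    }


# Usage: a deliberately tiny threshold forces the error path.
too_big = guard_response_size({"features": ["x" * 200]}, max_bytes=100)
print(too_big["statusCode"], json.loads(too_big["body"])["code"])
```

The point of the check is that the client receives an actionable message instead of a generic "Internal server error".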

marchuffnagle commented 2 years ago

#193 should hopefully fix this.

philvarner commented 2 years ago

I think it's going to improve things, but it will probably still fail with a limit of 10000 (and a query that has at least that many results). 10k is the upper limit from the OGC API - Features Part 1 spec.

philvarner commented 2 years ago

Bumping this out to 0.5.0. Gzipping the responses will help raise the limit at which this fails, but the 6MB response limit is imposed by Lambda, so there's no good or easy way to get around it. We should probably document that this can happen and that the workaround is to decrease the limit.
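On the client side, the workaround of decreasing the limit can be automated. The following is a minimal sketch, not part of pystac-client or stac-server: `search_with_shrinking_limit` and `fake_search` are hypothetical names, and the 500-item failure threshold in the stand-in function is invented purely for the demonstration.

```python
def search_with_shrinking_limit(run_search, limit, min_limit=1,
                                retry_on=(Exception,)):
    """Call run_search(limit), halving the limit after each failure.

    `run_search` is a caller-supplied function (hypothetical here) that
    performs one search with the given limit and raises on a server error.
    Returns a (result, limit_used) tuple.
    """
    while True:
        try:
            return run_search(limit), limit
        except retry_on:
            if limit <= min_limit:
                raise  # nothing smaller left to try
            limit = max(min_limit, limit // 2)


def fake_search(limit):
    # Stand-in for a real request: pretend pages above 500 items are too big.
    if limit > 500:
        raise RuntimeError('{"message": "Internal server error"}')
    return f"page of up to {limit} items"


result, used = search_with_shrinking_limit(fake_search, 650)
print(used, result)
```

With a real client, `run_search` would wrap the search call, and `retry_on` would be narrowed to the error type the client raises for server failures, so that unrelated errors are not retried.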