maris-development / beacon-blue-cloud


CORA Times series: "Failed to process query: Failed to process query: Memory limit exceeded. Limit: 17179869184." #2

Open ctroupin opened 1 month ago

ctroupin commented 1 month ago

Error

The request fails for some choices of the time period.

>>> response.text
'"Failed to process query: Failed to process query: Memory limit exceeded. Limit: 17179869184."'
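The limit in the message is not labelled with a unit. Assuming it is in bytes, 17179869184 is exactly 2^34, i.e. 16 GiB, as this quick check shows:

```python
# Assumption: the limit reported by Beacon is expressed in bytes.
limit_bytes = 17179869184

# 2**34 bytes == 16 * 1024**3 bytes, i.e. 16 GiB
limit_gib = limit_bytes / 1024**3
print(limit_bytes == 2**34, limit_gib)  # True 16.0
```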

How to reproduce

This works

import requests

# Placeholder: replace with your own Beacon access token
Token = "YOUR_BEACON_TOKEN"

query1 = {'query_parameters': [{'column_name': 'TEMP', 'alias': 'TEMP'},
  {'column_name': 'JULD', 'alias': 'TIME'},
  {'column_name': 'DEPH', 'alias': 'DEPTH'},
  {'column_name': 'LONGITUDE', 'alias': 'LONGITUDE'},
  {'column_name': 'LATITUDE', 'alias': 'LATITUDE'}],
 'filters': [{'for_query_parameter': 'TIME', 'min': 21915, 'max': 22280},
  {'for_query_parameter': 'DEPTH', 'min': 0.0, 'max': 10.0},
  {'for_query_parameter': 'LONGITUDE', 'min': 12.0, 'max': 18.0},
  {'for_query_parameter': 'LATITUDE', 'min': 43.0, 'max': 46.0},
  {'for_query_parameter': 'TEMP', 'min': -2.0, 'max': 30.0}],
 'output': {'format': 'netcdf'}}

# json= serialises the dict and sets the 'Content-Type: application/json' header
response = requests.post("https://beacon-cora-ts.maris.nl/api/query", json=query1, headers={
    'Authorization': f'Bearer {Token}',
})

This one fails

(the only difference is the maximum value of the TIME filter)

query2 = {'query_parameters': [{'column_name': 'TEMP', 'alias': 'TEMP'},
  {'column_name': 'JULD', 'alias': 'TIME'},
  {'column_name': 'DEPH', 'alias': 'DEPTH'},
  {'column_name': 'LONGITUDE', 'alias': 'LONGITUDE'},
  {'column_name': 'LATITUDE', 'alias': 'LATITUDE'}],
 'filters': [{'for_query_parameter': 'TIME', 'min': 21915, 'max': 22600},
  {'for_query_parameter': 'DEPTH', 'min': 0.0, 'max': 10.0},
  {'for_query_parameter': 'LONGITUDE', 'min': 12.0, 'max': 18.0},
  {'for_query_parameter': 'LATITUDE', 'min': 43.0, 'max': 46.0},
  {'for_query_parameter': 'TEMP', 'min': -2.0, 'max': 30.0}],
 'output': {'format': 'netcdf'}}

# json= serialises the dict and sets the 'Content-Type: application/json' header
response = requests.post("https://beacon-cora-ts.maris.nl/api/query", json=query2, headers={
    'Authorization': f'Bearer {Token}',
})
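Because the two queries differ only in the upper bound of the TIME filter, they can be generated from a single helper. `build_query` below is a hypothetical convenience function, not part of any Beacon client; it simply copies every other filter from the queries above:

```python
from typing import Any


def build_query(time_min: float, time_max: float) -> dict[str, Any]:
    """Build the CORA time-series query above for a given JULD window.

    Hypothetical helper: only the TIME filter varies; every other
    parameter is copied unchanged from query1/query2.
    """
    return {
        'query_parameters': [
            {'column_name': 'TEMP', 'alias': 'TEMP'},
            {'column_name': 'JULD', 'alias': 'TIME'},
            {'column_name': 'DEPH', 'alias': 'DEPTH'},
            {'column_name': 'LONGITUDE', 'alias': 'LONGITUDE'},
            {'column_name': 'LATITUDE', 'alias': 'LATITUDE'},
        ],
        'filters': [
            {'for_query_parameter': 'TIME', 'min': time_min, 'max': time_max},
            {'for_query_parameter': 'DEPTH', 'min': 0.0, 'max': 10.0},
            {'for_query_parameter': 'LONGITUDE', 'min': 12.0, 'max': 18.0},
            {'for_query_parameter': 'LATITUDE', 'min': 43.0, 'max': 46.0},
            {'for_query_parameter': 'TEMP', 'min': -2.0, 'max': 30.0},
        ],
        'output': {'format': 'netcdf'},
    }


query1 = build_query(21915, 22280)  # reported to work
query2 = build_query(21915, 22600)  # reported to hit the memory limit
```

Each dict can then be posted exactly as in the snippets above.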
robinskil commented 1 month ago

Hi,

This error means that the query you are running is too large: it exceeds the server's memory limit. The limit is a safety mechanism that prevents anyone from querying terabytes' worth of data.

We will make the error message clearer in the next release of Beacon.
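Until the limit is raised or memory use is reduced, one possible client-side workaround is to split a long time window into consecutive smaller windows and send one request per window. The sketch below only computes the windows; `split_time_window` is a hypothetical helper, the step size is an arbitrary choice, and there is no guarantee that each piece stays under the server's limit:

```python
def split_time_window(t_min: float, t_max: float, step: float) -> list[tuple[float, float]]:
    """Split [t_min, t_max] into consecutive sub-windows of at most `step` units.

    Hypothetical helper. Adjacent windows share their boundary value, so if the
    server's min/max filters are inclusive, records falling exactly on a
    boundary may be returned twice and should be de-duplicated client-side.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    windows = []
    start = t_min
    while start < t_max:
        end = min(start + step, t_max)
        windows.append((start, end))
        start = end
    return windows


print(split_time_window(21915, 22600, 365))
# [(21915, 22280), (22280, 22600)]
```

With a 365-unit step, the first window is exactly the TIME range of the query that succeeded.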

ctroupin commented 1 month ago

Thanks, Robin. I think the message is clear enough, but what caught my attention is that just extending the time period from 'min': 21915, 'max': 22280 to 'min': 21915, 'max': 22600 triggered the error.

In this case, isn't the query limit too low?
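To put the two windows in calendar terms: assuming JULD counts days since 1950-01-01 (the usual CORA/Argo reference date; check the units attribute of the dataset), the working query covers one year and the failing one extends it by a further 320 days:

```python
from datetime import date, timedelta

# Assumption: JULD is "days since 1950-01-01" (check the dataset's units attribute).
REFERENCE = date(1950, 1, 1)


def juld_to_date(juld: float) -> date:
    """Convert a JULD value (days since REFERENCE) to a calendar date."""
    # date + timedelta ignores any fractional-day part
    return REFERENCE + timedelta(days=juld)


for name, (t_min, t_max) in {"query1": (21915, 22280), "query2": (21915, 22600)}.items():
    print(name, juld_to_date(t_min), "->", juld_to_date(t_max), f"({t_max - t_min} days)")
# query1 2010-01-01 -> 2011-01-01 (365 days)
# query2 2010-01-01 -> 2011-11-17 (685 days)
```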

sharppaul commented 1 month ago

Hey Charles,

The way Beacon currently processes queries is quite memory-intensive. Robin is working on a feature that reduces memory usage, which will solve this error.

Another limit is the number of datasets matching your query, which you haven't run into yet. All of our limits are configurable, and we can adjust them if needed.

Kind regards,

Paul

ctroupin commented 1 month ago

Hi all, I agree with everything you said, and the limits are no surprise. It was just surprising that a query that works with SeaDataNet or World Ocean Database (quite large datasets) fails with CORA Time Series (which I would not expect to be that large). That was my point.

So let's close this if it's expected behaviour. Thanks!