Can't download long time series

aptiko commented 1 year ago

Attempting to download time series of the NTUA station often (almost always) fails because it takes too long.

aptiko commented 11 months ago

Running "EXPLAIN (FORMAT JSON) SELECT 1 FROM enhydris_timeseriesrecord WHERE timeseries_id=232" and looking at "Plan Rows" in the output gives an approximate row count. If that count is small, we can return the result immediately. Otherwise, we should process the request with Celery. The Celery task should write the result to the cache (we need to ensure there's no race condition, such as the result being evicted from the cache before we've had a chance to read it). Meanwhile, the user should see a "the data is being prepared" message, and the download should start once the result is ready. This will require websockets, polling, or something similar.
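A minimal sketch of the row-count check described above, assuming a DB-API cursor (for example, one obtained from Django's connection.cursor()). The function names and the ROW_THRESHOLD value are hypothetical, not part of enhydris; the threshold would need to be chosen by measurement.

```python
import json

# Hypothetical cutoff; above this the request would go to a background task.
ROW_THRESHOLD = 100_000


def estimate_row_count(cursor, timeseries_id):
    """Return the planner's row estimate for one time series.

    This reads the estimate from the query plan and does not count rows,
    so the number is only approximate.
    """
    # int() guarantees that only a number is interpolated into the SQL.
    # It also avoids relying on parameter binding inside EXPLAIN.
    sql = (
        "EXPLAIN (FORMAT JSON) "
        "SELECT 1 FROM enhydris_timeseriesrecord "
        f"WHERE timeseries_id = {int(timeseries_id)}"
    )
    cursor.execute(sql)
    plan = cursor.fetchone()[0]
    # Depending on the driver, the plan arrives parsed or as a JSON string.
    if isinstance(plan, str):
        plan = json.loads(plan)
    return plan[0]["Plan"]["Plan Rows"]


def should_process_in_background(cursor, timeseries_id, threshold=ROW_THRESHOLD):
    """Decide whether the download is large enough to hand off to a worker."""
    return estimate_row_count(cursor, timeseries_id) > threshold
```

The estimate depends on up-to-date table statistics, so it can be far off for a table that has not been analyzed recently.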

aptiko commented 11 months ago

Actually, running the select statement of enhydris.models.Timeseries._retrieve_and_cache_data() seems to take only 3 seconds for a time series with 1 million rows. So the fix might not be a matter of improving caching. More investigation is needed to find where the delay comes from.
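One way to narrow this down is to time each stage of the download separately. The following is a minimal sketch using only the standard library; timed() and profile_stages() are hypothetical helpers, and the stage callables are placeholders for the real query and parsing steps.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def profile_stages(stages):
    """Run (label, callable) pairs in order, feeding each result to the next.

    Returns the final result and a dict mapping each label to its duration
    in seconds.
    """
    timings = {}
    value = None
    for label, func in stages:
        with timed(label, timings):
            value = func(value)
    return value, timings
```

For example, passing one stage that runs the select statement and another that parses its output would show which of the two dominates the total time.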

aptiko commented 11 months ago

The delay is in pd.read_csv() in HTimeseries, which is called from here: https://github.com/openmeteo/enhydris/blob/36dad1a875a43ca0993830527bdf2e76e90b7866/enhydris/models/timeseries.py#L238C34-L238C34

openmeteo / enhydris

Can't download long time series #500