openmeteo / enhydris

A database with a web interface for the storage and management of hydro/meteorological measurements and time series
GNU Affero General Public License v3.0

Can't download long time series #500

Closed (aptiko closed 11 months ago)

aptiko commented 1 year ago

Attempting to download time series of the NTUA station often (almost always) fails because it takes too long.

aptiko commented 11 months ago

Running explain (format json) select 1 from enhydris_timeseriesrecord where timeseries_id=232 and reading "Plan Rows" from the output gives an approximate row count. If that count is small, we can return the result immediately; otherwise we should process the request with Celery. Celery should write the result to the cache (we need to make sure there is no race condition, such as the result being deleted from the cache before we have had a chance to read it), and the result can be served once it is ready. In the meantime the user should see a message such as "the data is being prepared", and the download should start as soon as the result is ready. This will require websockets, polling, or something similar.
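The row-count check described above could look roughly like the following. This is only a sketch: the function names and the threshold value are hypothetical and not part of Enhydris. It assumes the EXPLAIN (FORMAT JSON) output has already been fetched from PostgreSQL and arrives either as a JSON string or as an already parsed list, depending on the database driver.

```python
import json

# Hypothetical cut-off; the issue does not say what counts as "small".
ROW_THRESHOLD = 100_000


def estimated_rows(explain_output):
    """Return the planner's row estimate from EXPLAIN (FORMAT JSON) output.

    PostgreSQL returns a JSON array with one object whose "Plan" key
    holds the top plan node; "Plan Rows" is the estimated row count.
    """
    if isinstance(explain_output, str):
        explain_output = json.loads(explain_output)
    return explain_output[0]["Plan"]["Plan Rows"]


def choose_strategy(explain_output, threshold=ROW_THRESHOLD):
    """Decide whether to serve the download now or hand it to a worker."""
    if estimated_rows(explain_output) <= threshold:
        return "immediate"
    return "background"


# Illustrative (made-up) planner output for a large time series.
sample = '[{"Plan": {"Node Type": "Index Only Scan", "Plan Rows": 1000000}}]'
strategy = choose_strategy(sample)
```

Since "Plan Rows" is only the planner's estimate, the threshold should leave some margin; a stale estimate just means a borderline series occasionally takes the other path.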

aptiko commented 11 months ago

Actually, running the select statement of enhydris.models.Timeseries._retrieve_and_cache_data() seems to take only 3 seconds for a time series with 1 million rows. The problem might therefore not be one that better caching would solve. More investigation is needed to find the cause of the delay.
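One way to narrow this down is to time each stage of the download separately. The sketch below is a generic illustration using only the standard library; the timed helper and the three stand-in stages are hypothetical and do not correspond to actual Enhydris code.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}

# Stand-in for running the SELECT statement.
with timed("query", timings):
    rows = [(i, i * 0.5) for i in range(1000)]

# Stand-in for turning the rows into CSV text.
with timed("serialize", timings):
    text = "\n".join(f"{a},{b}" for a, b in rows)

# Stand-in for parsing the CSV text back into records.
with timed("parse", timings):
    parsed = [line.split(",") for line in text.splitlines()]

# The stage with the largest duration is the first candidate to look at.
slowest = max(timings, key=timings.get)
```

Wrapping the real query, serialization, and parsing steps in the same way would show which of them accounts for most of the delay.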

aptiko commented 11 months ago

The delay is in pd.read_csv() in HTimeseries called from here: https://github.com/openmeteo/enhydris/blob/36dad1a875a43ca0993830527bdf2e76e90b7866/enhydris/models/timeseries.py#L238C34-L238C34