Closed aptiko closed 11 months ago
Running explain (format json) select 1 from enhydris_timeseriesrecord where timeseries_id=232 and looking at "Plan Rows" provides an approximate count. If that count is small, we can return the result immediately. Otherwise, we should process it with celery. Celery should write the result to the cache (we need to make sure there's no race condition, such as the result being deleted from the cache before we've had a chance to read it). When the result is ready, it can be served. Meanwhile the user should see a "the data is being prepared" message, and the download should start once the result is ready. This will require websockets, polling, or something similar.
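A rough sketch of the "look at Plan Rows and decide" step could look like the following. It only covers reading the estimate out of the EXPLAIN (FORMAT JSON) output, which is a one-element JSON array whose item holds the top-level "Plan". The function names and the threshold value are hypothetical, not anything that exists in Enhydris, and the sample plan is made up for illustration.

```python
import json

# Hypothetical cutoff: the issue does not say what counts as "small".
SMALL_RESULT_THRESHOLD = 50_000


def estimated_row_count(explain_output):
    """Return the planner's row estimate from EXPLAIN (FORMAT JSON) output.

    Accepts either the raw JSON text or the already-decoded list, since
    some database drivers decode json columns automatically.
    """
    if isinstance(explain_output, str):
        explain_output = json.loads(explain_output)
    # The top-level plan node carries the estimate for the whole query.
    return explain_output[0]["Plan"]["Plan Rows"]


def can_serve_immediately(explain_output, threshold=SMALL_RESULT_THRESHOLD):
    """True if the estimate is small enough to answer in the request itself."""
    return estimated_row_count(explain_output) <= threshold


# Made-up sample of what the database might return for a large time series.
sample = '[{"Plan": {"Node Type": "Index Only Scan", "Plan Rows": 1023456}}]'
print(estimated_row_count(sample))
print(can_serve_immediately(sample))
```

The estimate comes from the planner's statistics, so it can be off; it should be good enough for choosing between the immediate path and the celery path, but not for anything that needs an exact count.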
Actually, running the select statement of enhydris.models.Timeseries._retrieve_and_cache_data() only seems to take 3 seconds for a time series with 1 million rows. So the problem might not be one of improving caching. More investigation is needed to see where the delay comes from.
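One simple way to do that investigation is to time each step of the retrieval separately and see which one dominates. The sketch below is generic: the timed() helper and the two stand-in steps are hypothetical and would be replaced by the real query and parsing calls.

```python
import time
from contextlib import contextmanager

# Collected durations in seconds, keyed by step label.
timings = {}


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Stand-in for running the select statement (hypothetical data).
with timed("query"):
    rows = [f"2020-01-01 00:00,{i},\n" for i in range(1000)]

# Stand-in for turning the fetched text into a data structure.
with timed("parse"):
    parsed = [row.split(",") for row in rows]

slowest = max(timings, key=timings.get)
print(f"slowest step: {slowest} ({timings[slowest]:.4f}s)")
```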
The delay is in pd.read_csv() in HTimeseries, called from here: https://github.com/openmeteo/enhydris/blob/36dad1a875a43ca0993830527bdf2e76e90b7866/enhydris/models/timeseries.py#L238C34-L238C34
Attempting to download time series of the NTUA station often (almost always) fails because it takes too long.