pacificclimate / pdp

The PCIC Data Portal - Server software to run the entire web application
GNU General Public License v3.0

Station data portal download times out #294

Closed jameshiebert closed 1 year ago

jameshiebert commented 1 year ago

On behalf of Aidan LeBlanc aidan.w.leblanc@gmail.com:

What Happened?

Attempted to download weather station data within a specific time series as both CSV and NetCDF files.

Criteria were set as follows: polygon of the Lower Mainland; all precipitation; all networks; hourly; DO NOT include stations with no observations; time series Nov 25, 2021 – Dec 5, 2021.

Downloaded the metadata as CSV without issue.

Tried to download station data as NetCDF and as CSV/ASCII, clipped to the time series, DO NOT include climatologies. Downloads ran at approximately 300 bytes/second, then failed with “Failed – Network Error”, usually after 1–30 kB.

Tried downloading on different computers, different Wi-Fi networks, and on high-bandwidth UVic lab computers. No successes.

With the same parameters as above, the download succeeded when using the OLD Data Portal (prior to its removal from the PCIC site).

 

NetCDF data downloaded from the old Data Portal does not contain coordinates. Opened in Python 3.10 via the xarray package. Files have variables and attributes, but no readable coordinates or dimensions. When attempting to extract numerical values from a variable, an error occurred claiming there were no values for the selected variable. Tried with multiple files and with multiple variables in each.

Have tried files from four different networks; all exhibit the same issue.

What I expected to happen

Files would download normally.

 

NetCDF files would have dimensions and coordinates for the data in each file.

If more than one index was used: IndexError: too many indices.

 

Time: Several occasions over June 2023. PST.

24.68.174.19

Downloads attempted on both Firefox and Google Chrome.

Python run on both PC and VM Linux through Anaconda.

I can confirm similar behaviour with this URL (obtained through the UI following Aidan's filters):

$ cd /tmp
james@basalt:/tmp$ wget 'https://services.pacificclimate.org/met-data-portal-pcds/api/data//pcds/agg/?from-date=&to-date=&network-name=&input-vars=lwe_thickness_of_precipitation_amountt%3A%20sum%20over%20days%20interval%3A%20irregular%2Clwe_thickness_of_precipitation_sum%2Clwe_thickness_of_precipitation_amount_sum%2Clwe_thickness_of_precipitation_amountt%3A%20sum%20within%20days%20interval%3A%20daily%2Clwe_thickness_of_precipitation_amount_sum%20(interval%3A%201%20hour)%2Clwe_thickness_of_precipitation_amountt%3A%20sum%20within%20months%20t%3A%20mean%20over%20years%2Cthickness_of_rainfall_amount_sum%20(interval%3A%201%20hour)%2Cthickness_of_rainfall_amount_sum%20(interval%3A%2024%20hour)%2Cthickness_of_rainfall_amount_sum%2Cfoo_bar_mean%2Csnowfall_amount_sum%2Csnowfall_amount_sum%20(interval%3A%201%20hour)%2Cthickness_of_snowfall_amount_sum%2Csurface_snow_thickness_point&input-freq=1-hourly&input-polygon=MULTIPOLYGON%20(((-123.623568%2048.922547%2C%20-123.623568%2049.575462%2C%20-121.884229%2049.575462%2C%20-121.884229%2048.922547%2C%20-123.623568%2048.922547)))&only-with-climatology=&download-timeseries=Timeseries&data-format=nc'
The destination name is too long (924), reducing to 236
--2023-06-12 15:38:36--  https://services.pacificclimate.org/met-data-portal-pcds/api/data//pcds/agg/?from-date=&to-date=&network-name=&input-vars=lwe_thickness_of_precipitation_amountt%3A%20sum%20over%20days%20interval%3A%20irregular%2Clwe_thickness_of_precipitation_sum%2Clwe_thickness_of_precipitation_amount_sum%2Clwe_thickness_of_precipitation_amountt%3A%20sum%20within%20days%20interval%3A%20daily%2Clwe_thickness_of_precipitation_amount_sum%20(interval%3A%201%20hour)%2Clwe_thickness_of_precipitation_amountt%3A%20sum%20within%20months%20t%3A%20mean%20over%20years%2Cthickness_of_rainfall_amount_sum%20(interval%3A%201%20hour)%2Cthickness_of_rainfall_amount_sum%20(interval%3A%2024%20hour)%2Cthickness_of_rainfall_amount_sum%2Cfoo_bar_mean%2Csnowfall_amount_sum%2Csnowfall_amount_sum%20(interval%3A%201%20hour)%2Cthickness_of_snowfall_amount_sum%2Csurface_snow_thickness_point&input-freq=1-hourly&input-polygon=MULTIPOLYGON%20(((-123.623568%2048.922547%2C%20-123.623568%2049.575462%2C%20-121.884229%2049.575462%2C%20-121.884229%2048.922547%2C%20-123.623568%2048.922547)))&only-with-climatology=&download-timeseries=Timeseries&data-format=nc
Resolving services.pacificclimate.org (services.pacificclimate.org)... 142.104.171.130
Connecting to services.pacificclimate.org (services.pacificclimate.org)|142.104.171.130|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘index.html?from-date=&to-date=&network-name=&input-vars=lwe_thickness_of_precipitation_amountt: sum over days interval: irregular,lwe_thickness_of_precipitation_sum,lwe_thickness_of_precipitation_amount_sum,lwe_thickness_of_precipitatio’

index.html?from-date=&to-date=&net     [                                                 <=>             ]  13.40M  94.9KB/s    in 2m 30s  

2023-06-12 15:41:57 (91.7 KB/s) - Read error at byte 14047730 (Success).Retrying.

--2023-06-12 15:41:58--  (try: 2)  https://services.pacificclimate.org/met-data-portal-pcds/api/data//pcds/agg/?from-date=&to-date=&network-name=&input-vars=lwe_thickness_of_precipitation_amountt%3A%20sum%20over%20days%20interval%3A%20irregular%2Clwe_thickness_of_precipitation_sum%2Clwe_thickness_of_precipitation_amount_sum%2Clwe_thickness_of_precipitation_amountt%3A%20sum%20within%20days%20interval%3A%20daily%2Clwe_thickness_of_precipitation_amount_sum%20(interval%3A%201%20hour)%2Clwe_thickness_of_precipitation_amountt%3A%20sum%20within%20months%20t%3A%20mean%20over%20years%2Cthickness_of_rainfall_amount_sum%20(interval%3A%201%20hour)%2Cthickness_of_rainfall_amount_sum%20(interval%3A%2024%20hour)%2Cthickness_of_rainfall_amount_sum%2Cfoo_bar_mean%2Csnowfall_amount_sum%2Csnowfall_amount_sum%20(interval%3A%201%20hour)%2Cthickness_of_snowfall_amount_sum%2Csurface_snow_thickness_point&input-freq=1-hourly&input-polygon=MULTIPOLYGON%20(((-123.623568%2048.922547%2C%20-123.623568%2049.575462%2C%20-121.884229%2049.575462%2C%20-121.884229%2048.922547%2C%20-123.623568%2048.922547)))&only-with-climatology=&download-timeseries=Timeseries&data-format=nc
Connecting to services.pacificclimate.org (services.pacificclimate.org)|142.104.171.130|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘index.html?from-date=&to-date=&network-name=&input-vars=lwe_thickness_of_precipitation_amountt: sum over days interval: irregular,lwe_thickness_of_precipitation_sum,lwe_thickness_of_precipitation_amount_sum,lwe_thickness_of_precipitatio’

index.html?from-date=&to-date=&net     [                                                      <=>        ]  13.92M  43.7KB/s    in 3m 49s  

2023-06-12 15:46:32 (62.2 KB/s) - Read error at byte 14592490 (Success).Retrying.
...
rod-glover commented 1 year ago

An example download of many observations yielded an interesting result.

Summary: Error is OperationalError: (psycopg2.extensions.QueryCanceledError) canceling statement due to statement timeout

Select a station near Vernon (lower left in the screenshot; selected by rectangle): [screenshot]

Here's (part of) the metadata: [screenshot]

The resulting download is a zip file with the following contents: [screenshot]

Extracting the .nc file reveals rather interesting (and decidedly non-NetCDF) contents:

$ cat ~/Downloads/23097.nc 
Error {
    code = -1;
    message = "Traceback (most recent call last):
  File \"/usr/local/lib/python2.7/dist-packages/pydap/handlers/lib.py\", line 73, in __call__
    app = self.responses[response](dataset)
  File \"/usr/local/lib/python2.7/dist-packages/pydap/responses/netcdf/__init__.py\", line 58, in __init__
    n = len(seq)
  File \"/usr/local/lib/python2.7/dist-packages/pydap/model.py\", line 435, in __len__
    def __len__(self): return len(self.data)
  File \"/usr/local/lib/python2.7/dist-packages/pydap/handlers/sql/__init__.py\", line 316, in __len__
    data = conn.execute(*self.query)
  File \"/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py\", line 1154, in execute
    bind, close_with_result=True).execute(clause, params or {})
  File \"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py\", line 945, in execute
    return meth(self, multiparams, params)
  File \"/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py\", line 263, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File \"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py\", line 1053, in _execute_clauseelement
    compiled_sql, distilled_params
  File \"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py\", line 1189, in _execute_context
    context)
  File \"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py\", line 1402, in _handle_dbapi_exception
    exc_info
  File \"/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py\", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File \"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py\", line 1182, in _execute_context
    context)
  File \"/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py\", line 470, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (psycopg2.extensions.QueryCanceledError) canceling statement due to statement timeout
 [SQL: 'SELECT RELATIVE_HUMIDITY1, STANDARD_SNOW, MINIMUM_AIR_TEMPERATURE, max_wnd_spd_10m_pst1hr, MEASURED_WIND_SPEED1, MEASURED_WIND_DIRECTION1, dwpt_temp, min_air_temp_snc_last_reset, DEW_POINT, ATMOSPHERIC_PRESSURE, avg_wnd_dir_10m_pst10mts, ACTUAL_WIND_DIRECTION, MAXIMUM_AIR_TEMPERATURE, snw_dpth, MAXIMUM_MEASURED_WIND_SPEED1, HEIGHT_OF_SNOW, pcpn_amt_pst24hrs, CURRENT_AIR_TEMPERATURE2, CURRENT_AIR_TEMPERATURE1, PRECIPITATION_NEW, max_air_temp_snc_last_reset, PRECIP_DETECTOR_RATIO, ACTUAL_WIND_SPEED, WIND_DIRECTION_STD_DEVIATION1, avg_wnd_spd_10m_pst10mts, air_temp, mslp, pcpn_amt_pst1hr, snwfl_amt_pst1hr, rel_hum, obs_time, stn_pres FROM (SELECT obs_time,MAX(CASE WHEN vars_id=445 THEN datum END) as ACTUAL_WIND_DIRECTION,MAX(CASE WHEN vars_id=444 THEN datum END) as ACTUAL_WIND_SPEED,MAX(CASE WHEN vars_id=721 THEN datum END) as air_temp,MAX(CASE WHEN vars_id=439 THEN datum END) as ATMOSPHERIC_PRESSURE,MAX(CASE WHEN vars_id=731 THEN datum END) as avg_wnd_dir_10m_pst10mts,MAX(CASE WHEN vars_id=729 THEN datum END) as avg_wnd_spd_10m_pst10mts,MAX(CASE WHEN vars_id=434 THEN datum END) as CURRENT_AIR_TEMPERATURE1,MAX(CASE WHEN vars_id=436 THEN datum END) as CURRENT_AIR_TEMPERATURE2,MAX(CASE WHEN vars_id=438 THEN datum END) as DEW_POINT,MAX(CASE WHEN vars_id=724 THEN datum END) as dwpt_temp,MAX(CASE WHEN vars_id=450 THEN datum END) as HEIGHT_OF_SNOW,MAX(CASE WHEN vars_id=720 THEN datum END) as max_air_temp_snc_last_reset,MAX(CASE WHEN vars_id=433 THEN datum END) as MAXIMUM_AIR_TEMPERATURE,MAX(CASE WHEN vars_id=449 THEN datum END) as MAXIMUM_MEASURED_WIND_SPEED1,MAX(CASE WHEN vars_id=730 THEN datum END) as max_wnd_spd_10m_pst1hr,MAX(CASE WHEN vars_id=447 THEN datum END) as MEASURED_WIND_DIRECTION1,MAX(CASE WHEN vars_id=446 THEN datum END) as MEASURED_WIND_SPEED1,MAX(CASE WHEN vars_id=722 THEN datum END) as min_air_temp_snc_last_reset,MAX(CASE WHEN vars_id=435 THEN datum END) as MINIMUM_AIR_TEMPERATURE,MAX(CASE WHEN vars_id=733 THEN datum END) as mslp,MAX(CASE WHEN 
vars_id=726 THEN datum END) as pcpn_amt_pst1hr,MAX(CASE WHEN vars_id=732 THEN datum END) as pcpn_amt_pst24hrs,MAX(CASE WHEN vars_id=443 THEN datum END) as PRECIP_DETECTOR_RATIO,MAX(CASE WHEN vars_id=441 THEN datum END) as PRECIPITATION_NEW,MAX(CASE WHEN vars_id=437 THEN datum END) as RELATIVE_HUMIDITY1,MAX(CASE WHEN vars_id=723 THEN datum END) as rel_hum,MAX(CASE WHEN vars_id=727 THEN datum END) as snw_dpth,MAX(CASE WHEN vars_id=728 THEN datum END) as snwfl_amt_pst1hr,MAX(CASE WHEN vars_id=451 THEN datum END) as STANDARD_SNOW,MAX(CASE WHEN vars_id=725 THEN datum END) as stn_pres,MAX(CASE WHEN vars_id=448 THEN datum END) as WIND_DIRECTION_STD_DEVIATION1 from obs_raw WHERE (history_id = 3433) AND vars_id IN (445,444,721,439,731,729,434,436,438,724,450,720,433,449,730,447,446,722,435,733,726,732,443,441,437,723,727,728,451,725,448) GROUP BY obs_time ORDER BY obs_time) as foo   LIMIT 9223372036854775807 OFFSET 0']
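The failing query uses a common long-to-wide pivot: one MAX(CASE WHEN vars_id=… THEN datum END) column per variable, grouped by obs_time. A minimal sketch of the same pattern, using in-memory SQLite in place of Postgres (the observation values are invented; the vars_id-to-name mappings 721 → air_temp and 723 → rel_hum are taken from the query above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE obs_raw (obs_time TEXT, vars_id INT, datum REAL);
INSERT INTO obs_raw VALUES
  ('2021-11-25 00:00', 721, -1.5),   -- air_temp
  ('2021-11-25 00:00', 723, 85.0),   -- rel_hum
  ('2021-11-25 01:00', 721, -2.0);   -- air_temp only; rel_hum missing
""")

# Pivot long (obs_time, vars_id, datum) rows into one wide row per obs_time.
# MAX() ignores the NULLs produced by non-matching CASE branches.
rows = conn.execute("""
SELECT obs_time,
       MAX(CASE WHEN vars_id=721 THEN datum END) AS air_temp,
       MAX(CASE WHEN vars_id=723 THEN datum END) AS rel_hum
FROM obs_raw
GROUP BY obs_time
ORDER BY obs_time
""").fetchall()
print(rows)
# → [('2021-11-25 00:00', -1.5, 85.0), ('2021-11-25 01:00', -2.0, None)]
```

With 31 pivoted variables and an unbounded time range, the real query scans and aggregates far more data, which is why it brushes up against the statement timeout.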
rod-glover commented 1 year ago

Aside: What's with the double // in the URL? https://services.pacificclimate.org/met-data-portal-pcds/api/data//pcds/......

rod-glover commented 1 year ago

The obvious suspicion is that the database connection is timing out, as we saw for crmprtd in March.

Therefore EY deployed a dev instance with a database connection string that includes keepalives, thus:

$ cat be-data.env 
PCDS_DSN=postgresql://httpd:PASSWORD@db.pcic.uvic.ca/crmp?keepalives=1&keepalives_idle=300&keepalives_interval=300&keepalives_count=9
...
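For reference, the same libpq keepalive parameters can be assembled programmatically; a small stdlib sketch (the credentials and host are the placeholders from the env file above):

```python
from urllib.parse import urlencode

# libpq TCP keepalive settings, matching the dev deployment's be-data.env
keepalives = {
    "keepalives": 1,             # enable TCP keepalives
    "keepalives_idle": 300,      # seconds of idleness before the first probe
    "keepalives_interval": 300,  # seconds between probes
    "keepalives_count": 9,       # failed probes before the connection drops
}
dsn = "postgresql://httpd:PASSWORD@db.pcic.uvic.ca/crmp?" + urlencode(keepalives)
print(dsn)
```

Note that keepalives guard against an *idle TCP connection* being dropped; as it turns out below, that is not the failure mode here.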

Here's a try with curl:

$ curl "https://services.pacificclimate.org/dev/met-data-portal-pcds/api/data//pcds/agg/?from-date=&to-date=&network-name=&input-vars=&input-freq=&input-polygon=MULTIPOLYGON%20(((-119.308252%2050.209165%2C%20-119.308252%2050.21286%2C%20-119.304226%2050.21286%2C%20-119.304226%2050.209165%2C%20-119.308252%2050.209165)))&only-with-climatology=&download-timeseries=Timeseries&data-format=nc" -o test-download.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   691    0   691    0     0     13      0 --:--:--  0:00:50 --:--:--     0
curl: (18) transfer closed with outstanding read data remaining

The downloaded file is gobbledygook as text, and is not recognized as a zip file or as a .nc file.
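One quick way to confirm that a response body is neither a zip nor a NetCDF file is to check its leading magic bytes; a stdlib sketch (the sample inputs are invented for illustration — the broken .nc downloads earlier in this thread began with an ASCII "Error {" block instead of a NetCDF signature):

```python
def classify(first_bytes: bytes) -> str:
    """Rough file-type guess from leading magic bytes."""
    if first_bytes.startswith(b"PK\x03\x04"):
        return "zip"
    if first_bytes.startswith(b"CDF"):      # classic NetCDF
        return "netcdf-classic"
    if first_bytes.startswith(b"\x89HDF"):  # NetCDF-4 / HDF5
        return "netcdf4/hdf5"
    return "unknown"

print(classify(b"PK\x03\x04..."))    # a real zip download
print(classify(b"Error {\n  code"))  # the error body seen above
```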

rod-glover commented 1 year ago

That 691-byte file size seems to crop up frequently when there is a problem, but it does not seem to advance our investigation here.

rod-glover commented 1 year ago

A very similar result is obtained with curl on the URL provided by the production app:

$ curl "https://services.pacificclimate.org/met-data-portal-pcds/api/data//pcds/agg/?from-date=&to-date=&network-name=&input-vars=&input-freq=&input-polygon=MULTIPOLYGON%20(((-119.314056%2050.206113%2C%20-119.314056%2050.215456%2C%20-119.296609%2050.215456%2C%20-119.296609%2050.206113%2C%20-119.314056%2050.206113)))&only-with-climatology=&download-timeseries=Timeseries&data-format=nc" -o test-download-prod.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   691    0   691    0     0     13      0 --:--:--  0:00:50 --:--:--     

Note identical size and time. These are probably clues.

But the results are not quite identical:

$ cmp test-download.zip test-download-prod.zip 
test-download.zip test-download-prod.zip differ: byte 11, line 1
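cmp only reports the position of the first difference; if one wants the differing bytes themselves, the same check is a few lines of Python (the byte strings below are invented stand-ins, not the actual file contents):

```python
def first_difference(a: bytes, b: bytes):
    """1-based offset of the first differing byte, like cmp; None if identical."""
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return i
    # Same prefix but different lengths: the difference is just past the shorter one.
    return None if len(a) == len(b) else min(len(a), len(b)) + 1

# Invented example of two bodies that first differ at byte 11:
print(first_difference(b"0123456789Xa", b"0123456789Ya"))  # → 11
```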
rod-glover commented 1 year ago

The results are different when the Download Timeseries button is clicked in the prod and dev apps.

In the prod app, we get the result shown above (with the screenshots).

In the dev app we get the result shown below (top right); no file is actually downloaded, so its contents cannot be checked. [screenshot]

jameshiebert commented 1 year ago

That SQL query you show above takes nearly 30 seconds to run, so a timeout issue sounds probable.

crmp=> SELECT RELATIVE_HUMIDITY1, ...

Time: 27513.105 ms (00:27.513)

Our production deployment does not have keepalives set in the database DSN...

rod-glover commented 1 year ago

Yes, but the dev deployment does, and it misbehaves too. :(

But it may be misbehaving differently. It's hard to tell. Still looking. Need a brief walk to clear head.

rod-glover commented 1 year ago

Possibly a very significant finding. I began to suspect the exact 50 s fail time on each download, so I thought: maybe HAProxy is doing something? As a test, I went directly to the server, and it downloaded a good file with very plausible contents:

$ curl --max-time 900 --connect-timeout 300 "http://docker-prod02.pcic.uvic.ca:30503/api/data/pcds/agg/?from-date=&to-date=&network-name=&input-vars=&input-freq=&input-polygon=MULTIPOLYGON%20(((-119.314056%2050.206113%2C%20-119.314056%2050.215456%2C%20-119.296609%2050.215456%2C%20-119.296609%2050.206113%2C%20-119.314056%2050.206113)))&only-with-climatology=&download-timeseries=Timeseries&data-format=nc" -o test-download-prod.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7257k    0 7257k    0     0  83184      0 --:--:--  0:01:29 --:--:-- 1797k

Contents of the downloaded file: [screenshot]

I think the --max-time and --connect-timeout options are unnecessary, but they are left over from a try against the public URL, as follows:

$ curl --max-time 900 --connect-timeout 300 "https://services.pacificclimate.org/met-data-portal-pcds/api/data/pcds/agg/?from-date=&to-date=&network-name=&input-vars=&input-freq=&input-polygon=MULTIPOLYGON%20(((-119.314056%2050.206113%2C%20-119.314056%2050.215456%2C%20-119.296609%2050.215456%2C%20-119.296609%2050.206113%2C%20-119.314056%2050.206113)))&only-with-climatology=&download-timeseries=Timeseries&data-format=nc" -o test-download-prod.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   691    0   691    0     0     13      0 --:--:--  0:00:50 --:--:--     0
curl: (18) transfer closed with outstanding read data remaining
rod-glover commented 1 year ago

The problem was not the database connection proper timing out. It was, as I hinted above, that HAProxy has a default connection timeout (a different sense of "connection") of 50 s. For large downloads from the Met Data Portal there is a gap, presumably while the backend marshals and formats the data, that is longer than 50 s.
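The mechanism can be sketched as a toy model: a proxy with an idle timeout drops the connection whenever the gap between consecutive chunks exceeds the limit, even though the server is still working. (The 50 s figure is the HAProxy default observed here; the function and the gap values are invented for illustration.)

```python
def survives_proxy(inter_chunk_gaps_s, idle_timeout_s=50):
    """Toy model: True iff no gap between chunks exceeds the proxy's idle timeout."""
    return all(gap <= idle_timeout_s for gap in inter_chunk_gaps_s)

print(survives_proxy([1, 2, 1]))    # steady streaming survives
print(survives_proxy([1, 120, 1]))  # a long marshalling pause gets the connection cut
```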

Therefore I asked Tom to increase the timeout on the prod and dev MDP data backends to 5 minutes. This fixed the problem, as verified both with curl and by a live test with the app.

No change to code or deployment was needed, only to the HAProxy configuration. For those interested, Tom says:

[root@proxy01 haproxy]# diff haproxy.cfg haproxy.cfg.20230614.tk
677,680d676
< # hack for https://github.com/pacificclimate/pdp/issues/294#issuecomment-1592098943
<   timeout server 5m
<   timeout client 5m
< # end hack
686,689d681
< # hack for https://github.com/pacificclimate/pdp/issues/294#issuecomment-1592098943
<   timeout server 5m
<   timeout client 5m
< # end hack

I'm not sure I'd call that a "hack", but de gustibus ...

Thanks, Tom, for your help!