We have been seeing some strange errors in the PDP backend, where the response tries to put together a station table and is missing a variable that it thinks should be there.
The station in question appears to be from the BC Hydro (BCH) network, station id "AFU", and it's failing on a wind variable. The error shows up in the logs as follows:
2021-02-01 18:49:23 [15] [ERROR] Error handling request
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/gunicorn/workers/async.py", line 114, in handle_request
for item in respiter:
File "/root/pdp/pdp/error.py", line 74, in __call__
start_response(status, response_headers, sys.exc_info())
File "/usr/local/lib/python2.7/dist-packages/gunicorn/http/wsgi.py", line 237, in start_response
reraise(exc_info[0], exc_info[1], exc_info[2])
File "/root/pdp/pdp/error.py", line 68, in __call__
for block in response_iter:
File "/usr/local/lib/python2.7/dist-packages/ga_wsgi_client/__init__.py", line 41, in __call__
for chunk in app_iter:
Exception ValueError: 'I/O operation on closed file' in <bound method ZipFile.__del__ of <zipfile.ZipFile object at 0x7f9628d47e10>> ignored
File "/usr/local/lib/python2.7/dist-packages/pdp_util/agg.py", line 76, in ziperator
for name, responder in responders:
File "/usr/local/lib/python2.7/dist-packages/pdp_util/agg.py", line 161, in get_pcds_responders
yield (name, handler(newenv, null_start_response))
File "/usr/local/lib/python2.7/dist-packages/pydap/handlers/pcic/__init__.py", line 78, in __call__
app = SQLHandler(f.name)
File "/usr/local/lib/python2.7/dist-packages/pydap/handlers/sql/__init__.py", line 133, in __init__
results = conn.execute(query)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 1154, in execute
bind, close_with_result=True).execute(clause, params or {})
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 945, in execute
return meth(self, multiparams, params)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
compiled_sql, distilled_params
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
context)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1402, in _handle_dbapi_exception
exc_info
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
context)
File "/usr/local/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
cursor.execute(statement, parameters)
ProgrammingError: (psycopg2.ProgrammingError) column "wind" does not exist
LINE 1: ..._Avg, ONE_DAY_SNOW, K_up_Avg, L_up_corr_Avg, Vis, Wind, AirT...
^
SELECT L_down_corr_Avg, HFT3_1_Avg, L_up_Avg, ONE_DAY_SNOW, K_up_Avg, L_up_corr_Avg, Vis, Wind, AirTemp, BAR_PRESS, L_down_Avg, HFT3_2_Avg, Wetness, ONE_DAY_RAIN, HUMIDITY, K_down_Avg, TSoil_Avg, obs_time, WindDirection FROM (SELECT obs_time,MAX(CASE WHEN vars_id=657 THEN datum END) as AirTemp,MAX(CASE WHEN vars_id=672 THEN datum END) as HFT3_1_Avg,MAX(CASE WHEN vars_id=673 THEN datum END) as HFT3_2_Avg,MAX(CASE WHEN vars_id=679 THEN datum END) as K_down_Avg,MAX(CASE WHEN vars_id=678 THEN datum END) as K_up_Avg,MAX(CASE WHEN vars_id=674 THEN datum END) as L_down_Avg,MAX(CASE WHEN vars_id=675 THEN datum END) as L_down_corr_Avg,MAX(CASE WHEN vars_id=676 THEN datum END) as L_up_Avg,MAX(CASE WHEN vars_id=677 THEN datum END) as L_up_corr_Avg,MAX(CASE WHEN vars_id=464 THEN datum END) as MAX_TEMP,MAX(CASE WHEN vars_id=463 THEN datum END) as MIN_TEMP,MAX(CASE WHEN vars_id=681 THEN datum END) as NetRad,MAX(CASE WHEN vars_id=465 THEN datum END) as ONE_DAY_PRECIPITATION,MAX(CASE WHEN vars_id=466 THEN datum END) as ONE_DAY_RAIN,MAX(CASE WHEN vars_id=467 THEN datum END) as ONE_DAY_SNOW,MAX(CASE WHEN vars_id=468 THEN datum END) as SNOW_ON_THE_GROUND,MAX(CASE WHEN vars_id=659 THEN datum END) as Snow_WE,MAX(CASE WHEN vars_id=671 THEN datum END) as TSoil_Avg,MAX(CASE WHEN vars_id=680 THEN datum END) as Vis from obs_raw WHERE (history_id = 14398) AND vars_id IN (657,672,673,679,678,674,675,676,677,464,463,681,465,466,467,468,659,671,680) GROUP BY obs_time ORDER BY obs_time) as foo LIMIT 1
The problem appears to be that the code is composing the list of variables in two different ways, and they disagree.
The first method is where pydap.handlers.pcic attempts to present the dataset to Pydap, enumerating the variables that are available. It does so in the get_vars() function here. Calling it with the station_id in question produces the following query and result:
2021-02-01 15:05:00,968 INFO sqlalchemy.engine.base.Engine SELECT crmp.meta_vars.vars_id AS crmp_meta_vars_vars_id, crmp.meta_vars.net_var_name AS crmp_meta_vars_net_var_name, crmp.meta_vars.long_description AS crmp_meta_vars_long_description, crmp.meta_vars.unit AS crmp_meta_vars_unit, crmp.meta_vars.standard_name AS crmp_meta_vars_standard_name, crmp.meta_vars.cell_method AS crmp_meta_vars_cell_method, crmp.meta_vars.precision AS crmp_meta_vars_precision, crmp.meta_vars.display_name AS crmp_meta_vars_display_name, crmp.meta_vars.short_name AS crmp_meta_vars_short_name, crmp.meta_vars.network_id AS crmp_meta_vars_network_id
FROM crmp.meta_vars JOIN crmp.vars_per_history_mv ON crmp.meta_vars.vars_id = crmp.vars_per_history_mv.vars_id JOIN crmp.meta_history ON crmp.meta_history.history_id = crmp.vars_per_history_mv.history_id JOIN crmp.meta_station ON crmp.meta_station.station_id = crmp.meta_history.station_id JOIN crmp.meta_network ON crmp.meta_network.network_id = crmp.meta_station.network_id
WHERE crmp.meta_station.station_id = %(station_id_1)s AND NOT (crmp.meta_vars.cell_method LIKE %(cell_method_1)s OR crmp.meta_vars.cell_method LIKE %(cell_method_2)s)
2021-02-01 15:05:00,968 INFO sqlalchemy.engine.base.Engine {'station_id_1': 12342, 'cell_method_1': '%within%', 'cell_method_2': '%over%'}
[(u'HFT3_1_Avg', u'W m-2', u'downward_heatflux_in_soil', u'time: mean', u'soil heat flux ( 1 of 2)\n', u'Soil Heatflux 1'), (u'L_down_Avg', u'W m-2', u'downwelling_longwave_flux_in_air', u'time: mean', u'downwelling longwave radiation', u'Downwelling Longwave Radiation'), (u'ONE_DAY_RAIN', u'mm', u'thickness_of_rainfall_amount', u'time: sum', u'Daily rainfall', u'Rainfall Amount'), (u'AirTemp', u'celsius', u'air_temperature', u'time: point', u'Hourly air temperature instantaneous', u'Temperature (Point)'), (u'K_down_Avg', u'W m-2', u'downwelling_shortwave_flux_in_air', u'time: mean', u'downwelling shortwave radiation', u'Downwelling Shortwave Radiation'), (u'WindDirection', u'degree', u'wind_from_direction', u'time: mean', None, u'Wind Direction (Mean)'), (u'HFT3_2_Avg', u'W m-2', u'downward_heatflux_in_soil', u'time: mean', u'soil heat flux ( 2 of 2)\n', u'Soil Heatflux 2'), (u'Wetness', u'%', u'volume_fraction_of_condensed_water_in_soil', u'time: point', None, u'Soil Moisture'), (u'BAR_PRESS', u'millibar', u'air_pressure', u'time: point', u'Atmospheric pressure', u'Air Pressure (Point)'), (u'TSoil_Avg', u'celsius', u'soil_temperature', u'time: mean', u'Soil Temperature', u'Soil Temperature (Mean)'), (u'L_up_Avg', u'W m-2', u'upwelling_longwave_flux_in_air', u'time: mean', u'upwelling longwave radiation', u'Upwelling Longwave Radiation'), (u'L_down_corr_Avg', u'W m-2', u'downwelling_longwave_flux_in_air', u'time: mean', u'downwelling longwave radiation (temperature corrected)', u'Downwelling Longwave Radiation - temperature corrected'), (u'L_up_corr_Avg', u'W m-2', u'upwelling_longwave_flux_in_air', u'time: mean', u'upwelling longwave radiation (temperature corrected)', u'Upwelling Longwave Radiation - temperature corrected'), (u'HUMIDITY', u'%', u'relative_humidity', u'time: point', u'Relative humidity', u'Relative Humidity (Point)'), (u'K_up_Avg', u'W m-2', u'upwelling_shortwave_flux_in_air', u'time: mean', u'upwelling shortwave radiation', u'Upwelling Shortwave 
Radiation'), (u'Vis', u'km', u'visibility_in_air', u'time: point', u'Visibility', u'Visibility'), (u'Wind', u'm s-1', u'wind_speed', u'time: mean', u'Wind Speed', u'Wind Speed'), (u'ONE_DAY_SNOW', u'cm', u'thickness_of_snowfall_amount', u'time: sum', u'Daily snow accumulation', u'Snowfall Amount')]
In short, it goes through the vars_per_history_mv materialized view/table, JOINing on it with explicit join conditions. Several of the key problematic variables are included in the results: e.g. Wind, WindDirection, and HUMIDITY.
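The cell_method exclusion in that WHERE clause (which drops climatological aggregates) can be sketched in plain Python. This is illustrative only; the real filter runs in SQL, and the climatology entry below is hypothetical:

```python
def keep_variable(cell_method):
    """Mimic: NOT (cell_method LIKE '%within%' OR cell_method LIKE '%over%').

    Under SQL three-valued logic a NULL cell_method is also excluded
    (NOT NULL is NULL), so treat None as "drop" here too.
    """
    if cell_method is None:
        return False
    return "within" not in cell_method and "over" not in cell_method

# Illustrative (net_var_name, cell_method) pairs; the climatology row is made up.
candidates = [
    ("Wind", "time: mean"),
    ("AirTemp", "time: point"),
    ("T_climatology", "t: mean within days t: mean over years"),
]
kept = [name for name, cm in candidates if keep_variable(cm)]
```

With these inputs, only Wind and AirTemp survive the filter, matching the kind of variables seen in the query result above.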
Compare that to the variable list from the SQL that generates the "station table" (a query that produces one row per time step, one column per variable of observation):
crmp=> select query_one_station(12342);
SELECT obs_time,MAX(CASE WHEN vars_id=657 THEN datum END) as AirTemp,MAX(CASE WHEN vars_id=672 THEN datum END) as HFT3_1_Avg,MAX(CASE WHEN vars_id=673 THEN datum END) as HFT3_2_Avg,MAX(CASE WHEN vars_id=679 THEN datum END) as K_down_Avg,MAX(CASE WHEN vars_id=678 THEN datum END) as K_up_Avg,MAX(CASE WHEN vars_id=674 THEN datum END) as L_down_Avg,MAX(CASE WHEN vars_id=675 THEN datum END) as L_down_corr_Avg,MAX(CASE WHEN vars_id=676 THEN datum END) as L_up_Avg,MAX(CASE WHEN vars_id=677 THEN datum END) as L_up_corr_Avg,MAX(CASE WHEN vars_id=464 THEN datum END) as MAX_TEMP,MAX(CASE WHEN vars_id=463 THEN datum END) as MIN_TEMP,MAX(CASE WHEN vars_id=681 THEN datum END) as NetRad,MAX(CASE WHEN vars_id=465 THEN datum END) as ONE_DAY_PRECIPITATION,MAX(CASE WHEN vars_id=466 THEN datum END) as ONE_DAY_RAIN,MAX(CASE WHEN vars_id=467 THEN datum END) as ONE_DAY_SNOW,MAX(CASE WHEN vars_id=468 THEN datum END) as SNOW_ON_THE_GROUND,MAX(CASE WHEN vars_id=659 THEN datum END) as Snow_WE,MAX(CASE WHEN vars_id=671 THEN datum END) as TSoil_Avg,MAX(CASE WHEN vars_id=680 THEN datum END) as Vis from obs_raw WHERE (history_id = 14398) AND vars_id IN (657,672,673,679,678,674,675,676,677,464,463,681,465,466,467,468,659,671,680) GROUP BY obs_time ORDER BY obs_time
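What that query_one_station() output computes is a pivot: one row per obs_time, one column per variable, with MAX() collapsing any duplicate (obs_time, vars_id) pairs. A minimal pure-Python sketch of the same transformation, using a toy vars_id-to-column mapping and made-up observations (the real mapping comes from meta_vars):

```python
from collections import defaultdict

# Toy mapping and data; real vars_ids and observations come from the database.
columns = {657: "AirTemp", 680: "Vis"}

obs_raw = [
    # (obs_time, vars_id, datum)
    ("2021-02-01 00:00", 657, -3.2),
    ("2021-02-01 00:00", 680, 12.0),
    ("2021-02-01 01:00", 657, -2.9),
]

table = defaultdict(dict)
for obs_time, vars_id, datum in obs_raw:
    if vars_id in columns:  # WHERE vars_id IN (...)
        col = columns[vars_id]
        prev = table[obs_time].get(col)
        # MAX(CASE WHEN vars_id=... THEN datum END) per obs_time group
        table[obs_time][col] = datum if prev is None else max(prev, datum)

rows = [
    {"obs_time": t, **{c: table[t].get(c) for c in columns.values()}}
    for t in sorted(table)  # ORDER BY obs_time
]
```

Each row then has a value (or None) for every pivoted column; crucially, a column only exists at all if its vars_id was in the mapping used to build the query.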
I'm not positive, but the code that generates that list of variables appears to be here in pydcs. It does a NATURAL JOIN from meta_station over to meta_vars, which appears to work, but it completely leaves out vars_per_history_mv.
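The disagreement is visible by diffing the two column lists above: the outer SELECT's columns (built from get_vars()) against the aliases the inner pivot (from query_one_station()) actually creates. Both sets below are copied from the failing query in the log:

```python
# Columns the outer SELECT asks for (from get_vars()).
outer_columns = {
    "L_down_corr_Avg", "HFT3_1_Avg", "L_up_Avg", "ONE_DAY_SNOW", "K_up_Avg",
    "L_up_corr_Avg", "Vis", "Wind", "AirTemp", "BAR_PRESS", "L_down_Avg",
    "HFT3_2_Avg", "Wetness", "ONE_DAY_RAIN", "HUMIDITY", "K_down_Avg",
    "TSoil_Avg", "obs_time", "WindDirection",
}
# Column aliases the inner pivot actually produces (from query_one_station()).
inner_columns = {
    "obs_time", "AirTemp", "HFT3_1_Avg", "HFT3_2_Avg", "K_down_Avg", "K_up_Avg",
    "L_down_Avg", "L_down_corr_Avg", "L_up_Avg", "L_up_corr_Avg", "MAX_TEMP",
    "MIN_TEMP", "NetRad", "ONE_DAY_PRECIPITATION", "ONE_DAY_RAIN",
    "ONE_DAY_SNOW", "SNOW_ON_THE_GROUND", "Snow_WE", "TSoil_Avg", "Vis",
}
# Columns referenced by the outer query that the pivot never creates; Wind is
# simply the first of these that PostgreSQL trips over in SELECT-list order.
missing = sorted(outer_columns - inner_columns)
```

Running this gives missing == ['BAR_PRESS', 'HUMIDITY', 'Wetness', 'Wind', 'WindDirection'], so it is not just the wind variables that the two code paths disagree on.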
Maybe there is some spurious (incorrect) information in vars_per_history_mv? That appears to be the case, but it's unclear why.
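One way to check would be to look for (history_id, vars_id) pairs that the materialized view advertises but that have no observations at all in obs_raw. This is a sketch only: the table and column names are inferred from the queries above, not from the actual schema definition, and the query has not been run:

```python
# Hypothetical diagnostic: rows in vars_per_history_mv with no matching
# observations. Any hits would explain get_vars() promising a column that
# the pivot can never build. Restricting to one history_id keeps the
# LEFT JOIN against obs_raw from being expensive.
STALE_MV_ROWS_SQL = """
SELECT mv.history_id, mv.vars_id
FROM crmp.vars_per_history_mv mv
LEFT JOIN obs_raw o
  ON o.history_id = mv.history_id AND o.vars_id = mv.vars_id
WHERE mv.history_id = %(history_id)s
  AND o.vars_id IS NULL
"""

# e.g. cursor.execute(STALE_MV_ROWS_SQL, {"history_id": 14398})
```

If that returns rows for history_id 14398, the next question is whether the view is stale (needs a refresh) or was populated incorrectly in the first place.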
More research necessary.