pacificclimate / modelmeta

An ORM representation of the model metadata database
GNU General Public License v3.0

index_netcdf should detect fully masked files #58

Open jameshiebert opened 6 years ago

jameshiebert commented 6 years ago

When attempting to index some climdex climatologies, many of the files are erroring out with a message resembling the following:

2018-02-20 00:00:32 INFO: Processing file: /storage/data/projects/comp_support/climate_explorer_data_prep/climatological_means/climdex/txnETCCDI_aClim_BCCAQ_HadGEM2-CC_historical+rcp85_r1i1p1_20400101-20691230.nc
2018-02-20 00:00:32 INFO: Creating new DataFile for unique_id txnETCCDI_aClim_BCCAQ_HadGEM2-CC_historical-rcp85_r1i1p1_20400101-20691230
2018-02-20 00:00:33 ERROR: Traceback (most recent call last):
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
psycopg2.ProgrammingError: can't adapt type 'MaskedConstant'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/storage/home/hiebert/code/git/modelmeta/mm_cataloguer/index_netcdf.py", line 1060, in index_netcdf_file
    session.commit()
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/orm/session.py", line 943, in commit
    self.transaction.commit()
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/orm/session.py", line 467, in commit
    self._prepare_impl()
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/orm/session.py", line 447, in _prepare_impl
    self.session.flush()
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/orm/session.py", line 2243, in flush
    self._flush(objects)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/orm/session.py", line 2369, in _flush
    transaction.rollback(_capture_exception=True)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/orm/session.py", line 2333, in _flush
    flush_context.execute()
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/orm/unitofwork.py", line 391, in execute
    rec.execute(self)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/orm/unitofwork.py", line 556, in execute
    uow
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/orm/persistence.py", line 181, in save_obj
    mapper, table, insert)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/orm/persistence.py", line 866, in _emit_insert_statements
    execute(statement, params)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/engine/base.py", line 948, in execute
    return meth(self, multiparams, params)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "/storage/home/hiebert/code/git/modelmeta/env/lib64/python3.4/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) can't adapt type 'MaskedConstant' [SQL: 'INSERT INTO data_file_variables (derivation_method, variable_cell_methods, netcdf_variable_name, disabled, range_min, range_max, data_file_id, variable_alias_id, level_set_id, grid_id) VALUES (%(derivation_method)s, %(variable_cell_methods)s, %(netcdf_variable_name)s, %(disabled)s, %(range_min)s, %(range_max)s, %(data_file_id)s, %(variable_alias_id)s, %(level_set_id)s, %(grid_id)s) RETURNING data_file_variables.data_file_variable_id'] [parameters: {'grid_id': 18, 'netcdf_variable_name': 'txnETCCDI', 'range_min': masked, 'variable_cell_methods': 'time: maximum', 'range_max': masked, 'variable_alias_id': 60, 'data_file_id': 10412, 'level_set_id': None, 'derivation_method': None, 'disabled': False}] (Background on this error at: http://sqlalche.me/e/f405)

I'm not sure why, but it's trying to do an insertion where the range_min and range_max columns are both "masked".

rod-glover commented 6 years ago

The last time this happened, it was due to input data files (in that case, downscaled GCM outputs) containing masked values. According to Stephen Sobie, that was not supposed to happen, and he promised to correct those files. Consequently, I did not update the code to handle masked values.

This case is occurring with CLIMDEX files, and I am not sure whether it is legitimate for them to contain masked values. That's the first question (assuming I am right about the root cause).

jameshiebert commented 6 years ago

Hmm, yeah, now that you mention it the input data looks like this:

txnETCCDI =
  2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 
    2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 
    2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 
    2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 
    2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 
    2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 
    2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 
    2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 
    2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 
    2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 
    2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 2.945782e+34, 

So it's not so much that it contains masked data (which is completely valid, BTW; we have to mask out the oceans all the time). It's that it contains no data at all: every value is the fill value. I'll have to drop back a step in the pipeline and see why generate_climos actually created this.

jameshiebert commented 6 years ago

Would be nice to have a graceful failure in these circumstances, since there seem to be a lot of them.
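
For anyone picking this up, here is a minimal sketch of what a graceful check might look like, assuming the variable's values arrive as a NumPy masked array (as netCDF4 typically returns them). The helper name `variable_range` is hypothetical and is not part of index_netcdf. The idea is to count the unmasked values before taking min/max, so a fully masked variable is reported instead of handing `masked` to the database driver.

```python
import numpy as np


def variable_range(values):
    """Return (min, max) as plain Python floats, or None if nothing is unmasked.

    Hypothetical helper for illustration only; not the project's actual code.
    """
    data = np.ma.asarray(values)
    # count() is the number of unmasked elements; 0 means fully masked (or empty).
    if data.count() == 0:
        return None
    # Convert to float so the DB driver never sees a NumPy scalar or MaskedConstant.
    return float(data.min()), float(data.max())


# A fully masked variable, like the climdex files in this issue.
all_masked = np.ma.masked_all((2, 3), dtype=np.float32)
# A partly masked variable, e.g. land values present and oceans masked out.
partly_masked = np.ma.array([1.0, 5.0, 3.0], mask=[False, True, False])

for name, arr in [("all_masked", all_masked), ("partly_masked", partly_masked)]:
    rng = variable_range(arr)
    if rng is None:
        print(name, "-> fully masked, skipping file")
    else:
        print(name, "-> range", rng)
```

Whether the indexer should skip such a file, log a warning, or store NULLs for range_min and range_max is a separate decision; the sketch only shows the detection step.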

marshallward commented 5 years ago

I apologise for barging into this issue, but we are getting these exact values (2.945782e+34) in masked regions (of a presumably unrelated model, I should add), even though our fill value is 1e20. They appear when we do a parallel IO write, and appear in regions where we are not writing.

Apologies again for barging, just wondering if anyone knows where these values are coming from.


Edit: In case some lost soul finds this... turns out 2.94e34 is the single-precision float you get from the lower 32 bits of the double-precision value 1e20, which is the CMOR-defined fill value.
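
The bit-level claim above can be checked with a few lines of standard-library Python. This is only a sketch: it packs 1e20 as a little-endian IEEE 754 double, keeps the four least-significant bytes, and reinterprets them as a single-precision float. The little-endian byte order is an assumption made explicit by the `<` format prefix.

```python
import struct

# Pack 1e20 as a little-endian 64-bit double (8 bytes).
double_bytes = struct.pack("<d", 1e20)

# In little-endian order the first 4 bytes are the lower 32 bits.
low32 = double_bytes[:4]

# Reinterpret those 4 bytes as a little-endian 32-bit float.
(reinterpreted,) = struct.unpack("<f", low32)

print(double_bytes[::-1].hex())   # the double's bit pattern, most-significant byte first
print(low32[::-1].hex())          # the lower 32 bits
print(reinterpreted)              # should be close to 2.945782e+34
```

So a buffer of doubles holding the 1e20 fill value, read or written as if it held 32-bit floats, would show this number in the untouched regions.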