metno / pyaerocom

Python tools for climate and air quality model evaluation
https://pyaerocom.readthedocs.io/
GNU General Public License v3.0

aeroval: pure monthly processing crashes #479

Closed jgriesfeller closed 2 years ago

jgriesfeller commented 2 years ago

Background: to keep the loading time of the aeroval web interface low, the trends processing should be limited to pure monthly processing for now. However, that seems to fail. The aeroval config file:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Config file for AeroCom PhaseIII optical properties experiment
"""
import os

### Define filters for the obs subsets

# BASE FILTERS
ALTITUDE_FILTER = {
    'altitude': [0, 1000]
}

# Setup for models used in analysis
MODELS = {
    'IFS-OSUITE': dict(model_id='ECMWF_OSUITE',
                       ),
}

# Setup for available ground based observations (ungridded)

AERONET_SITE_FILTER = dict(station_name='DRAGON*', negate='station_name')
OBS_GROUNDBASED = {

    'AeronetL1.5-d': dict(obs_id='AeronetSunV3Lev1.5.daily',
                          obs_vars=['od550aer'],
                          # obs_vars=['ang4487aer', 'od550aer'],
                          obs_vert_type='Column',
                          obs_filters={**ALTITUDE_FILTER,
                                       **AERONET_SITE_FILTER},
                          min_num_obs={'monthly': {'daily': 3}}),

}

# Setup for supported satellite evaluations
OBS_SAT = {}

OBS_CFG = {
    **OBS_GROUNDBASED,
    **OBS_SAT
}

DEFAULT_RESAMPLE_CONSTRAINTS = dict(
    monthly=dict(daily=21),
    daily=dict(hourly=18)
)

CFG = dict(

    model_cfg=MODELS,
    obs_cfg=OBS_CFG,

    json_basedir=os.path.abspath('../../data'),
    coldata_basedir=os.path.abspath('../../coldata'),
    io_aux_file=os.path.abspath('../eval_py/gridded_io_aux.py'),

    # if True, existing colocated data files will be deleted
    # reanalyse_existing=False,
    reanalyse_existing=True,
    only_json=False,
    add_model_maps=False,
    only_model_maps=False,

    clear_existing_json=False,

    # if True, the analysis will stop whenever an error occurs (else, errors that
    # occurred will be written into the logfiles)
    raise_exceptions=False,

    # Regional filter for analysis
    filter_name='WORLD-wMOUNTAINS',

    # colocation frequency (no statistics in higher resolution can be computed)
    # ts_type='daily',
    ts_type='monthly',
    map_zoom='World',

    # freqs=['daily', 'monthly'],
    freqs=['monthly'],
    periods=['2013-2020',],
    main_freq='monthly',
    # stats_main_freq = 'daily',
    # stats_tseries_base_freq="daily",
    stats_tseries_base_freq="monthly",
    zeros_to_nan=False,

    min_num_obs=DEFAULT_RESAMPLE_CONSTRAINTS,
    colocate_time=False,

    obs_remove_outliers=True,
    model_remove_outliers=False,
    harmonise_units=True,
    regions_how='htap',
    annual_stats_constrained=False,
    add_trends=True,
    # trends_min_yrs=3,

    proj_id='cams84',
    exp_id='eval-trends',
    exp_name='Trend evaluation of CAMS forecast and reanalysis models',
    exp_descr=('Both OSUITE and CNTRL are evaluated against multiple '
               'observation records including AOD from AERONET and '
               'PM, O3 and NO2 measurements.'),
    exp_pi='Jan Griesfeller (jan.griesfeller@met.no)',

    public=True,
    # directory where colocated data files are supposed to be stored
    weighted_stats=True,
)

if __name__ == '__main__':
    from pyaerocom.aeroval import EvalSetup, ExperimentProcessor
    import matplotlib.pyplot as plt

    plt.close('all')
    stp = EvalSetup(**CFG)

    ana = ExperimentProcessor(stp)
    print(stp)

    ana.exp_output.delete_experiment_data()
    res = ana.run()

Running this on the current main-dev results in:

(pyaerocom-main-dev) jang@pc5378:~/.../pyaerocom_config/config_files$ python cfg_cams84_eval_trends.py
/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pyaerocom-0.12.0.dev2-py3.9.egg/pyaerocom/_lowlevel_helpers.py:675: RuntimeWarning: divide by zero encountered in log10
  ndigits = -1*np.floor(np.log10(abs(np.asarray(val)))).astype(int) + 2
/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pyaerocom-0.12.0.dev2-py3.9.egg/pyaerocom/_lowlevel_helpers.py:675: RuntimeWarning: overflow encountered in long_scalars
  ndigits = -1*np.floor(np.log10(abs(np.asarray(val)))).astype(int) + 2
{
 proj_info: {proj_id: cams84}
 exp_info: {
   exp_id: eval-trends
   exp_name: Trend evaluation of CAMS forecast and reanalysis models
   exp_descr: Both OSUITE and CNTRL are evaluated against multiple observation records including AOD from AERONET and PM, O3 and NO2 measurements.
   public: True
   exp_pi: Jan Griesfeller (jan.griesfeller@met.no)
  }
 time_cfg: {
   main_freq: monthly
   freqs:   list (1 items): ['monthly']
   add_seasons: True
   periods:   list (1 items): ['2013-2020']
  }
 modelmaps_opts: {maps_res_deg: 5}
 colocation_opts: {
   model_id: None
   obs_id: None
   obs_vars:   list (0 items): []
   ts_type: monthly
   start: None
   stop: None
   filter_name: WORLD-wMOUNTAINS
   basedir_coldata: /home/jang/MyPyaerocom/colocated_data
   save_coldata: True
   obs_name: None
   obs_data_dir: None
   obs_use_climatology: False
   _obs_cache_only: False
   obs_vert_type: None
   obs_ts_type_read: None
   obs_filters: {}
   read_opts_ungridded: {}
   model_name: None
   model_data_dir: None
   model_read_opts: {}
   model_use_vars: {}
   model_rename_vars: {}
   model_add_vars: {}
   model_to_stp: False
   model_ts_type_read: None
   model_read_aux: {}
   model_use_climatology: False
   gridded_reader_id: {
     model: ReadGridded
     obs: ReadGridded
    }
   flex_ts_type: True
   min_num_obs: {
     monthly: {daily: 21}
     daily: {hourly: 18}
    }
   resample_how: mean
   obs_remove_outliers: True
   model_remove_outliers: False
   obs_outlier_ranges: {}
   model_outlier_ranges: {}
   zeros_to_nan: False
   harmonise_units: True
   regrid_res_deg: None
   colocate_time: False
   reanalyse_existing: True
   raise_exceptions: False
   keep_data: False
   add_meta: {}
  }
 statistics_opts: {
   weighted_stats: True
   annual_stats_constrained: False
   add_trends: True
   trends_min_yrs: 7
   stats_tseries_base_freq: monthly
  }
 webdisp_opts: {
   regions_how: htap
   map_zoom: World
   add_model_maps: False
   modelorder_from_config: True
   obsorder_from_config: True
   var_order_menu:   list (0 items): []
   obs_order_menu:   list (0 items): []
   model_order_menu:   list (0 items): []
  }
 processing_opts: {
   clear_existing_json: False
   only_json: False
   only_colocation: False
   only_model_maps: False
  }
 obs_cfg: {AeronetL1.5-d: {
     obs_id: AeronetSunV3Lev1.5.daily
     obs_vars:     list (1 items): ['od550aer']
     obs_ts_type_read: None
     obs_vert_type: Column
     obs_aux_requires: {}
     instr_vert_loc: None
     is_superobs: False
     only_superobs: False
     read_opts_ungridded: {}
     obs_filters: {
       altitude:       list (2 items): [0, 1000]
       station_name: DRAGON*
       negate: station_name
      }
     min_num_obs: {monthly: {daily: 3}}
    }}
 model_cfg: {IFS-OSUITE: {
     model_id: ECMWF_OSUITE
     model_ts_type_read: 
     model_use_vars: {}
     model_add_vars: {}
     model_rename_vars: {}
     model_read_aux: {}
    }}
 var_web_info: {}
 path_manager: {
   proj_id: cams84
   exp_id: eval-trends
   json_basedir: /home/jang/data/aeroval-local-web/data
   coldata_basedir: /home/jang/data/aeroval-local-web/coldata
   coldata_basedir: /home/jang/data/aeroval-local-web/coldata
   json_basedir: /home/jang/data/aeroval-local-web/data
  }
 io_aux_file: /home/jang/data/aeroval-local-web/pyaerocom_config/eval_py/gridded_io_aux.py
 io_aux_file: /home/jang/data/aeroval-local-web/pyaerocom_config/eval_py/gridded_io_aux.py
}
Deleting everything under /home/jang/data/aeroval-local-web/data/cams84/eval-trends
Deleting everything under /home/jang/data/aeroval-local-web/coldata/cams84/eval-trends
no such experiment registered: eval-trends
Start processing
Deactivating file search by vertical code for ECMWF_OSUITE, since filenames do not include information about vertical code (probably AeroCom 2 convention)
Rearranging longitude dimension from 0 -> 360 definition to -180 -> 180 definition
Rearranging longitude dimension from 0 -> 360 definition to -180 -> 180 definition
Rearranging longitude dimension from 0 -> 360 definition to -180 -> 180 definition
Rearranging longitude dimension from 0 -> 360 definition to -180 -> 180 definition
Rearranging longitude dimension from 0 -> 360 definition to -180 -> 180 definition
Rearranging longitude dimension from 0 -> 360 definition to -180 -> 180 definition
Rearranging longitude dimension from 0 -> 360 definition to -180 -> 180 definition
Rearranging longitude dimension from 0 -> 360 definition to -180 -> 180 definition
The following variable combinations will be colocated
MODEL-VAR   OBS-VAR
od550aer    od550aer
Running ECMWF_OSUITE (od550aer) vs. AeronetSunV3Lev1.5.daily (od550aer)
Creating dir /home/jang/data/aeroval-local-web/coldata/cams84/eval-trends/IFS-OSUITE
WRITE: /home/jang/data/aeroval-local-web/coldata/cams84/eval-trends/IFS-OSUITE/od550aer_od550aer_MOD-IFS-OSUITE_REF-AeronetL1.5-d_20130101_20201231_monthly_WORLD-wMOUNTAINS.nc

Colocation processing status for IFS-OSUITE vs. AeronetL1.5-d
  Model Var   Obs Var   Status
0  od550aer  od550aer  SUCCESS
Processing: /home/jang/data/aeroval-local-web/coldata/cams84/eval-trends/IFS-OSUITE/od550aer_od550aer_MOD-IFS-OSUITE_REF-AeronetL1.5-d_20130101_20201231_monthly_WORLD-wMOUNTAINS.nc
Creating empty json file: /home/jang/data/aeroval-local-web/data/cams84/eval-trends/regions.json
Computing json files for IFS-OSUITE (od550aer) vs. AeronetL1.5-d (od550aer)
Processing statistics timeseries for all regions
Processing heatmap data for all regions
Traceback (most recent call last):
  File "/home/jang/data/aeroval-local-web/pyaerocom_config/config_files/cfg_cams84_eval_trends.py", line 125, in <module>
    res = ana.run()
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pyaerocom-0.12.0.dev2-py3.9.egg/pyaerocom/aeroval/experiment_processor.py", line 119, in run
    self._run_single_entry(model_name, obs_name, var_list)
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pyaerocom-0.12.0.dev2-py3.9.egg/pyaerocom/aeroval/experiment_processor.py", line 66, in _run_single_entry
    engine.run(files_to_convert)
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pyaerocom-0.12.0.dev2-py3.9.egg/pyaerocom/aeroval/coldatatojson_engine.py", line 38, in run
    self.process_coldata(coldata)
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pyaerocom-0.12.0.dev2-py3.9.egg/pyaerocom/aeroval/coldatatojson_engine.py", line 169, in process_coldata
    hm_all = _process_heatmap_data(data, regnames, use_weights,
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pyaerocom-0.12.0.dev2-py3.9.egg/pyaerocom/aeroval/coldatatojson_helpers.py", line 1065, in _process_heatmap_data
    subset_time_series = subset.get_regional_timeseries(regid)
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pyaerocom-0.12.0.dev2-py3.9.egg/pyaerocom/colocateddata.py", line 1796, in get_regional_timeseries
    result['obs'] = pd.Series(rgts.data[0], rgts.time)
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pandas/core/series.py", line 279, in __init__
    index = ensure_index(index)
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5885, in ensure_index
    return Index(index_like, name=name, copy=copy)
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 295, in __new__
    return _maybe_asobject(dtype, DatetimeIndex, data, copy, name, **kwargs)
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 6171, in _maybe_asobject
    return klass(data, dtype=dtype, copy=copy, name=name, **kwargs)
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pandas/core/indexes/datetimes.py", line 307, in __new__
    dtarr = DatetimeArray._from_sequence_not_strict(
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 326, in _from_sequence_not_strict
    subarr, tz, inferred_freq = sequence_to_dt64ns(
  File "/home/jang/anaconda.2018/envs/pyaerocom-main-dev/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2023, in sequence_to_dt64ns
    assert isinstance(result, np.ndarray), type(result)
AssertionError: <class 'xarray.core.dataarray.DataArray'>

@jgliss: any ideas?

jgliss commented 2 years ago

This looks like the same issue you had before: the index passed to pd.Series() is the "time" attribute of the xarray.DataArray (which is itself a DataArray), and pandas does not accept that. I don't think this is related to monthly processing, and I am puzzled that it appears, because I never ran into it.

Could it be that you have an old version of pandas or xarray that does not support instantiating a pandas Series from 1D DataArray instances, as in:

pd.Series(rgts.data[0], rgts.time)

?
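A minimal sketch of one possible workaround, assuming the goal is only to make the call independent of how a given pandas version handles DataArray input: convert both arguments to plain NumPy arrays before building the Series. The helper name `to_series` and the sample values are hypothetical and not part of pyaerocom; the example uses NumPy arrays as stand-ins, relying on the fact that `np.asarray` also accepts an `xarray.DataArray`.

```python
import numpy as np
import pandas as pd


def to_series(values, time):
    """Build a pandas Series from array-like values and time stamps.

    Hypothetical helper (not part of pyaerocom): np.asarray() turns
    array-like input, including an xarray.DataArray, into a plain
    numpy.ndarray, so pandas never sees the DataArray itself.
    """
    index = pd.DatetimeIndex(np.asarray(time))
    return pd.Series(np.asarray(values), index=index)


# Stand-in data: three monthly time stamps and made-up AOD values.
time = np.array(["2013-01-15", "2013-02-15", "2013-03-15"],
                dtype="datetime64[ns]")
obs = np.array([0.12, 0.15, 0.11])

series = to_series(obs, time)
print(series)
```

With the objects from the traceback, the equivalent call would be `to_series(rgts.data[0], rgts.time)`. This is an illustration only and is not a change that was made in pyaerocom.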

jgriesfeller commented 2 years ago

Solved by updating pandas to a version > 1.3.0.
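A rough sketch of how an environment could be checked against that threshold before running the experiment. The function names are hypothetical, the parsing only looks at the leading numeric part of the version string (so pre-release suffixes such as `rc1` or `.dev2` are ignored), and the strict "greater than 1.3.0" comparison simply mirrors the wording of the comment above.

```python
import re


def version_tuple(version):
    """Return the leading numeric release part, e.g. '1.3.0' -> (1, 3, 0).

    Crude on purpose: suffixes such as 'rc1' or '.dev2' are dropped.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_newer_than(installed, minimum="1.3.0"):
    """True if the installed version is strictly newer than minimum."""
    return version_tuple(installed) > version_tuple(minimum)


# Example with literal version strings (no pandas import needed here).
for candidate in ("1.2.5", "1.3.0", "1.3.1"):
    print(candidate, is_newer_than(candidate))
```

In a real environment the installed version can be passed in as `is_newer_than(pandas.__version__)`.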

jgriesfeller commented 2 years ago

Closing the issue.