intake / intake-esm

An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.
https://intake-esm.readthedocs.io
Apache License 2.0

manually changing dataframe for catalog #579

Closed jgiguereCC closed 1 year ago

jgiguereCC commented 1 year ago

Hi! I'm trying to manually change the dataframe for an esm-datastore and then assign the modified dataframe back to a catalog to read in CMIP6 models. I've tried the approach shown in the issue raised by @jbusecke here for intake-esm, and the from_df() method shown here, but I get AttributeError: can't set attribute from the first and AttributeError: from_df from the second. Is there any way to restrict the dataframe and then make a new catalog from it? I'm still quite new to intake-esm, so apologies if this isn't the intended functionality!

intake-esm version:

intake_esm.show_versions()

INSTALLED VERSIONS
------------------

cftime: 1.6.2
dask: 2022.9.2
fastprogress: 0.2.7
fsspec: 2021.10.0
gcsfs: 2021.07.0
intake: 0.6.7
intake_esm: 2022.9.18
netCDF4: 1.6.2
pandas: 1.5.3
requests: 2.28.2
s3fs: 2022.8.2
xarray: 2022.9.0
zarr: 2.13.2

The Issue

import intake
import dask
url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(url)
scenarios = ["ssp370", "piControl", "historical"]  # set desired scenarios
var_name = 'tos'
time_step = ['Oday']
query = dict(experiment_id = scenarios,
             variable_id=var_name,
             table_id = time_step,
             member_id = 'r1i1p1f1'
            )
cat = col.search(require_all_on="source_id", **query)
correct_order = list(cat.df.columns)
new_df = cat.df.groupby(['source_id','experiment_id']).first().reset_index()[correct_order]
cat.df= new_df

Yields the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[2], line 19
     17 correct_order = list(cat.df.columns)
     18 new_df = cat.df.groupby(['source_id','experiment_id']).first().reset_index()[correct_order]
---> 19 cat.df= new_df

AttributeError: can't set attribute

Thanks!

andersy005 commented 1 year ago

@jgiguereCC, thank you for putting together this reproducible issue :)

Try the following instead:

In [6]: cat.esmcat._df = new_df

jgiguereCC commented 1 year ago

that seems to work! thanks!
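
For anyone landing here later, a rough sketch of why the two assignments behave differently. It assumes (not confirmed in this thread) that cat.df is a read-only property that forwards to cat.esmcat, which keeps the table in a private _df attribute. ToyCatalogModel and ToyDatastore are hypothetical stand-ins, not intake-esm classes, and plain lists of dicts replace the pandas DataFrame so the snippet runs without any dependencies. The "first row per group" step only loosely mirrors the groupby(...).first() call in the original post.

```python
class ToyCatalogModel:
    """Hypothetical stand-in for the object behind `cat.esmcat`."""

    def __init__(self, rows):
        self._df = rows  # private storage for the table

    @property
    def df(self):
        # Read-only view: no setter is defined for this property.
        return self._df


class ToyDatastore:
    """Hypothetical stand-in for the catalog returned by `col.search(...)`."""

    def __init__(self, rows):
        self.esmcat = ToyCatalogModel(rows)

    @property
    def df(self):
        # Forwards to the underlying model; also has no setter.
        return self.esmcat.df


rows = [
    {"source_id": "A", "experiment_id": "historical", "version": 1},
    {"source_id": "A", "experiment_id": "historical", "version": 2},
    {"source_id": "B", "experiment_id": "ssp370", "version": 1},
]
cat = ToyDatastore(rows)

# Keep the first row seen for each (source_id, experiment_id) pair.
first_per_group = {}
for row in cat.df:
    first_per_group.setdefault((row["source_id"], row["experiment_id"]), row)
new_rows = list(first_per_group.values())

# Assigning through the property fails, as in the traceback above.
try:
    cat.df = new_rows
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# Writing to the private attribute on the underlying model succeeds,
# and the read-only property now returns the restricted table.
cat.esmcat._df = new_rows
```

Because _df is a private attribute, this kind of workaround may stop working in other intake-esm versions, so it is worth re-checking after an upgrade.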