intake / intake-esm

An intake plugin for parsing an Earth System Model (ESM) catalog and loading assets into xarray datasets.

manually changing dataframe for catalog #579

Closed jgiguereCC closed 1 year ago

jgiguereCC commented 1 year ago

Hi! I'm trying to manually change the dataframe for an esm-datastore and then assign the modified dataframe back to a catalog to read in CMIP6 models. I've tried the approach shown in the issue raised by @jbusecke here for intake-esm, and the from_df() method shown here, but I get AttributeError: can't set attribute and AttributeError: from_df from these two methods respectively. Is there anything I can do to restrict the dataframe, then make a new catalog from that dataframe? I'm still quite new to using intake-esm, so apologies if this isn't the intended functionality!

intake-esm version:



cftime: 1.6.2
dask: 2022.9.2
fastprogress: 0.2.7
fsspec: 2021.10.0
gcsfs: 2021.07.0
intake: 0.6.7
intake_esm: 2022.9.18
netCDF4: 1.6.2
pandas: 1.5.3
requests: 2.28.2
s3fs: 2022.8.2
xarray: 2022.9.0
zarr: 2.13.2

The Issue

import intake
import dask
url = ""
col = intake.open_esm_datastore(url)
scenarios = ["ssp370", "piControl", "historical"]  # set desired scenarios
var_name = 'tos'
time_step = ['Oday']
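query = dict(experiment_id = scenarios,
             table_id = time_step,
             member_id = 'r1i1p1f1')
# the search call was garbled in the post; col.search(require_all_on=...) is the likely original
cat = col.search(require_all_on="source_id", **query)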
correct_order = list(cat.df.columns)
new_df = cat.df.groupby(['source_id','experiment_id']).first().reset_index()[correct_order]
cat.df= new_df

Yields the error:

AttributeError                            Traceback (most recent call last)
Cell In[2], line 19
     17 correct_order = list(cat.df.columns)
     18 new_df = cat.df.groupby(['source_id','experiment_id']).first().reset_index()[correct_order]
---> 19 cat.df= new_df

AttributeError: can't set attribute


andersy005 commented 1 year ago

@jgiguereCC, thank you for putting together this reproducible issue :)

Try the following instead,

In [6]: cat.esmcat._df = new_df
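
For anyone wondering why the plain assignment fails while this one works: the error message suggests that `cat.df` is a read-only property (no setter), and the workaround suggests it reads from a private `_df` attribute on `cat.esmcat`. The sketch below imitates that layout with two hypothetical stand-in classes (`ESMCatModel` and `Datastore` are invented for illustration, they are not the intake-esm classes), and uses a plain list of rows instead of a pandas DataFrame so it runs without any dependencies. Since `_df` is private, this may well change between intake-esm releases.

```python
class ESMCatModel:
    """Hypothetical stand-in for the object behind `cat.esmcat`."""

    def __init__(self, df):
        self._df = df  # private attribute that actually holds the table

    @property
    def df(self):
        return self._df


class Datastore:
    """Hypothetical stand-in for an esm-datastore with a read-only `df`."""

    def __init__(self, df):
        self.esmcat = ESMCatModel(df)

    @property
    def df(self):
        # No setter is defined, so `cat.df = ...` raises AttributeError.
        return self.esmcat.df


# Toy table: a list of rows standing in for the catalog dataframe.
rows = [
    {"source_id": "A", "experiment_id": "historical", "path": "a1"},
    {"source_id": "A", "experiment_id": "historical", "path": "a2"},
    {"source_id": "B", "experiment_id": "ssp370", "path": "b1"},
]
cat = Datastore(rows)

# Keep the first row per (source_id, experiment_id), like groupby(...).first().
first_rows = {}
for row in cat.df:
    first_rows.setdefault((row["source_id"], row["experiment_id"]), row)
new_df = list(first_rows.values())

# Direct assignment fails because the property has no setter.
try:
    cat.df = new_df
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# Replacing the private attribute works, and `cat.df` now returns the new table.
cat.esmcat._df = new_df
```

With the real library the same idea is the one-liner above; the only difference is that `new_df` is a pandas DataFrame.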

jgiguereCC commented 1 year ago

that seems to work! thanks!