Open ThomasHaine opened 3 months ago
Hey @ThomasHaine - confusing. Some quick questions to help me understand:
mamba list
Intake and intake-xarray are both needed. For example, in my Oceanography env I have Intake v2.0.3 and intake-xarray 0.7.0.
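The same check can be run from inside the notebook kernel itself, which confirms that the kernel sees the environment you expect. This is a minimal sketch using only the standard library; the helper name installed_versions is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Run this in the same kernel the notebook uses.
    for name, ver in installed_versions(["intake", "intake-xarray"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

A None entry means the package is not visible to that kernel, even if it is installed in another environment.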
Thanks @Mikejmnez. Good point about the environment. It wasn't properly installed. To fix it, I ran:
conda env create -f oceanspy/sciserver_catalogs/environment.yml
conda activate Oceanography
pip install ipykernel
ipython kernel install --user --name=Oceanography-ceph
Then conda info --envs gives:
# conda environments:
#
base /home/idies/mambaforge
Oceanography * /home/idies/mambaforge/envs/Oceanography
py39 /home/idies/mambaforge/envs/py39
Now I select the Oceanography-ceph kernel for my notebook. It still errors with:
FileNotFoundError: [Errno 2] No such file or directory: '/home/idies/workspace/OceanCirculation/exp_ASR/grid.nc'
This confuses me because this path has been replaced in sciserver_catalogs/catalog_xarray.yaml.
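One way to check whether a given catalog file still references the old location is to scan its text for the old path prefix. The sketch below is only an illustration: find_stale_paths is a hypothetical helper, and the sample catalog fragment is invented, not the real file contents.

```python
def find_stale_paths(catalog_text, stale_prefix):
    """Return (line_number, line) pairs for catalog lines that contain stale_prefix."""
    return [
        (number, line.strip())
        for number, line in enumerate(catalog_text.splitlines(), start=1)
        if stale_prefix in line
    ]


# Hypothetical catalog fragment for illustration only; not the real file contents.
sample = """\
sources:
  grid_example:
    args:
      urlpath: /home/idies/workspace/OceanCirculation/exp_ASR/grid.nc
  other_example:
    args:
      urlpath: /some/new/location/grid.nc
"""

for number, line in find_stale_paths(sample, "/home/idies/workspace/OceanCirculation"):
    print(number, line)
```

If the edited file comes back clean, the stale path is probably coming from a different copy of the catalog than the one that was edited.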
OK, some progress: The .yaml catalogs are hard-coded in open_oceandataset.py and by default read the main stable release. Override the default like this:
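import oceanspy as ospy

# Point at the catalog on the ceph-dev branch instead of the default (main).
catalog_url = (
    "https://raw.githubusercontent.com/ThomasHaine/oceanspy/"
    "ceph-dev/sciserver_catalogs/catalog_xarray.yaml"
)
od = ospy.open_oceandataset.from_catalog("get_started", catalog_url)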
Now it's reading the ceph directory.
Just catching up. That makes sense. Another alternative is to create your own yaml catalog and point catalog_url at it. I usually do it that way, since there are then no changes to oceanspy to undo. Just make sure to reverse the change when you're ready to push to the main branch (PR).
Sounds good. Do you suggest I create (e.g.) catalog_xarray-ceph.yaml and catalog_xmitgcm-ceph.yaml and a new sciserver-ceph dataset in datasets_list.yaml? Then we can add the new data sources in open_oceandataset.py (I might need some help with this bit!).
No, I think the way you were doing it was appropriate. You are essentially migrating the data to ceph and that requires updating the access pattern. Once you push your changes to a new PR and before merging, we should restore how open_oceandataset.py reads from main. That is, replace ceph-dev with main below:
catalog_url = (
    "https://raw.githubusercontent.com/ThomasHaine/oceanspy/"
    "ceph-dev/sciserver_catalogs/catalog_xarray.yaml"
)
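To make the branch swap less error-prone, the URL could be built from its parts. This is a sketch only: build_catalog_url is a hypothetical helper that is not part of oceanspy, and the default owner is simply the one that appears in the URL above (the owner of the upstream main branch is not stated in this thread).

```python
def build_catalog_url(branch, catalog_file="catalog_xarray.yaml", owner="ThomasHaine"):
    """Build the raw.githubusercontent.com URL of a catalog on a given branch.

    Hypothetical helper; `owner` defaults to the fork named in this thread.
    """
    return (
        f"https://raw.githubusercontent.com/{owner}/oceanspy/"
        f"{branch}/sciserver_catalogs/{catalog_file}"
    )


dev_url = build_catalog_url("ceph-dev")
main_url = build_catalog_url("main")
print(dev_url)
print(main_url)
```

Either URL can then be passed as the second argument to ospy.open_oceandataset.from_catalog, as in the override shown earlier in the thread.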
Were you able to read the datasets from ceph?
Sounds good. But we should maintain the original (filedb) functionality too, at least for a while. What's the easiest way to keep both access methods functional at the same time?
Yes, I can read the datasets from ceph. I've copied several (no LLC4320 or DYAMOND yet), and will test in the next few days.
Actually, I can't read all the datasets. For IGPwinter, EGshelfIIseas2km_ASR_{crop,full}, and EGshelfIIseas2km_ERAI_{6H,1D} I get this error:
Opening EGshelfIIseas2km_ERAI_1D.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[9], line 23
6 catalog_url2 = (
7 "https://raw.githubusercontent.com/ThomasHaine/oceanspy/"
8 "ceph-dev/sciserver_catalogs/catalog_xmitgcm.yaml"
9 )
11 # od = ospy.open_oceandataset.from_catalog("EGshelfIIseas2km_ASR_full",catalog_url1)
12 # print(od.dataset)
13 # print('\n')
(...)
20 # print(od.dataset)
21 # print('\n')
---> 23 od = ospy.open_oceandataset.from_catalog("EGshelfIIseas2km_ERAI_1D",catalog_url1)
24 print(od.dataset)
25 print('\n')
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/open_oceandataset.py:141, in from_catalog(name, catalog_url)
138 mtdt = cat[entry].metadata
140 # Create ds
--> 141 ds = cat[entry].to_dask()
142 else:
143 # Pop args and metadata
144 args = cat[entry].pop("args")
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/intake_xarray/base.py:69, in DataSourceMixin.to_dask(self)
67 def to_dask(self):
68 """Return xarray object where variables are dask arrays"""
---> 69 return self.read_chunked()
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/intake_xarray/base.py:44, in DataSourceMixin.read_chunked(self)
42 def read_chunked(self):
43 """Return xarray object (which will have chunks)"""
---> 44 self._load_metadata()
45 return self._ds
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/intake/source/base.py:84, in DataSourceBase._load_metadata(self)
82 """load metadata only if needed"""
83 if self._schema is None:
---> 84 self._schema = self._get_schema()
85 self.dtype = self._schema.dtype
86 self.shape = self._schema.shape
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/intake_xarray/base.py:18, in DataSourceMixin._get_schema(self)
15 self.urlpath = self._get_cache(self.urlpath)[0]
17 if self._ds is None:
---> 18 self._open_dataset()
20 metadata = {
21 'dims': dict(self._ds.dims),
22 'data_vars': {k: list(self._ds[k].coords)
23 for k in self._ds.data_vars.keys()},
24 'coords': tuple(self._ds.coords.keys()),
25 }
26 if getattr(self, 'on_server', False):
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/intake_xarray/netcdf.py:73, in NetCDFSource._open_dataset(self)
71 if "*" in url or isinstance(url, list):
72 _open_dataset = xr.open_mfdataset
---> 73 if self.pattern:
74 kwargs.update(preprocess=self._add_path_to_ds)
75 if self.combine is not None:
AttributeError: 'NetCDFSource' object has no attribute 'pattern'
Any ideas what's going on?
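One generic way to narrow down an AttributeError like this is to check which class in the object's method resolution order defines the missing attribute. The sketch below uses toy classes with hypothetical names, not the real intake or intake-xarray classes, and where_defined is a made-up helper. It is a diagnostic aid under those assumptions, not a diagnosis of this traceback.

```python
def where_defined(cls, attribute):
    """Return the names of classes in cls's MRO that define `attribute` directly."""
    return [klass.__name__ for klass in cls.__mro__ if attribute in vars(klass)]


# Toy classes standing in for a mixin-based data source (hypothetical names).
class PatternProvider:
    pattern = None


class WithMixin(PatternProvider):
    pass


class WithoutMixin:
    pass


print(where_defined(WithMixin, "pattern"))     # ['PatternProvider']
print(where_defined(WithoutMixin, "pattern"))  # []
```

In the notebook, the equivalent call would be where_defined(type(cat[entry]), "pattern"), with cat and entry as in the traceback. Note that this only finds class-level attributes and properties; attributes assigned per instance in __init__ will not show up.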
Folks, especially @Mikejmnez, I'm trying to get oceanspy to load the new datasets from SciServer-ceph. I've:
sciserver_catalogs/catalog_xarray.yaml.
intake.
I'm confused because netCDF4 is installed. Any ideas on how to fix/what to do next?