Closed joleenf closed 1 month ago
I'm really confused by that example code. The old Dataset didn't have anything to do with xarray, but this code is passing a DataArray apparently.
The example code is taken from the Scene documentation. I am also confused by the documentation code. Not certain if the "Dataset" referred to a Dataset from xarray, I used both
scn['my_dataset'] = Dataset(xr.DataArray, **my_info)
and
scn['my_dataset'] = xr.Dataset
with a dtype in the attrs among other things like the crs and a dataset name.
from satpy import Scene
new_scene = Scene()
# Load a Scene (not worrying about datetime for now...) from GOES-18 (west) and GOES-16 (east)
g16 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G16_s20231461636171.level2.nc"], reader="clavrx")
g18 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G18_s20231531721182.level2.nc"], reader="clavrx")
g16.load(["cloud_mask"])
g18.load(["cloud_mask"])
# Save the area definition of the GOES-18 (west) satellite for later.
area_def18 = g18["cloud_mask"].attrs["area"]
# These global attributes are saved so that it is easier to record the original date of the projection information used.
c18_attrs = g16["cloud_mask"].attrs
# Resample the GOES-16 data into the GOES-18 projection.
g16_ol_in_g18space = g16.resample(area_def18, resampler="nearest")
# Mask the region where there is overlap.
g16_ol_in_g18space = xr.where(g16_ol_in_g18space["cloud_mask"] >= 0, 1, 0)
# Create a new scene.
new_scene["overlap_mask"] = Dataset(data_vars={'overlap': (['y', 'x'], g16_ol_in_g18space.data)},
coords=g18["cloud_mask"].coords,
attrs=dict(name="overlap_mask", long_name="overlap_mask", platform_name="GOES-18",
start_time=c18_attrs["start_time"].isoformat(), flag_values=[0, 1],
flag_meaning="0: No Overlap, 1: Overlap",
dtype=g16_ol_in_g18space.data.dtype))
Creates a Scene and a dataset that looks kind of right?
new_scene["overlap_mask"]
Out[22]:
<xarray.Dataset> Size: 60MB
Dimensions: (y: 1500, x: 2500)
Coordinates:
latitude (y, x) float32 15MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
longitude (y, x) float32 15MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
crs object 8B PROJCRS["unknown",BASEGEOGCRS["unknown",DATUM["Unkno...
* y (y) float64 12kB 4.588e+06 4.586e+06 ... 1.586e+06 1.584e+06
* x (x) float64 20kB -2.504e+06 -2.502e+06 ... 2.502e+06 2.504e+06
Data variables:
overlap (y, x) int64 30MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
Attributes:
name: overlap_mask
long_name: overlap_mask
platform_name: GOES-18
start_time: 2023-05-26T16:36:17.100000
flag_values: [0, 1]
flag_meaning: 0: No Overlap, 1: Overlap
dtype: int64
_satpy_id: DataID(name='overlap_mask')
However,
new_scene.save_datasets(filename="test.nc")
raises an exception (I am assuming because I am using an xarray Dataset, rather than the Dataset that was intended in the documentation):
/Users/joleenf/Projects/polar2grid/satpy/satpy/writers/cf_writer.py:274: UserWarning: dtype int64 not compatible with CF-1.7.
grouped_datasets, header_attrs = collect_cf_datasets(list_dataarrays=datasets, # list of xr.DataArray
Traceback (most recent call last):
File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-20-22641c74dec4>", line 1, in <module>
new_scene.save_datasets(filename="test.nc")
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/scene.py", line 1293, in save_datasets
return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/writers/cf_writer.py", line 274, in save_datasets
grouped_datasets, header_attrs = collect_cf_datasets(list_dataarrays=datasets, # list of xr.DataArray
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/datasets.py", line 232, in collect_cf_datasets
ds = _collect_cf_dataset(
^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/datasets.py", line 137, in _collect_cf_dataset
new_dataarray = make_cf_data_array(new_dataarray,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/data_array.py", line 79, in make_cf_data_array
dataarray = _preprocess_data_array_name(dataarray=dataarray,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/data_array.py", line 49, in _preprocess_data_array_name
dataarray = dataarray.rename(new_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/core/dataset.py", line 4282, in rename
return self._rename(name_dict=name_dict, **names)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/core/dataset.py", line 4217, in _rename
name_dict = either_dict_or_kwargs(name_dict, names, "rename")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/namedarray/utils.py", line 193, in either_dict_or_kwargs
raise ValueError(f"the first argument to .{func_name} must be a dictionary")
ValueError: the first argument to .rename must be a dictionary```
So a Scene should only ever contain DataArray
objects except in I think a very small exception case (temporary container during MultiScene animation stuff). I mentioned on slack that there is a to_xarray
method on the Scene which should give you a Dataset object that is xarray-savable with .to_netcdf()
I think. That would be my first try for what you're doing, but regardless this documentation should be updated to use a DataArray and assigning to attrs=
for the metadata unless I'm missing something.
Sorry, I had missed the "to_xarray" mention. to_xarray()
works with to_netcdf()
for me.
import xarray as xr
from satpy import Scene
from xarray import DataArray
g16 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G16_s20231461636171.level2.nc"], reader="clavrx")
g18 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G18_s20231531721182.level2.nc"], reader="clavrx")
g16.load(["cloud_mask"])
g18.load(["cloud_mask"])
area_def18 = g18["cloud_mask"].attrs["area"]
g16_ol_in_g18space = g16.resample(area_def18, resampler="nearest")
g18mask = Scene()
g18mask["overlap"] = DataArray(data=xr.where(g16_ol_in_g18space["cloud_mask"] >= 0, 1, 0),
dims=['y', 'x'],
attrs=dict(name="overlap_mask", long_name="G18FixedGrid_overlap_G16",
platform_name="GOES-18",
start_time=g16_ol_in_g18space["cloud_mask"].attrs["start_time"], flag_values=[0, 1],
flag_meaning="0: No Overlap, 1: Overlap",
area=area_def18))
g18mask.save_dataset(filename="{long_name}.nc", dataset_id="overlap")
works. So maybe it is just clarifying the old documentation?
Yeah it is likely as simple as changing Dataset
to DataArray
and using attrs=
instead of **
on the info dictionary. Would you mind making a pull request?
The documentation in the Scene code indicates that a Scene object can be created without a filename or reader. The documentation refers to a Dataset, which looks like the nomenclature used to refer to an xarray Dataset. However, the "Dataset" in
Is referring to a Dataset object that preceded the use of xarray in SatPy, and the documentation itself should have been purged.