pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0
1.06k stars 292 forks source link

Confusing documentation for creating a Scene without a filename or reader #2836

Closed joleenf closed 1 month ago

joleenf commented 3 months ago

The documentation in the Scene code indicates that a Scene object can be created without a filename or reader. The documentation refers to a Dataset, which looks like the nomenclature used to refer to an xarray Dataset. However, the "Dataset" in

     scn = Scene()
     scn['my_dataset'] = Dataset(my_data_array, **my_info)

Is referring to a Dataset object that preceded the use of xarray in SatPy, and the documentation itself should have been purged.

djhoese commented 3 months ago

I'm really confused by that example code. The old Dataset didn't have anything to do with xarray, but this code is passing a DataArray apparently.

joleenf commented 3 months ago

The example code is taken from the Scene documentation. I am also confused by the documentation code. Not certain if the "Dataset" referred to a Dataset from xarray, I used both

scn['my_dataset'] = Dataset(xr.DataArray, **my_info)

and scn['my_dataset'] = xr.Dataset

with a dtype in the attrs among other things like the crs and a dataset name.

from satpy import Scene
new_scene = Scene()

# Load a Scene (not worrying about datetime for now...) from GOES-18 (west) and GOES-16 (east)
g16 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G16_s20231461636171.level2.nc"], reader="clavrx")
g18 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G18_s20231531721182.level2.nc"], reader="clavrx")

g16.load(["cloud_mask"])
g18.load(["cloud_mask"])

#  Save the area definition of the GOES-18 (west) satellite for later.
area_def18 = g18["cloud_mask"].attrs["area"]

# These global attributes are saved so that it is easier to record the original date of the projection information used.
c18_attrs = g16["cloud_mask"].attrs

# Resample the GOES-16 data into the GOES-18 projection.
g16_ol_in_g18space = g16.resample(area_def18, resampler="nearest")

# Mask the region where there is overlap.
g16_ol_in_g18space = xr.where(g16_ol_in_g18space["cloud_mask"] >= 0, 1, 0)

# Create a new scene.
new_scene["overlap_mask"] = Dataset(data_vars={'overlap': (['y', 'x'], g16_ol_in_g18space.data)},
                           coords=g18["cloud_mask"].coords,
                           attrs=dict(name="overlap_mask", long_name="overlap_mask", platform_name="GOES-18",
                                      start_time=c18_attrs["start_time"].isoformat(), flag_values=[0, 1],
                                      flag_meaning="0: No Overlap, 1: Overlap",
                                      dtype=g16_ol_in_g18space.data.dtype))

Creates a Scene and a dataset that looks kind of right?

new_scene["overlap_mask"]
Out[22]: 
<xarray.Dataset> Size: 60MB
Dimensions:    (y: 1500, x: 2500)
Coordinates:
    latitude   (y, x) float32 15MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
    longitude  (y, x) float32 15MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
    crs        object 8B PROJCRS["unknown",BASEGEOGCRS["unknown",DATUM["Unkno...
  * y          (y) float64 12kB 4.588e+06 4.586e+06 ... 1.586e+06 1.584e+06
  * x          (x) float64 20kB -2.504e+06 -2.502e+06 ... 2.502e+06 2.504e+06
Data variables:
    overlap    (y, x) int64 30MB dask.array<chunksize=(1500, 2500), meta=np.ndarray>
Attributes:
    name:           overlap_mask
    long_name:      overlap_mask
    platform_name:  GOES-18
    start_time:     2023-05-26T16:36:17.100000
    flag_values:    [0, 1]
    flag_meaning:   0: No Overlap, 1: Overlap
    dtype:          int64
    _satpy_id:      DataID(name='overlap_mask')

However, new_scene.save_datasets(filename="test.nc")

raises an exception (I am assuming because I am using an xarray Dataset, rather than the Dataset that was intended in the documentation):


/Users/joleenf/Projects/polar2grid/satpy/satpy/writers/cf_writer.py:274: UserWarning: dtype int64 not compatible with CF-1.7.
  grouped_datasets, header_attrs = collect_cf_datasets(list_dataarrays=datasets,  # list of xr.DataArray
Traceback (most recent call last):
  File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-20-22641c74dec4>", line 1, in <module>
    new_scene.save_datasets(filename="test.nc")
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/scene.py", line 1293, in save_datasets
    return writer.save_datasets(dataarrays, compute=compute, **save_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/writers/cf_writer.py", line 274, in save_datasets
    grouped_datasets, header_attrs = collect_cf_datasets(list_dataarrays=datasets,  # list of xr.DataArray
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/datasets.py", line 232, in collect_cf_datasets
    ds = _collect_cf_dataset(
         ^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/datasets.py", line 137, in _collect_cf_dataset
    new_dataarray = make_cf_data_array(new_dataarray,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/data_array.py", line 79, in make_cf_data_array
    dataarray = _preprocess_data_array_name(dataarray=dataarray,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/Projects/polar2grid/satpy/satpy/cf/data_array.py", line 49, in _preprocess_data_array_name
    dataarray = dataarray.rename(new_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/core/dataset.py", line 4282, in rename
    return self._rename(name_dict=name_dict, **names)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/core/dataset.py", line 4217, in _rename
    name_dict = either_dict_or_kwargs(name_dict, names, "rename")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joleenf/miniconda3/envs/satpy-dev/lib/python3.11/site-packages/xarray/namedarray/utils.py", line 193, in either_dict_or_kwargs
    raise ValueError(f"the first argument to .{func_name} must be a dictionary")
ValueError: the first argument to .rename must be a dictionary```
djhoese commented 3 months ago

So a Scene should only ever contain DataArray objects except in I think a very small exception case (temporary container during MultiScene animation stuff). I mentioned on slack that there is a to_xarray method on the Scene which should give you a Dataset object that is xarray-savable with .to_netcdf() I think. That would be my first try for what you're doing, but regardless this documentation should be updated to use a DataArray and assigning to attrs= for the metadata unless I'm missing something.

joleenf commented 3 months ago

Sorry, I had missed the "to_xarray" mention. to_xarray() works with to_netcdf() for me.

joleenf commented 3 months ago
import xarray as xr

from satpy import Scene
from xarray import DataArray

g16 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G16_s20231461636171.level2.nc"], reader="clavrx")
g18 = Scene(["clavrx_OR_ABI-L1b-RadC-M6C01_G18_s20231531721182.level2.nc"], reader="clavrx")

g16.load(["cloud_mask"])
g18.load(["cloud_mask"])

area_def18 = g18["cloud_mask"].attrs["area"]

g16_ol_in_g18space = g16.resample(area_def18, resampler="nearest")

g18mask = Scene()
g18mask["overlap"] = DataArray(data=xr.where(g16_ol_in_g18space["cloud_mask"] >= 0, 1, 0),
                               dims=['y', 'x'],
                               attrs=dict(name="overlap_mask", long_name="G18FixedGrid_overlap_G16",
                                          platform_name="GOES-18",
                                          start_time=g16_ol_in_g18space["cloud_mask"].attrs["start_time"], flag_values=[0, 1],
                                          flag_meaning="0: No Overlap, 1: Overlap",
                                          area=area_def18))

g18mask.save_dataset(filename="{long_name}.nc", dataset_id="overlap")

works. So maybe it is just clarifying the old documentation?

djhoese commented 3 months ago

Yeah it is likely as simple as changing Dataset to DataArray and using attrs= instead of ** on the info dictionary. Would you mind making a pull request?