podaac / net2cog

Harmony service for transforming netCDF file to cloud optimized geotiffs
Apache License 2.0
3 stars 3 forks source link

Update service to support MUR collection #8

Open frankinspace opened 1 year ago

frankinspace commented 1 year ago

Add support for MUR-JPL-L4-GLOB-v4.1 collection and provide sample output to DAS team

frankinspace commented 1 year ago

First attempt at converting a MUR image resulted in an error that still needs investigating. This is the error:

test_netcdf_convert.py:42: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../net2cog/netcdf_convert.py:188: in netcdf_converter
    return _write_cogtiff(gtiff_fname, xds)
../net2cog/netcdf_convert.py:110: in _write_cogtiff
    nc_xarray[var].rio.to_raster(temp_fname)
../../../../Library/Caches/pypoetry/virtualenvs/net2cog-MeQz-rfm-py3.10/lib/python3.10/site-packages/rioxarray/raster_array.py:1116: in to_raster
    self.encoded_nodata if self.encoded_nodata is not None else self.nodata
../../../../Library/Caches/pypoetry/virtualenvs/net2cog-MeQz-rfm-py3.10/lib/python3.10/site-packages/rioxarray/raster_array.py:330: in encoded_nodata
    return _ensure_nodata_dtype(encoded_nodata, self._obj.dtype)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

original_nodata = -128.0, new_dtype = dtype('<m8[ns]')

    def _ensure_nodata_dtype(original_nodata, new_dtype):
        """
        Convert the nodata to the new datatype and raise warning
        if the value of the nodata value changed.
        """
        # Complex-valued rasters can have real-valued nodata
        if str(new_dtype).startswith("c"):
            nodata = original_nodata
        else:
            original_nodata = float(original_nodata)
>           nodata = numpy.dtype(new_dtype).type(original_nodata)
E           ValueError: Could not convert object to NumPy timedelta

../../../../Library/Caches/pypoetry/virtualenvs/net2cog-MeQz-rfm-py3.10/lib/python3.10/site-packages/rioxarray/raster_writer.py:128: ValueError
viviant100 commented 11 months ago

@voxparcxls was able to replicate the same error shown above. Also testing with Measure data.

viviant100 commented 10 months ago

MEaSUREs data ssh_grids_v2205_1992101012.nc was converted successfully per @voxparcxls

autydp commented 10 months ago

Re. the error shown above -

I don't have a work-around or quick fix for this. The RioXArray code seems a bit naive in its attempt to establish "proper" no-data values - it does not work for time/date/time-delta data, and it leaves me a bit concerned about what other NetCDF/CF data might cause problems. This is perhaps overstating the problem, but it is a concern.

autydp commented 10 months ago

A possible fix to RioXArray might be to revise the _ensure_nodata_dtype method (see comment above with failing implementation):

def _ensure_nodata_dtype(original_nodata, new_dtype):
    """
    Convert the nodata to the new datatype and raise warning
    if the value of the nodata value changed.
    """
    # Complex-valued rasters can have real-valued nodata
    new_dtype_str = str(new_dtype)      ###v
    if new_dtype_str.startswith("c") \
         or "date" in new_dtype_str  \
         or "time" in new_dtype_str:    ###^
             nodata = original_nodata
    else:
        original_nodata = float(original_nodata)
        nodata = numpy.dtype(new_dtype).type(original_nodata)
        if not numpy.isnan(nodata) and original_nodata != nodata:
            warnings.warn(
                f"The nodata value ({original_nodata}) has been automatically "
                f"changed to ({nodata}) to match the dtype of the data."
            )
    return nodata

This may not be sufficient for all cases, but gets past the issues with date, time and timedelta data. Also, it is not clear this is sufficient for all downstream uses of the RioXArray data object for date, time and timedelta data, but works for Net2Cog.

voxparcxls commented 10 months ago

This code change in rioxarray does work. So this seems like a shorter work around than reprojecting data, though like you said it might not be for every case.

BUT the .tif has issues, the map gets blocked out. Maybe some incompatibility..

Patch for net2cog: https://github.com/podaac/net2cog/blob/rioxarray_netcdf_patch/net2cog/rioxarray_raster_patch.diff

Here's Cog generation from a couple of files. Previously errored for MUR netcdf generates TIF.

MUR & other Collection examples
Screenshot 2023-10-19 at 11 40 09 AM
20230917090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1_analysed_sst.tif
Screenshot 2023-10-19 at 11 49 32 AM
flamingbear commented 10 months ago

I wonder if that is just a problem displaying the tiff file? I have one that looks like that in mac's preview, but opens fine in QGIS.

Even simply running file on it shows issues, like a negative width:

mur_analysed_sst.tif: TIFF image data, little-endian, direntries=19, height=17999, bps=16, compression=deflate, PhotometricIntepretation=BlackIsZero, width=-29536
autydp commented 10 months ago

Like the inverted image, this is an issue with using a basic tiff viewer (MacOS Preview) on these geotiff files, and not evident in GIS viewers (QGIS). I agree, however, that something is fishy here. I'm suspicious that there is a mask layer or setting with improper values (fill?), or something like that. And curious if it is specific to COG settings, as I don't see this on plain gdal_translate to GTiff. ~So maybe that no-data value is not that good a solution.~. It is independent of the no-data fix above as it appears on all of the outputs, not just dt_1km_data.

voxparcxls commented 10 months ago

Okay, it looks fine when opened with Safari or mac photos. Haven't tried QGIS

The cog generation was a selection of all variables below:

Screenshot 2023-10-19 at 3 26 23 PM

I'll check for the band selection and other defaults in the net-cog process, see whats up.