insarlab / MintPy

Miami InSAR time-series software in Python
https://mintpy.readthedocs.io

Add function to create CF-compliant arrays/metadata for HDF5 stacks #1073

Open scottstanie opened 1 year ago

scottstanie commented 1 year ago

Description of proposed changes

Start of the implementation described in https://github.com/insarlab/MintPy/discussions/1016, which aims to make MintPy HDF5 stacks readable by GDAL/xarray/QGIS.

I'll need to think of a way we can smoothly incorporate this into the prep_ scripts without interruption.

As a side note:

Maybe we could use "datetime" instead? Neither "date" nor "time" feels accurate, given that we handle both spaceborne and airborne data

For now I still use time as the attribute name for the date/datetime stacks, simply because that's what the CF conventions suggest. Their examples generally hold dates/datetimes, but use time as the standard variable name (even at daily granularity). Also, the time units are currently units=f"days since {str(date_arr[0])}", but these could be seconds for intra-day stacks (e.g. for @taliboliver's deltaX data). I don't know what modifications he made, though; I've only seen that there's usually a date dataset.
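
To make the units question concrete, here is a minimal sketch of how the choice between "days since" and "seconds since" could be made automatically. The function name cf_time_coordinate and the whole-day test are assumptions for illustration only; this is not what the PR currently does, and it uses plain datetime objects rather than the actual date_arr.

```python
from datetime import datetime


def cf_time_coordinate(datetimes):
    """Return (offsets, units) relative to the first acquisition.

    Hypothetical helper: uses "days since ..." when every acquisition is a
    whole number of days after the first one, and falls back to
    "seconds since ..." otherwise (e.g. for intra-day stacks).
    """
    ref = datetimes[0]
    deltas = [dt - ref for dt in datetimes]

    # A timedelta is a whole number of days if it has no leftover seconds.
    whole_days = all(d.seconds == 0 and d.microseconds == 0 for d in deltas)

    if whole_days:
        offsets = [d.days for d in deltas]
        units = f"days since {ref}"
    else:
        offsets = [d.total_seconds() for d in deltas]
        units = f"seconds since {ref}"
    return offsets, units


daily = [datetime(2020, 1, 1), datetime(2020, 1, 13), datetime(2020, 1, 25)]
intra_day = [datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 10, 30)]

print(cf_time_coordinate(daily))
print(cf_time_coordinate(intra_day))
```

Either way the variable could still be called time, so the name would not have to change with the granularity of the stack.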


scottstanie commented 1 year ago

It's currently failing on one of the integration tests, in readfile.read_hdf5_file. It would be good to add a unit test for this, since the error says that data is never assigned. There are only checks for 2D and higher dimensions:

        # 2D dataset
        if ds.ndim == 2:
            ...
            data = ...

so it's likely that one of the 1D or 0D datasets is tripping it up (though that should be caught separately in read_hdf5_file, I think, since there shouldn't be a code path that leaves the variable undefined).
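
As a rough illustration of the "no code path to an undefined variable" point, the sketch below dispatches on the number of dimensions and raises for anything it does not handle. It uses nested lists as a stand-in for an HDF5 dataset; ndim_of and read_dataset are hypothetical names, and this is not the actual body of read_hdf5_file.

```python
def ndim_of(values):
    """Count the nesting depth of a list-based stand-in for a dataset."""
    ndim = 0
    while isinstance(values, list):
        ndim += 1
        values = values[0] if values else None
    return ndim


def read_dataset(values):
    """Return a copy of the data, failing loudly on unhandled ranks."""
    ndim = ndim_of(values)
    if ndim == 0:
        # scalar dataset: return the value itself
        data = values
    elif ndim == 1:
        # 1D dataset (e.g. a list of dates): read it whole
        data = values[:]
    elif ndim == 2:
        data = [row[:] for row in values]
    elif ndim == 3:
        data = [[row[:] for row in page] for page in values]
    else:
        # every branch either assigns data or raises, so the return
        # below can never hit an unassigned variable
        raise ValueError(f"unsupported dataset dimension: {ndim}")
    return data
```

A unit test could then feed one input per rank and check that an unsupported rank raises ValueError instead of reaching the return statement.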

yunjunz commented 12 months ago

Thank you @scottstanie for this exciting PR!!!

I am trying to review this PR, but it does not seem easy to see what data and metadata have been added in the new format. Here are two questions for you:

  1. I assume the San Francisco Bay dataset from ARIA can be used to test this new capability, right?
  2. To document this new format, could you provide a minimal example for creating and examining the old/new format of the HDF5 file? I was relying on info.py, but it does not seem to work with the new format yet.

Also, it currently has units=f"days since {str(date_arr[0])}" as the time units, but this can be seconds for intra-day stacks (e.g. for @taliboliver's deltaX data). But I don't know what modifications he made; I've only seen that there's usually a date dataset.

For the intra-day stacks, @taliboliver uses the YYYYMMDDTHHMM format in the date dataset, instead of the usual YYYYMMDD format, and modified the code throughout mintpy to automatically identify this difference while reading it.
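
For reference, telling the two formats apart can be as simple as the sketch below. It is only an illustration under the assumption that the presence of "T" is the distinguishing feature; parse_acquisition is a hypothetical name, not the function used in mintpy.

```python
from datetime import datetime


def parse_acquisition(date_str):
    """Parse a YYYYMMDD or YYYYMMDDTHHMM string into a datetime."""
    # Intra-day stacks carry a "T" separator followed by hours and minutes.
    fmt = "%Y%m%dT%H%M" if "T" in date_str else "%Y%m%d"
    return datetime.strptime(date_str, fmt)


print(parse_acquisition("20200101"))
print(parse_acquisition("20200101T1030"))
```

Both formats end up as datetime objects, so the choice of time units could be made after parsing rather than from the string format.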

I am not familiar enough with the new changes in this PR yet to help with the choice here. Having answers to the two questions above would help.