opera-adt / COMPASS

COregistered Multi-temPorAl Sar Slc
Apache License 2.0
38 stars 18 forks source link

Correctly populate product fill values in geocoded datasets #167

Open LiangJYu opened 1 year ago

LiangJYu commented 1 year ago

This PR:

  1. correctly populates geocoded dataset fill values based on dataset data type
  2. renames init_geocoded_dataset to create_geocoded_dataset
  3. renames h5py.Dataset variables with grid prefix to data prefix
vbrancat commented 1 year ago

@LiangJYu do you mind update this branch? So we can go ahead and test it. Thanks.

LiangJYu commented 1 year ago

Unit tests pass with most current code on FN develop branch. Current failure is due to the change in types importing.

vbrancat commented 1 year ago

@LiangJYu do you mind merging the main branch into this PR branch?

LiangJYu commented 1 year ago

@LiangJYu, thanks for this PR. I am a bit lost. I see you implemented a functionality to retrieve the fill value for datasets with different data types. However, I don't see anywhere in the code that these fill values are plugged into the product metadata. Can you clarify?

Also, it would be great to come up with an automated way of understanding which fill value has been used for each data type instead of hard-coding them. If this is not possible, can we group all the fill value somewhere more visible?

All the changes have been merged away against main. :sob: I will restore and try to implement you suggestion above.

vbrancat commented 1 year ago

@LiangJYu can we quickly address these comments by EOD?

LiangJYu commented 1 year ago

@seongsujeong @vbrancat Comments have been addressed. Products look good from my testing with the Peru dataset. Please have a look. Thanks.

vbrancat commented 1 year ago

@LiangJYu I have processed a CSLC-S1 product using this PR and inspected it using HDF5. In the HDF5 viewer, I am not able to see the _FillValue as an attribute of 2D datasets. Can we insert as attribute for all the 2D datasets (timing corrections included)? Thanks.

Screenshot 2023-08-07 at 10 50 15 AM
LiangJYu commented 1 year ago

Looks good to me but for consistency, we should add fill values also to the timing corrections since these are 2D datasets. @seongsujeong, please, go ahead and review this PR as well.

Safe to assume NaN as fill value for all correction values? Apologies, I don't know them off the top of my head.

scottstanie commented 1 year ago

Don't wanna slow this down if you're about to try and get it merged, but if you wanted to cut down on some of the code, you might just be able to use the numpy dtype 'kind'


In [13]: dts = ['complex64', 'float64', 'float32', 'uint8']

In [14]: for dt in dts:
    ...:     print(dt, ':', np.dtype(dt).kind)
    ...:
complex64 : c
float64 : f
float32 : f
uint8 : u

that's how they do this in h5netcdf:

https://github.com/h5netcdf/h5netcdf/blob/2951cf8ea3174dde5663451484bde0cebaa535d0/h5netcdf/legacyapi.py#L24-L30 where they use a dict keyed on the numpy dtype kind

vbrancat commented 1 year ago

@seongsujeong can you go ahead and have another look at this PR? Thanks