xcube-dev / xcube-sh

An xcube plugin to allow generating data cubes from the Sentinel Hub Cloud API
MIT License
13 stars 10 forks source link

Wrong dates for S2 in xcube-sh #60

Closed maximlamare closed 3 years ago

maximlamare commented 3 years ago

Maybe my approach is flawed, but xcube-sh seems to return wrong dates for Sentinel-2:

cube_config = CubeConfig(dataset_name='S2L2A',
                         band_names=['B03', 'B08', 'CLM'],
                         bbox=bounding_box_buffered.envelope.bounds,
                         crs=f"http://www.opengis.net/def/crs/EPSG/0/3857",
                         spatial_res=10,
                         time_range=['2020-06-01', '2020-06-30'],                         
                         )  

returns the following dates (which are incorrect, see last step in this issue):

array(['2020-06-03T06:32:14.000000000', '2020-06-08T06:32:18.000000000',
       '2020-06-13T06:32:14.000000000', '2020-06-18T06:32:18.000000000',
       '2020-06-23T06:32:15.000000000', '2020-06-28T06:32:17.000000000'],
      dtype='datetime64[ns]')

I was puzzled that this cube was full of nans so I ran:

cube_config = CubeConfig(dataset_name='S2L2A',
                         band_names=['B03', 'B08', 'CLM'],
                         bbox=bounding_box_buffered.envelope.bounds,
                         crs=f"http://www.opengis.net/def/crs/EPSG/0/3857",
                         spatial_res=10,
                         time_range=['2020-06-01', '2020-06-30'],
                         time_period="1D"
                         )  

which then obviously returns all days. Therefore I added the following command when opening the cube:

s2_cube = open_cube(cube_config)

# Drop empty time slices
s2_cube = s2_cube.dropna(dim="time")

which returns the correct dates (but after almost a minute of processing!)

array(['2020-06-01T12:00:00.000000000', '2020-06-04T12:00:00.000000000',
       '2020-06-06T12:00:00.000000000', '2020-06-09T12:00:00.000000000',
       '2020-06-11T12:00:00.000000000', '2020-06-14T12:00:00.000000000',
       '2020-06-16T12:00:00.000000000', '2020-06-19T12:00:00.000000000',
       '2020-06-21T12:00:00.000000000', '2020-06-24T12:00:00.000000000',
       '2020-06-26T12:00:00.000000000', '2020-06-29T12:00:00.000000000'],
      dtype='datetime64[ns]')

As you can see these correct dates are different from those returned by the first step.

AliceBalfanz commented 3 years ago

@maximlamare Hi Maxim, could you provide the bounding box you are using? This would be great :)

maximlamare commented 3 years ago

@AliceBalfanz with pleasure:

(2894267.8988124575, 9262943.968658403, 2899443.8488556934, 9268505.554239485) in EPSG 3857 ( crs=f"http://www.opengis.net/def/crs/EPSG/0/3857")

AliceBalfanz commented 3 years ago

It is quite tough to debug, so far I have discovered the following:

cube_config = CubeConfig(dataset_name='S2L2A',
                         band_names=['B03', 'B08', 'CLM'],
                         bbox=[2894267.8988124575, 9262943.968658403, 2899443.8488556934, 9268505.554239485],
                         crs=f"http://www.opengis.net/def/crs/EPSG/0/3857",
                         spatial_res=10,
                         time_range=('2020-06-01', '2020-06-30')
                         ) 

results in the following request to sentinelhub about the time stamps:

{'collections': ['sentinel-2-l2a'], 'limit': 100, 'fields': {'exclude': ['geometry', 'bbox', 'assets', 'links'], 'include': ['properties.datetime']}, 'bbox': (63.65600798545179, 25.99965089839723, 63.67816348259773, 26.046183630114623), 'datetime': '2020-06-01T00:00:00Z/2020-06-30T00:00:00Z'}

and returns a dataset, where all time stamps are empty (like @maximlamare observed). image

When requesting the same cube, but using WGS84:

cube_config = CubeConfig(dataset_name='S2L2A',
                         band_names=['B03', 'B08', 'CLM'],
                         bbox=[63.65600798545179, 25.99965089839723, 63.67816348259773, 26.046183630114623],
                         spatial_res=0.0001,
                         time_range=('2020-06-01', '2020-06-30') 
                        )  

sentinelhub timestamps request: {'collections': ['sentinel-2-l2a'], 'limit': 100, 'fields': {'exclude': ['geometry', 'bbox', 'assets', 'links'], 'include': ['properties.datetime']}, 'bbox': (63.65600798545179, 25.99965089839723, 63.67814798545179, 26.046270898397232), 'datetime': '2020-06-01T00:00:00Z/2020-06-30T00:00:00Z'}

results in the the datacube actually having values:

image


Then, requesting daily means:

cube_config = CubeConfig(dataset_name='S2L2A',
                         band_names=['B03', 'B08', 'CLM'],
                         bbox=[2894267.8988124575, 9262943.968658403, 2899443.8488556934, 9268505.554239485],
                         crs=f"http://www.opengis.net/def/crs/EPSG/0/3857",
                         spatial_res=10,
                         time_range=('2020-06-01', '2020-06-30'),
                         time_period='1D')  

does not request the time stamps from seninelhub. They are created by a funtion in xcube-sh chunkstore: https://github.com/dcs4cop/xcube-sh/blob/e8de4194ac55863fe6f61e0dc1e2fe3fc96d8cdf/xcube_sh/chunkstore.py#L218

results in a dataset of 12 time stamps with values: image

Again, doing the same for WGS84

cube_config = CubeConfig(dataset_name='S2L2A',
                         band_names=['B03', 'B08', 'CLM'],
                         bbox=[63.65600798545179, 25.99965089839723, 63.67816348259773, 26.046183630114623],
                         spatial_res=0.00018,
                         time_range=('2020-06-01', '2020-06-30'), 
                         time_period='1D'
                        )  

results in a dataset of 6 time stamps with values: image


Another observation: the time stamps of both WGS84 cubes are the same dates: Without time_period:

array(['2020-06-03T06:32:14.000000000', '2020-06-08T06:32:18.000000000',
       '2020-06-13T06:32:14.000000000', '2020-06-18T06:32:18.000000000',
       '2020-06-23T06:32:15.000000000', '2020-06-28T06:32:17.000000000'],
      dtype='datetime64[ns]')

with time_period='1D':

array(['2020-06-03T12:00:00.000000000', '2020-06-08T12:00:00.000000000',
       '2020-06-13T12:00:00.000000000', '2020-06-18T12:00:00.000000000',
       '2020-06-23T12:00:00.000000000', '2020-06-28T12:00:00.000000000'],
      dtype='datetime64[ns]')
AliceBalfanz commented 3 years ago

There is an issue with swapped coordinates when changing the crs from EPSG:3857 to EPSG:4326, need to clean up my notebook and will document here next week.

AliceBalfanz commented 3 years ago

https://github.com/dcs4cop/xcube-sh/blob/e8de4194ac55863fe6f61e0dc1e2fe3fc96d8cdf/xcube_sh/sentinelhub.py#L187

After further investigations the above line caused the trouble, as it expects the same order of x and y in both sources and target crs. However, in terms of x and y WGS84 is swapped, meaning (y/x):

https://www.w3.org/2015/spatial/wiki/Coordinate_Reference_Systems#EPSG_database

Here you can see the mapped bounding boxes from the investigation, at the very end you can find the whole notebook attached

First, the original bbox with the source CRS as provided by Maxim: image

image

image

Investigate_xcube-sh_issue60.pdf