openclimatefix / Satip

Satip contains the code necessary for retrieving, transforming and storing EUMETSAT data
https://satip.readthedocs.io/
MIT License
discrepancies between (HRV) zarr files on gcp and downloaded satellite data #191

Open tomasvanoyen opened 1 year ago

tomasvanoyen commented 1 year ago

Describe the bug

I came across a discrepancy between the public (HRV) dataset on GCP and data downloaded directly from the EUMETSAT API.

To Reproduce

Steps to reproduce the behavior:

  1. Connect to public satellite data by:

    import gcsfs
    import xarray as xr

    # Open the public HRV zarr store for 2020
    gcs = gcsfs.GCSFileSystem()
    zstore = 'gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_hrv.zarr'
    mapper = gcs.get_mapper(zstore)
    ds = xr.open_zarr(mapper, consolidated=True)

and plot the data with coastlines:

    import cartopy.crs as ccrs
    import matplotlib.pyplot as plt
    import numpy as np

    projection = {
        'proj': 'geos',
        'lon_0': 9.5,
        'h': 35785831,
        'x_0': 0,
        'y_0': 0,
        'a': 6378169,
        'rf': 295.488065897014
    }

    fig = plt.figure(figsize=(20, 20))
    crs = ccrs.Geostationary(
        central_longitude=projection['lon_0'],
        satellite_height=projection['h'],
    )
    ax = plt.axes(projection=crs)
    ax.coastlines(resolution='10m', alpha=0.5, color='blue')

    # Plot a single HRV timestep on top of the coastlines
    ds['data'].sel(time=np.datetime64('2020-07-02T07:00:00'), variable='HRV').plot(
        ax=ax, cmap='gray', add_colorbar=False
    )

This clearly shows that the coastlines are offset from the satellite observation data (have a look at Libya).

On the other hand, downloading the same data with the eumdac CLI (eumdac download -c EO:EUM:DAT:MSG:MSG15-RSS --start 2020-07-02T06:45 --end 2020-07-02T07:15) and combining the *.NAT files with the methods in scripts/extend_gcp_zarr.py (temporary link here) removes the discrepancy between the coastlines and the satellite observations.

Hence, it appears that something is incorrect in the satellite data in the public GCP bucket.

I am guessing here, but could this be because the public zarr file lumps a full year of observations together, while the spatial extent of the observations moves during the year? If so, the data should be split in time across more zarr files.
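To make the suggested split concrete, here is a minimal sketch of grouping consecutive timesteps that share the same spatial window, so that each group could go into its own zarr file. The function name `split_by_window`, the `(x_min, x_max, y_min, y_max)` window tuples, and the example values are all hypothetical; this is not how Satip organizes its data.

```python
from itertools import groupby


def split_by_window(timesteps):
    """Group consecutive (time_label, window) pairs that share a window.

    `window` is any hashable description of the spatial extent,
    e.g. (x_min, x_max, y_min, y_max). Returns a list of
    (window, [time_label, ...]) tuples in the original order.
    """
    groups = []
    for window, items in groupby(timesteps, key=lambda pair: pair[1]):
        groups.append((window, [time_label for time_label, _ in items]))
    return groups


# Hypothetical extents: the window moves between the second and third timestep.
example = [
    ("2020-01-01T00:00", (0, 100, 0, 50)),
    ("2020-01-01T00:05", (0, 100, 0, 50)),
    ("2020-07-02T07:00", (10, 110, 0, 50)),
]
chunks = split_by_window(example)
for window, times in chunks:
    print(window, times)
```

Each resulting group has a single, constant window, so one set of coordinates would be valid for every timestep inside it.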

Best regards,

Tomas

jacobbieker commented 10 months ago

Hi,

Sorry about the delayed response, this slipped through the cracks! But yes, there are some issues with the public GCP data because it takes the coordinate information of the first timestep of the year and applies it to the whole year. In theory, the datasets x_geostationary_coordinates and y_geostationary_coordinates should hold the per-timestep coordinates, but that processing seems not to have worked, so they don't actually contain that data.
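As an illustration of why reusing the first timestep's coordinates produces an offset, the sketch below compares a 1-D coordinate axis from the first timestep with the true axis of a later timestep and reports the mismatch in whole pixels. It assumes evenly spaced axes with identical spacing; the helper `pixel_offset` and the coordinate values are hypothetical and not taken from the dataset.

```python
import numpy as np


def pixel_offset(first_coords, actual_coords):
    """Whole-pixel offset between two evenly spaced 1-D coordinate axes.

    `first_coords` is the axis stored for the first timestep of the year,
    `actual_coords` is the true axis of a later timestep. Both are assumed
    to have the same spacing.
    """
    first = np.asarray(first_coords, dtype=float)
    actual = np.asarray(actual_coords, dtype=float)
    spacing = first[1] - first[0]
    return int(round((actual[0] - first[0]) / spacing))


# Hypothetical axes: the later image window has moved by three pixels.
x_first = np.arange(0.0, 5000.0, 1000.0)
x_later = x_first + 3000.0
offset = pixel_offset(x_first, x_later)
print(offset)
```

A non-zero result for a timestep means that plotting it with the first timestep's coordinates would place the image that many pixels away from where it belongs.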

I will try to fix that processing so that the newer zarrs can have that fixed, although it might take quite a while. Primarily, we need to

That should allow the images to be shifted to the correct locations for plotting and the like. Sorry for the issue with that data.
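One possible way to shift an image once a per-timestep offset is known is sketched below: the pixels are moved along the x axis and the vacated columns are filled with NaN. This is only a sketch under the assumption that the offset is a whole number of pixels; `shift_columns` is a hypothetical helper, not part of Satip, and simply attaching the correct coordinates to each timestep would be an alternative that leaves the pixel data untouched.

```python
import numpy as np


def shift_columns(image, offset):
    """Return a copy of a 2-D `image` moved `offset` pixels along x.

    Positive offsets move the content to the right, negative offsets to
    the left. Columns that have no source data are filled with NaN.
    """
    image = np.asarray(image, dtype=float)
    shifted = np.full_like(image, np.nan)
    if offset > 0:
        shifted[:, offset:] = image[:, :-offset]
    elif offset < 0:
        shifted[:, :offset] = image[:, -offset:]
    else:
        shifted[:] = image
    return shifted


# Hypothetical 2x3 image moved one pixel to the right.
image = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
moved = shift_columns(image, 1)
print(moved)
```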