Open tomasvanoyen opened 1 year ago
Hi,
Sorry about the delayed response, this slipped through the cracks! But yes, there are some issues with the public GCP data because it takes the coordinate information of the first timestep of the year and applies it to the whole year. Theoretically, the `x_geostationary_coordinates` and `y_geostationary_coordinates` datasets should have the per-timestep coordinates, but the processing seems not to have worked, so they don't actually contain that data.
I will try to fix that processing so that the newer zarrs have this fixed, although it might take quite a while. Primarily, we need to populate `x_geostationary_coordinates` and `y_geostationary_coordinates` with the actual values. That should allow the images to be shifted to the correct locations for plotting and the like. Sorry for the issue with that data.
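One way to check whether a given store carries usable per-timestep coordinates is to test whether the coordinate values change over time at all. The following is only a minimal sketch: the function name `coords_vary_over_time` is hypothetical, and it assumes the coordinate variable can be read as a 2-D array with time as the first axis, which the thread does not confirm.

```python
import numpy as np


def coords_vary_over_time(coords, atol=0.0):
    """Return True if any timestep's coordinates differ from the first timestep's.

    `coords` is assumed (not confirmed by the dataset) to have shape (time, pixel).
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2:
        raise ValueError("expected a 2-D (time, pixel) array")
    # Compare every timestep against the first; NaNs in matching positions count as equal.
    same = np.isclose(coords, coords[0], rtol=0.0, atol=atol, equal_nan=True)
    return not bool(same.all())


# Synthetic example: three timesteps sharing one grid, then the same grid
# with the middle timestep shifted by 1000 m.
static = np.tile(np.linspace(-1000.0, 1000.0, 5), (3, 1))
shifted = static + np.array([[0.0], [1000.0], [0.0]])

static_varies = coords_vary_over_time(static)
shifted_varies = coords_vary_over_time(shifted)
```

Against the real store this might be called as `coords_vary_over_time(ds["x_geostationary_coordinates"].values)`, again assuming that layout. A `False` result would be consistent with the coordinates not having been written per timestep.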
Describe the bug
I came across a discrepancy between the public HRV dataset on GCP and data downloaded directly from the EUMETSAT API.
To Reproduce
Steps to reproduce the behavior:
import gcsfs
import xarray as xr

gcs = gcsfs.GCSFileSystem()
zstore = 'gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4/2020_hrv.zarr'
mapper = gcs.get_mapper(zstore)
ds = xr.open_zarr(mapper, consolidated=True)
and plot the data with coastlines:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np

projection = {'proj': 'geos', 'lon_0': 9.5, 'h': 35785831, 'x_0': 0, 'y_0': 0, 'a': 6378169, 'rf': 295.488065897014}
fig = plt.figure(figsize=(20, 20))
crs = ccrs.Geostationary(central_longitude=projection['lon_0'], satellite_height=projection['h'])
ax = plt.axes(projection=crs)
ax.coastlines(resolution='10m', alpha=0.5, color='blue')
ds['data'].sel(time=np.datetime64('2020-07-02T07:00:00'), variable='HRV').plot(ax=ax, cmap='gray', add_colorbar=False)
clearly shows that the coastlines are offset from the satellite observation data (have a look at Libya).
On the other hand, downloading the same data with the eumdac CLI (
eumdac download -c EO:EUM:DAT:MSG:MSG15-RSS --start 2020-07-02T06:45 --end 2020-07-02T07:15
) and combining the *.NAT files with the methods in scripts/extend_gcp_zarr.py
(temporary link here) removes the discrepancy between coastline and satellite observation.
Hence, it appears something is incorrect about the satellite data in the public GCP bucket.
I am guessing here, but could this be because the public zarr file lumps a full year of data together while the spatial extent of the observations moves during the year? If so, the data should be divided in time over more zarr files.
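If the imaged area does shift during the year, one way to divide the data would be by runs of consecutive timesteps that share the same grid, rather than by fixed calendar periods. This is only a sketch of that idea: the function `split_by_grid` is hypothetical, and it assumes each timestep's grid can be summarised by a hashable origin such as an `(x0, y0)` tuple, which is not something the dataset is known to provide.

```python
from itertools import groupby


def split_by_grid(times, origins):
    """Group consecutive timesteps that share the same grid origin.

    Returns a list of (origin, [times...]) runs, in time order.
    `origins` holds one hashable grid summary per timestep (an assumption).
    """
    if len(times) != len(origins):
        raise ValueError("times and origins must have the same length")
    runs = []
    # groupby only merges adjacent items, so a grid that recurs later starts a new run.
    for origin, group in groupby(zip(times, origins), key=lambda pair: pair[1]):
        runs.append((origin, [t for t, _ in group]))
    return runs


# Synthetic example: the grid origin changes at the third timestep.
times = ["2020-01-01T00:00", "2020-01-01T00:05", "2020-06-01T00:00"]
origins = [(0, 0), (0, 0), (1000, 0)]
runs = split_by_grid(times, origins)
```

Each run could then be written to its own zarr store, so that a single set of spatial coordinates is valid for everything in that store.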
Best regards,
Tomas