Closed np-complete-graph closed 3 years ago
Hi, thanks for using our dataset.
This is indeed unexpected. Do you think you could provide a code snippet for producing these numbers? Then I can go check.
Also, which spatial resolution are you using?
Hi Stephan
I'm using the 5.625° data. My bad, I forgot to select the pressure level when computing the difference. Here is my code snippet:
import xarray as xr
from dask.diagnostics import ProgressBar

datadir = {path to data}

t = xr.open_mfdataset("%s/temperature/*.nc" % datadir).sel(level=850)
t2 = xr.open_mfdataset("%s/temperature_850/*.nc" % datadir)
z = xr.open_mfdataset("%s/geopotential/*.nc" % datadir).sel(level=500)
z2 = xr.open_mfdataset("%s/geopotential_500/*.nc" % datadir)

tdiff = t-t2
zdiff = z-z2

delayed = tdiff.mean()
with ProgressBar():
    tdiff_mean = delayed.compute()
delayed = tdiff.std()
with ProgressBar():
    tdiff_std = delayed.compute()
print("Mean:", tdiff_mean.values(), "Std:", tdiff_std.values())

delayed = zdiff.mean()
with ProgressBar():
    zdiff_mean = delayed.compute()
delayed = zdiff.std()
with ProgressBar():
    zdiff_std = delayed.compute()
print("Mean:", zdiff_mean.values(), "Std:", zdiff_std.values())
print("Done.")
Deviations in temperature (mean, std):
Deviations in geopotential (mean, std):
The difference is still minor, but it seems too large to me to be explained by numerical inaccuracies, no?
Hmm, maybe it does have to do with numerical precision since I am only using single precision to save the arrays. I would probably tend to say that this is small enough to ignore. Do you agree?
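For a rough sense of how large single-precision rounding can be at these magnitudes, here is a minimal sketch using only the standard library. The sample values (about 273 K for temperature at 850 hPa, about 54000 m^2/s^2 for geopotential at 500 hPa) are illustrative assumptions, not numbers taken from the dataset, and `to_float32` and `float32_spacing` are hypothetical helper names.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a double-precision Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between adjacent float32 values around a positive, normal x."""
    # math.frexp gives x = m * 2**e with 0.5 <= m < 1, so the exponent of
    # the leading bit is e - 1; float32 keeps 23 fraction bits below it.
    leading_exponent = math.frexp(x)[1] - 1
    return 2.0 ** (leading_exponent - 23)


# Illustrative magnitudes only (assumed, not read from the data files).
samples = {"temperature [K]": 273.15, "geopotential [m^2/s^2]": 54000.123}

for name, value in samples.items():
    error = abs(to_float32(value) - value)
    print(name, "spacing:", float32_spacing(value), "round-trip error:", error)
```

Storing a value in float32 once can shift it by at most half the spacing, so deviations well above that bound would point to something other than a single rounding step, for example a different order of operations during regridding or level selection.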
It will most likely be negligible, I agree ;) Thanks for the discussion. 👍
I just noticed this too. It is probably ignorable but hard to say for sure unless someone has evaluated the same predictions on both targets and compared. Has anyone done this?
My suggestion would be to remove these extra single-level target fields, since they're redundant and including them leads to ambiguity about which of two different-by-a-rounding-error(?) fields should be used for evaluation. Or if you want them in there for convenience / to avoid reading the multi-level fields when evaluating, making them identical to the version from the multi-level field would be best.
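A check along these lines could look like the sketch below. It is only a toy under stated assumptions: the "targets" are synthetic stand-ins (one in double precision, one rounded to float32), the "forecast" is a made-up constant-bias prediction, and `rmse` and `to_float32` are hypothetical helpers. With the real data one would load the same predictions once and score them against both target fields.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a double-precision Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def rmse(pred, target):
    """Root-mean-square error between two equally long sequences."""
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic stand-ins (assumptions, not dataset values):
# target_a plays the role of the field selected from the multi-level file,
# target_b the same field after being stored in single precision.
target_a = [270.0 + 0.37 * i for i in range(50)]
target_b = [to_float32(v) for v in target_a]

# Hypothetical forecast with a constant 1.5 K bias.
pred = [v + 1.5 for v in target_a]

rmse_a = rmse(pred, target_a)
rmse_b = rmse(pred, target_b)
print("RMSE vs target_a:", rmse_a)
print("RMSE vs target_b:", rmse_b)
print("Difference:", abs(rmse_a - rmse_b))
```

By the triangle inequality, the two scores cannot differ by more than the RMSE between the two targets themselves, so the std of the target differences already bounds how much the choice of target can move an RMSE score.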
First of all, let me thank you for providing a superb basis for comparing data-driven methods on weather!
While working with the provided data I was confused by deviations in temperature_850 and geopotential_500 compared to the values provided in temperature at level 850 and geopotential at level 500. I expected that they would be equal. Why are the targets saved in a separate dataset folder? Can't one just use the data in temperature and geopotential?
Deviations in temperature (mean, std):
Deviations in geopotential (mean, std):
Thanks for your clarification.