aaronspring closed this issue 3 years ago.
That's a good question, and different people will give different answers; we probably don't have time for a survey. There are people working on data compression at NCAR who are surveying this.
I know Tim Palmer's group works on this topic, even for running the model itself.
I would argue that for our skill calculations only the first 2 or 3 digits matter in most cases.
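One way to test the "first 2 or 3 digits" claim is to round the inputs to N significant digits, recompute the skill score, and compare it with the full-precision result. A minimal numpy sketch of the rounding step follows; the function name `round_to_significant` is hypothetical and not part of any library.

```python
import numpy as np

def round_to_significant(x, digits):
    """Round values to `digits` significant decimal digits (assumes finite, normal-range inputs)."""
    x = np.asarray(x, dtype=np.float64)
    # Decimal exponent of the leading digit; log10(0) is -inf, so silence that warning
    with np.errstate(divide="ignore"):
        exponent = np.floor(np.log10(np.abs(x)))
    # Zeros (and NaN/inf) get exponent 0 so the scale factor stays finite
    exponent = np.where(np.isfinite(exponent), exponent, 0.0)
    scale = 10.0 ** (digits - 1 - exponent)
    return np.round(x * scale) / scale
```

For example, `round_to_significant([123.456, 0.0012345], 3)` gives values close to `[123.0, 0.00123]`. Note that decimal rounding alone does not shrink float storage; it only shows whether the extra digits change the score.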
I agree. Let's use the first 2 or 3 digits and we can revisit if someone complains. If you agree, I will close this issue.
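To make a "keep 2 or 3 digits" policy actually reduce compressed file size, a common approach is to round away trailing mantissa bits so they become zeros that a lossless compressor can squeeze. Roughly 10 kept mantissa bits correspond to about 3 significant decimal digits (2^10 = 1024). The sketch below is an assumption-laden illustration: `bitround_float32` is a hypothetical helper that rounds float32 values to nearest with ties to even, and it assumes finite inputs.

```python
import numpy as np

def bitround_float32(x, keepbits):
    """Round float32 values to `keepbits` explicit mantissa bits (round to nearest, ties to even)."""
    if not 0 <= keepbits <= 23:
        raise ValueError("keepbits must be between 0 and 23 for float32")
    a = np.ascontiguousarray(x, dtype=np.float32)
    bits = a.view(np.uint32)
    drop = 23 - keepbits  # number of trailing mantissa bits to zero out
    if drop == 0:
        return a.copy()
    half_minus_one = np.uint32((1 << (drop - 1)) - 1)
    # Lowest kept bit: adding it turns "round half up" into "round half to even"
    lsb = (bits >> np.uint32(drop)) & np.uint32(1)
    rounded = bits + half_minus_one + lsb
    # Clear the dropped bits; a carry may bump the exponent, which is correct rounding
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    rounded &= mask
    return rounded.view(np.float32)
```

With `keepbits=10` the relative error of each value stays below 2^-11 (about 5e-4), and the 13 dropped bits are all zero, which is what makes the rounded array compress better.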
Let’s keep this open until we have a zarr store for uploading to the cloud.
Similar test of reducing memory in a pandas DataFrame: https://www.kaggle.com/arjanso/reducing-dataframe-memory-size-by-65
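The same idea as the linked DataFrame notebook can be applied to plain numpy arrays: downcast only when the round trip loses less than a chosen tolerance. The helper `downcast_if_safe` and its default `rtol=1e-6` are hypothetical choices for illustration, and the sketch assumes float64 input.

```python
import numpy as np

def downcast_if_safe(arr, dtype=np.float32, rtol=1e-6):
    """Cast `arr` to `dtype` only if the round trip keeps every finite value within `rtol`.

    Returns (result, max_relative_error); result is the original array when the check fails.
    """
    arr = np.asarray(arr, dtype=np.float64)
    candidate = arr.astype(dtype)  # values too large for `dtype` become inf here
    roundtrip = candidate.astype(np.float64)
    finite = np.isfinite(arr)
    original = arr[finite]
    # Use the absolute error where the original value is exactly zero
    scale = np.where(original == 0.0, 1.0, np.abs(original))
    rel_err = np.abs(roundtrip[finite] - original) / scale
    max_err = float(rel_err.max()) if rel_err.size else 0.0
    if max_err <= rtol:
        return candidate, max_err
    return arr, max_err
```

For typical float64 model output, float32 passes a 1e-6 tolerance and halves `nbytes`, while float16 usually fails it (about 3 decimal digits, and values above 65504 overflow to inf).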
I did a check for the S2S competition with RPS: single-precision float32 is fine, but half-precision float16 leads to large errors in dry regions: https://renkulab.io/gitlab/aaron.spring/s2s-ai-competition-bootstrap/-/blob/master/notebooks/verification_RPSS-precision.ipynb
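One plausible mechanism for the dry-region errors, offered as an assumption rather than a diagnosis of the linked notebook: if precipitation is stored as a flux in kg m-2 s-1, dry-region values are tiny (1 mm/day is about 1.16e-5 kg m-2 s-1) and fall below float16's smallest normal number (2^-14, about 6.1e-5). In that subnormal range the spacing between representable values is fixed at 2^-24 (about 6e-8), so small amounts lose most of their relative precision. The daily amounts below are hypothetical.

```python
import numpy as np

SECONDS_PER_DAY = 86400.0

def mm_per_day_to_flux(mm_per_day):
    """Convert mm/day to kg m-2 s-1 (1 mm of water over 1 m^2 weighs 1 kg)."""
    return np.asarray(mm_per_day, dtype=np.float64) / SECONDS_PER_DAY

def relative_roundtrip_error(values, dtype):
    """Relative error from storing float64 `values` as `dtype` and reading them back."""
    stored = values.astype(dtype).astype(np.float64)
    return np.abs(stored - values) / np.abs(values)

amounts_mm = np.array([0.001, 0.01, 0.1, 1.0, 10.0])  # hypothetical daily amounts
flux = mm_per_day_to_flux(amounts_mm)
err32 = relative_roundtrip_error(flux, np.float32)
err16 = relative_roundtrip_error(flux, np.float16)

for mm, e32, e16 in zip(amounts_mm, err32, err16):
    print(f"{mm:7.3f} mm/day  float32 rel. err {e32:.1e}  float16 rel. err {e16:.1e}")
```

In float16 the 0.001 mm/day flux rounds to exactly zero and the 0.01 mm/day flux is off by about 3%, which could create ties at tercile boundaries; float32 keeps all of these within about 6e-8 relative error.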
Climate model output often comes with endless digits. Quickly check the SubX defaults, what precision would be needed for predictability, and whether we can reduce the size.
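A quick way to frame the size question is to compare how many decimal digits each float type carries against its uncompressed storage cost. The array shape below (members x leads x lat x lon) is a hypothetical example, not the actual SubX layout, and `storage_summary` is a hypothetical helper.

```python
import numpy as np

def storage_summary(shape):
    """Map dtype name -> (approx. decimal digits, uncompressed size in MB) for an array of `shape`."""
    n_values = int(np.prod(shape))
    summary = {}
    for dt in (np.float64, np.float32, np.float16):
        digits = np.finfo(dt).precision  # approximate number of significant decimal digits
        size_mb = n_values * np.dtype(dt).itemsize / 1e6
        summary[np.dtype(dt).name] = (digits, size_mb)
    return summary

# Hypothetical hindcast: 4 members, 45 leads, 1-degree global grid
for name, (digits, size_mb) in storage_summary((4, 45, 181, 360)).items():
    print(f"{name:8s} ~{digits:2d} decimal digits  {size_mb:8.1f} MB uncompressed")
```

If 2 or 3 digits are enough for skill calculations, float32 (about 6 digits) already halves the float64 size with a comfortable margin, whereas float16 (about 3 digits) leaves none.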