openclimatefix / MetOfficeDataHub

Python wrapper around MetOffice Atmospheric Model Data REST API
MIT License

reduce size of live data #57

Open peterdudfield opened 1 year ago

peterdudfield commented 1 year ago

Detailed Description

It would be great to reduce the size of the NWP data. I think this could be done with better compression.

Context

Possible Implementation

Use Blosc Zstd compression

jacobbieker commented 1 year ago

I would try the new ocf_blosc2 Blosc2 ZSTD one, as that does give better results than the original Blosc Zstd

peterdudfield commented 1 year ago

Just so it's super easy to do, do you have a link to some code where you save data using ocf_blosc2?

jacobbieker commented 1 year ago

Yeah, it would be something like this, which I use for the new satellite zarrs:

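from ocf_blosc2.ocf_blosc2 import Blosc2


def write_to_zarr(dataset, zarr_name, mode, chunks):
    """Write a dataset to zarr, compressing with Blosc2 ZSTD on first write."""
    mode_extra_kwargs = {
        # Append mode: extend the existing store along "time". No encoding is
        # passed here; it is only set when the store is created.
        "a": {"append_dim": "time"},
        # Write mode: create the store and set the compressor and time units.
        "w": {
            "encoding": {
                "data": {
                    "compressor": Blosc2("zstd", clevel=5),
                },
                "time": {"units": "nanoseconds since 1970-01-01"},
            }
        },
    }
    extra_kwargs = mode_extra_kwargs[mode]
    # Rechunk first so the stored chunks match the requested layout.
    dataset.chunk(chunks).to_zarr(
        zarr_name, compute=True, **extra_kwargs, consolidated=True, mode=mode
    )

devsjc commented 1 year ago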

Update deployed NWP consumer (to version 1.2.2):

devsjc commented 1 year ago

Updated forecaster to include OCF blosc 2 library (to version 1.3.11):

peterdudfield commented 1 year ago

Great work @devsjc, what size did the NWP data go down to using this method?

devsjc commented 1 year ago

Not quite there yet, I'll let you know!
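For reference, one way the before and after sizes could be compared is to sum the file sizes under each store. This is only a sketch, assuming the zarr stores are available as local directories; the helper names and the fake chunk files below are hypothetical and not part of this repository.

```python
import os
import tempfile


def store_size_bytes(store_path):
    """Sum the sizes of all regular files under a directory-based store."""
    total = 0
    for root, _dirs, files in os.walk(store_path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def size_reduction(old_bytes, new_bytes):
    """Return the fractional reduction from old to new (0.25 means 25% smaller)."""
    if old_bytes <= 0:
        raise ValueError("old_bytes must be positive")
    return 1.0 - new_bytes / old_bytes


def _make_fake_store(path, n_chunks, chunk_bytes):
    # Hypothetical stand-in for a zarr store: a directory of chunk files.
    os.makedirs(path)
    for i in range(n_chunks):
        with open(os.path.join(path, f"0.{i}"), "wb") as f:
            f.write(b"\0" * chunk_bytes)


with tempfile.TemporaryDirectory() as tmp:
    old_store = os.path.join(tmp, "old.zarr")
    new_store = os.path.join(tmp, "new.zarr")
    _make_fake_store(old_store, n_chunks=4, chunk_bytes=1000)
    _make_fake_store(new_store, n_chunks=4, chunk_bytes=600)
    old_size = store_size_bytes(old_store)
    new_size = store_size_bytes(new_store)
    reduction = size_reduction(old_size, new_size)
    print(f"{old_size} -> {new_size} bytes ({reduction:.0%} smaller)")
```

With real data, `store_size_bytes` would be pointed at the store written by the old consumer version and at the one written with Blosc2. For stores held in object storage, the equivalent would be summing object sizes under the store prefix.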

devsjc commented 1 year ago

@peterdudfield which repository holds the national model? I will need to update that to be able to read the newly compressed NWP data as well.

devsjc commented 1 year ago

Seeing the following error in the forecaster in cloudwatch:

expected shape=(7, 24, 24, 11) actual shape (4, 24, 24, 11)

devsjc commented 1 year ago

Latest version does not seem to show the same error in cloudwatch

devsjc commented 1 year ago

The wrong shape error has occurred again. @peterdudfield is this an expected error for the forecaster?
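To narrow this down, it may help to log which axis differs rather than only the two tuples. A minimal sketch, assuming the shapes are available as plain tuples; the function name is hypothetical and this is not the forecaster's actual check.

```python
def describe_shape_mismatch(expected, actual):
    """Return a message naming each differing axis, or None if the shapes match."""
    if len(expected) != len(actual):
        return f"rank mismatch: expected {len(expected)} dims, got {len(actual)}"
    diffs = [
        f"axis {i}: expected {e}, got {a}"
        for i, (e, a) in enumerate(zip(expected, actual))
        if e != a
    ]
    return "; ".join(diffs) if diffs else None


# Shapes copied from the error message reported above.
message = describe_shape_mismatch((7, 24, 24, 11), (4, 24, 24, 11))
print(message)  # axis 0: expected 7, got 4
```

For the reported error only the first axis differs (7 expected, 4 received). The thread does not say what that axis represents, so the sketch reports only its index.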

devsjc commented 1 year ago

NWP task is exiting with an out of memory error - must be the case that compression with Blosc2 takes more memory as that's the only change that has been implemented in that container. Increasing the memory:

Dev:
pr: https://github.com/openclimatefix/ocf-infrastructure/pull/251
tf: https://app.terraform.io/app/openclimatefix/workspaces/nowcasting_infrastructure_development-eu-west-1/runs/run-gkqyNt5GZj8vQpKX

Prod:
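The memory hypothesis could be checked by measuring peak allocation around the compression step. The sketch below uses `tracemalloc` from the standard library, with `zlib` standing in for Blosc2 so that it runs without extra dependencies; `peak_memory_of` is a hypothetical helper. Note that `tracemalloc` only counts allocations made through Python's allocators, so memory allocated inside a native compression library may not show up and the container-level metrics are still needed.

```python
import tracemalloc
import zlib


def peak_memory_of(func, *args, **kwargs):
    """Run func and return (result, peak traced bytes during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises.
        tracemalloc.stop()
    return result, peak


# zlib is a stand-in here; the real step would be the Blosc2 zarr write.
payload = bytes(1_000_000)
compressed, peak_bytes = peak_memory_of(zlib.compress, payload, 5)
print(f"compressed {len(payload)} -> {len(compressed)} bytes, peak {peak_bytes} bytes traced")
```

Running the same wrapper around the old and new write paths on identical input would show whether the Python-side peak actually changed between the two versions.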

peterdudfield commented 1 year ago

I just rolled back NWP from 1.2.2 to 1.2.0 on development, as it was causing an issue there.

https://eu-west-1.console.aws.amazon.com/cloudwatch/home?region=eu-west-1#logsV2:log-groups/log-group/$252Faws$252Fecs$252Fconsumer$252Fnwp$252F/log-events/streaming$252Fnwp-consumer$252F52032316e23c4ca5b31f8a4aa527cd4c

Screenshot 2023-05-27 at 08 01 49