openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License

Slow loading of satellite data #132

Closed peterdudfield closed 1 year ago

peterdudfield commented 1 year ago

Describe the bug Loading one example of the satellite data is slow

To Reproduce

from ocf_datapipes.load import OpenSatellite
import time

t0 = time.time()
satellite_datapipe = OpenSatellite(
    zarr_path="gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v3/eumetsat_seviri_uk.zarr"
)

satellite_xr = next(iter(satellite_datapipe))

selected = satellite_xr.isel(
    {
        "time_utc": slice(10, 45),
        "x_geostationary": slice(100, 150),
        "y_geostationary": slice(100, 150),
    }
)
print(time.time() - t0)
selected_sum = selected.sum()  # lazy reduction, this takes ~0.01 seconds
print(time.time() - t0)
selected = selected.values  # this takes 5 seconds to load
print(selected)
print(time.time() - t0)
print(f"{selected.nbytes/10**6} MB")

Expected behavior Should happen in < 1 second

Additional context Chunk size is

((33, 33, 33, ..., 33, 33), (11,), (298,), (615,))
peterdudfield commented 1 year ago

Suggestion from @jacobbieker is to run satellite_xr.rechunk({'time': 12, 'variable': 11, 'x': 298, 'y': 615}) and resave it

jacobbieker commented 1 year ago

What if you try that on Leonardo? Or Donatello? Does it still take that much longer to load than for sum?

peterdudfield commented 1 year ago

What's the location on Leonardo?

peterdudfield commented 1 year ago

/mnt/storage_ssd_8tb/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/satellite/EUMETSAT/SEVIRI_RSS/zarr/v3/eumetsat_seviri_uk.zarr

jacobbieker commented 1 year ago

/mnt/storage_ssd_4tb/metnet_train/eumetsat_seviri_uk.zarr on Donatello, /mnt/storage_ssd_8tb/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/satellite/EUMETSAT/SEVIRI_RSS/zarr/v3/eumetsat_seviri_uk.zarr on Leonardo

peterdudfield commented 1 year ago

Tried just using xr.open_zarr("gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v3/eumetsat_seviri_uk.zarr") to isolate the problem; this gets it down to more like 1 second

peterdudfield commented 1 year ago

What if you try that on Leonardo? Or Donatello? Does it still take that much longer to load than for sum?

It also took 5 seconds on Leonardo for OpenSatellite

simlmx commented 1 year ago

Tried just using xr.open_zarr("gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v3/eumetsat_seviri_uk.zarr") to isolate the problem; this gets it down to more like 1 second

It will probably be worth checking what slows down OpenSatelliteIterDataPipe, but 1 second still feels way too slow.

peterdudfield commented 1 year ago

Tried just using xr.open_zarr("gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/v3/eumetsat_seviri_uk.zarr") to isolate the problem; this gets it down to more like 1 second

It will probably be worth checking what slows down OpenSatelliteIterDataPipe, but 1 second still feels way too slow.

0.7 seconds, still seems slow, but better than 5

peterdudfield commented 1 year ago

Yeah, 0.7 seconds is not very good for loading one example; a batch will have 32 examples, so that adds up quickly to 22.4 seconds per batch

peterdudfield commented 1 year ago

There seems to be a difference with

dataset = xr.open_zarr(store=zarr_path)

and

dataset = xr.open_dataset(leonardo_path, engine="zarr", chunks="auto")

The first is much quicker

peterdudfield commented 1 year ago

I'll make a PR, as this feels like a big enough difference and is slowing down training. Further speed-ups I'm sure can be made

jacobbieker commented 1 year ago

I've been creating various test zarrs, all of 3 days (864 timesteps), on Leonardo under /mnt/storage_c/, sweeping chunking along time, spatial chunk size, type of compression, number of channels included, and more. So we should have a better idea soon of which one works best for us. They are all being created from the JPEG-XL compressed single timesteps saved on Leonardo, not the raw files, to make it quicker to create them right now.

peterdudfield commented 1 year ago

That sounds great, yeah we should see what size they are, and maybe have a tiny bit of code to see how long one example takes to load. Thanks for doing this @jacobbieker

jacobbieker commented 1 year ago

Yeah, I am getting the sizes now, and was planning on copying the code above with a few changes to see how it affects loading times. But just to compare it to the current ones.

jacobbieker commented 1 year ago

For sizes so far, compressing HRV files, the clear winner is JPEG-XL: it's beating out bz2, zstd, and zlib by quite a lot, without needing to convert the data to ints. It is still being compared to BitRound, Quantize, and ZFPY compressions too, which are all somewhat lossy, like JPEG-XL. But compared to the lossless ones, for 3 days (864 timesteps) of data, JPEG-XL takes 818MB of space while BZ2 takes 39GB, so it's only 2.1% the size of the worst compression. That would put 1 year of HRV data at ~100GB with JPEG-XL vs 4.75TB, and the whole 2014-2022 dataset at around 38TB for BZ2 vs 800GB for JPEG-XL. Still need to see how the 11 other non-HRV channels differ, and more chunk sizes, but size-wise it is a lot smaller. The best compression in size other than JPEG-XL so far is bz2 with the data as int8s and 12 timesteps per chunk, at 2GB for the 3 days of data, or twice the size of JPEG-XL.
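The yearly and full-dataset figures above are straight linear extrapolations of the 3-day sample; a quick back-of-envelope sketch (the helper name is mine, and this ignores missing timesteps):

```python
def extrapolate_size(sample_gb: float, sample_days: float, target_days: float) -> float:
    """Linearly scale a compressed sample's size to a longer time period."""
    return sample_gb * target_days / sample_days

# 818 MB of JPEG-XL for 3 days -> roughly 100 GB per year
per_year_jpegxl = extrapolate_size(0.818, 3, 365)
# 39 GB of BZ2 for 3 days -> roughly 4.75 TB per year
per_year_bz2 = extrapolate_size(39, 3, 365)
print(round(per_year_jpegxl), round(per_year_bz2 / 1000, 2))
```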

peterdudfield commented 1 year ago
| Compression | Size | Type | Speed |
|-------------|------|------|-------|
| Current | TODO | TODO | TODO |
| JPEG-XL | 0.8 GB | TODO | TODO |
| BZ2 | 2 GB | int8 | TODO |

Worth filling this out @jacobbieker

jacobbieker commented 1 year ago
| Compression | Size | Type | Speed |
|-------------|------|------|-------|
| Current | TODO | TODO | TODO |
| JPEG-XL | 0.8 GB | float32 | TODO |
| BZ2 | 2 GB | int8 | TODO |

Worth filling this out @jacobbieker

Thanks, I'll fill it out as I go

jacobbieker commented 1 year ago

I'm looking at https://github.com/observingClouds/xbitinfo for the amount of "real information" in the satellite imagery. To keep 99% of the information, we can BitRound down to 7 bits; keeping 99.9%/99.99% means going up to 11 bits. I believe 99% is probably good enough, and I'll try that out
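For intuition, BitRound works by zeroing the low mantissa bits of each float so the payload compresses better. A minimal NumPy sketch of the idea (my own illustration, not the xbitinfo/numcodecs implementation, and using simple round-half-up rather than round-to-even):

```python
import numpy as np

def bitround(data, keepbits):
    """Keep only `keepbits` of the 23 float32 mantissa bits."""
    a = np.asarray(data, dtype=np.float32)
    drop = 23 - keepbits
    if drop <= 0:
        return a  # nothing to round away
    bits = a.view(np.uint32)
    half = np.uint32(1 << (drop - 1))                 # round to nearest
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)    # zero the low bits
    return ((bits + half) & mask).view(np.float32)

x = np.array([0.1234567, 3.14159], dtype=np.float32)
# keeping 7 mantissa bits bounds the relative error by about 2**-8
print(bitround(x, 7))
```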

jacobbieker commented 1 year ago

The fastest 10 loading times from Leonardo onto Donatello for 378 combinations of HRV data. If Bitround is 0.0 it means no rounding was done. Precision of 8 means int8, while 16 is half precision and 32 is full precision.

For the tradeoff between speed and size, the two winners seem to be JPEG-XL, or ZFP with timestep chunks of 4. ZFP is easier to use and avoids JPEG-XL's install problem (I can't seem to install the JPEG-XL library on Leonardo, as it's not in the Ubuntu or Debian repos). ZFP does require 4x4x4 blocks though, so it is inefficient for chunks whose dimensions are not multiples of 4: each such chunk gets padded up to a multiple of 4.
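That padding cost is easy to quantify. A small sketch (the helper is hypothetical) computing how much a 4x4x4-block layout inflates a chunk whose dimensions aren't multiples of 4, using the (11, 298, 615) chunk shape from the current Zarr as an example:

```python
import math

def padded_overhead(shape, block=4):
    """Return the padded shape and fractional size overhead when each
    dimension is rounded up to a multiple of `block`."""
    padded = tuple(block * math.ceil(s / block) for s in shape)
    return padded, math.prod(padded) / math.prod(shape) - 1

padded, overhead = padded_overhead((11, 298, 615))
print(padded, f"{overhead:.1%}")  # (12, 300, 616), roughly 10% wasted space
```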

Note on the bitrounding: for this satellite data testing, 9-10 bits keeps 99% of the data and 11 bits keeps 99.99%, so the JPEG-XL version keeps slightly more data than the zfp or zstd compressions.

| Speed (sec) | Size | Precision | Algo | Bitround | Timestep chunk | Effort | Y chunk | X chunk |
|-------------|------|-----------|------|----------|----------------|--------|---------|---------|
| 0.1533 | 2.4G | 8 | zstd | 0.0 | 12 | 8 | 1044 | 1392 |
| 0.1812 | 2.5G | 8 | zstd | 0.0 | 12 | 4 | 1044 | 1392 |
| 0.1826 | 51M | 32 | jpeg-xl | 11.0 | 1 | 8 | 1044 | 1392 |
| 0.1948 | 51M | 16 | zfp | 10.0 | 4 | 8 | 1044 | 1392 |
| 0.1979 | 51M | 16 | zstd | 10.0 | 4 | 8 | 1044 | 1392 |
| 0.2372 | 2.5G | 8 | zlib | 0.0 | 12 | 8 | 1044 | 1392 |
| 0.2412 | 2.0G | 8 | bz2 | 0.0 | 12 | 8 | 1044 | 1392 |
| 0.2493 | 51M | 16 | zfp | 7.0 | 1 | 8 | 1044 | 1392 |
| 0.2642 | 51M | 16 | zfp | 10.0 | 1 | 8 | 1044 | 1392 |
| 0.2651 | 51M | 16 | jpeg-xl | 7.0 | 1 | 8 | 1044 | 1392 |
jacobbieker commented 1 year ago

Based off of this, I would probably recommend that we go with either ZFP or ZSTD for the compression algorithm. ZSTD is what we are using for the NWP data, doesn't need another install like ZFP does, and is not as particular about the 4x4x4 blocks. I would go with bitrounding to 10 bits, using fp16, and timestep chunks of 4 (~20 minutes each), with these larger spatial chunks. Smaller chunking in x and y was also tested, but these larger chunks allow for greater compression and are apparently easier to load.

peterdudfield commented 1 year ago

Thanks for doing all this, really useful to have it written down. Is it right that we should go for either the first one or the third one? The first is the fastest, but the third is the smallest? (Amazing difference in size btw.) I'm not quite sure on the trade-off of ~15% slower to load vs 98% smaller.

jacobbieker commented 1 year ago

I think the 3rd one because it's so much smaller; with the larger one we'd have to transfer so much more data over the network.

jacobbieker commented 1 year ago

For the 8 years of satellite data, this corresponds to about 1.9TB for the first one, and 40GB for the third one

jacobbieker commented 1 year ago

So actually not that big for the larger one; might try just making both? 40GB is incredibly small, and 2TB is pretty small too, relatively

jacobbieker commented 1 year ago

A downside for the first one as well is that it's int8, so theoretically it loses more information than the other ones

peterdudfield commented 1 year ago

40GB sounds ideal - good for sharing too / running local stuff

jacobbieker commented 1 year ago

Yeah, I'll run some more tests and do some plotting of it, to make sure it's not just all rounded down to 0s or something, but then start on that

jacobbieker commented 1 year ago

Hmmm, I just remade the best-performing ones according to this, and they are much larger: 12GB instead of 51MB. I'm thinking this is because, to speed up testing, for most of these I created one 3-day Zarr with JPEG-XL, then created the other options from that Zarr. That means the data was compressed once with JPEG-XL, then compressed again with the bitrounding, etc. But I still need to look into it some more.

jacobbieker commented 1 year ago

The fastest ones still all have the same spatial chunking, and either 4 or 12 timesteps, so I'm just trying it again. But still on track to make the int8 version, which is the fastest anyway, and whose size makes more sense, as that's been more constant since the beginning.

jacobbieker commented 1 year ago

Ah, it seems like the very small ones are nearly all NaNs, which explains how small they get; I'm assuming that's from the multiple lossy compressions. But the int8 one should still be fine at least, just no 40GB file

jacobbieker commented 1 year ago

Also now trying Blosc2, in this little wrapper:

from numcodecs.registry import register_codec
from numcodecs.abc import Codec
from numcodecs.compat import ensure_contiguous_ndarray
import blosc2

class Blosc2(Codec):
    """Codec providing compression using the Blosc meta-compressor.
    Parameters
    ----------
    cname : string, optional
        A string naming one of the compression algorithms available within blosc, e.g.,
        'zstd', 'blosclz', 'lz4', 'lz4hc', 'zlib' or 'snappy'.
    clevel : integer, optional
        An integer between 0 and 9 specifying the compression level.
    See Also
    --------
    numcodecs.zstd.Zstd, numcodecs.lz4.LZ4
    """

    codec_id = 'blosc2'
    max_buffer_size = 2**31 - 1

    def __init__(self, cname='blosc2', clevel=5):
        self.cname = cname
        if cname == "zstd":
            self._codec = blosc2.Codec.ZSTD
        elif cname == "blosc2":
            self._codec = blosc2.Codec.BLOSCLZ
        else:
            # fail loudly rather than leaving self._codec unset
            raise ValueError(f"unsupported cname: {cname!r}")
        self.clevel = clevel

    def encode(self, buf):
        buf = ensure_contiguous_ndarray(buf, self.max_buffer_size)
        return blosc2.compress(buf, codec=self._codec, clevel=self.clevel)

    def decode(self, buf, out=None):
        buf = ensure_contiguous_ndarray(buf, self.max_buffer_size)
        if out is not None:
            # decompress into the caller-provided buffer and return it,
            # per the numcodecs Codec contract
            blosc2.decompress(buf, out)
            return out
        return blosc2.decompress(buf)

    def __repr__(self):
        r = '%s(cname=%r, clevel=%r)' % \
            (type(self).__name__,
             self.cname,
             self.clevel,)
        return r

register_codec(Blosc2)
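The wrapper above follows numcodecs' Codec contract: encode takes a buffer and returns compressed bytes, decode reverses it. The same round-trip contract can be illustrated dependency-free with stdlib zlib standing in for blosc2 (a toy sketch, not the codec above):

```python
import zlib
import numpy as np

class ZlibCodecSketch:
    """Toy codec with the same encode/decode shape as the Blosc2 wrapper."""

    def __init__(self, clevel=5):
        self.clevel = clevel

    def encode(self, buf):
        # numcodecs codecs would call ensure_contiguous_ndarray here
        return zlib.compress(np.ascontiguousarray(buf).tobytes(), self.clevel)

    def decode(self, buf):
        return zlib.decompress(buf)

codec = ZlibCodecSketch()
data = np.arange(1024, dtype=np.float32)
roundtrip = np.frombuffer(codec.decode(codec.encode(data)), dtype=np.float32)
assert np.array_equal(roundtrip, data)  # lossless round trip
```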
jacobbieker commented 1 year ago

With Blosc2, the best option of the two up there is Zstd. It gives, for 1 day of data, a 3.9GB file with FP16, which works out to around 10TB of HRV data for the last 8 years. If saved as int8, then it's ~2TB for the last 8 years of HRV.

These are some of the best and so far fastest loading ones. They are all still larger than the JPEG-XL version, but are easier to use and quicker to decode and encode.

jacobbieker commented 1 year ago

I think the way forward for this is to create 3 versions each of HRV and non-HRV for now: an int8 version that is smaller and faster to load, a JPEG-XL one for keeping as much of the original data as possible and easier to share size-wise, and an FP16 one for greater precision, while still being faster to load than JPEG-XL, although much larger. Sound good @peterdudfield @devsjc? The FP16 one might use up a lot of storage_c, but yeah.

The ones other than JPEG-XL would chunk 12 timesteps at a time, in 1392x1044 spatial chunks; non-HRV would also be chunked with all 11 channels together. This should cut down a lot on the number of read requests our models need to make from disk.
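The request-count saving can be estimated directly from how many chunks a slice touches along one axis. A small sketch (hypothetical helper, not part of ocf_datapipes):

```python
def chunks_touched(start, stop, chunk_len):
    """Number of chunks a half-open slice [start, stop) reads along one axis."""
    return (stop - 1) // chunk_len - start // chunk_len + 1

# A 35-timestep example window, like slice(10, 45) in the repro at the top:
print(chunks_touched(10, 45, 1))   # 35 reads with 1 timestep per chunk
print(chunks_touched(10, 45, 12))  # 4 reads with 12 timesteps per chunk
```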

jacobbieker commented 1 year ago

Although for some reason, the compression seems to do worse if chunking all 11 channels together, instead of separately.

peterdudfield commented 1 year ago

I think the way forward for this is to create 3 versions each of HRV and non-HRV for now: an int8 version that is smaller and faster to load, a JPEG-XL one for keeping as much of the original data as possible and easier to share size-wise, and an FP16 one for greater precision, while still being faster to load than JPEG-XL, although much larger. Sound good @peterdudfield @devsjc? The FP16 one might use up a lot of storage_c, but yeah.

The ones other than JPEG-XL would chunk 12 timesteps at a time, in 1392x1044 spatial chunks; non-HRV would also be chunked with all 11 channels together. This should cut down a lot on the number of read requests our models need to make from disk.

Could you give the size and load times like you did above, for these different methods?

peterdudfield commented 1 year ago

Although for some reason, the compression seems to do worse if chunking all 11 channels together, instead of separately.

how does this affect size and loading speed?

jacobbieker commented 1 year ago

Although for some reason, the compression seems to do worse if chunking all 11 channels together, instead of separately.

how does this affect size and loading speed?

For size, with the blosc2 zstd fp32, it's 19.2GB with the 11 channels chunked together vs 13.9GB for individual channels; 10.5GB vs 7.5GB after bitrounding down to 13 bits; and 4.2GB vs 2.8GB for uint8. So between 30 and 50% larger. As for loading speed, I'll add it in with the other ones in a bit.

jacobbieker commented 1 year ago

Here are some speed outputs. Decoding the names: 8 means uint8, 16 is fp16, 32 is fp32, t is the number of timesteps per chunk, c is the number of channels, and round is the number of bits rounded if > 0:

All paths are under /home/jacob/Development/Satip/scripts/test_zarrs/:

| Speed (sec) | Size | Name |
|-------------|------|------|
| 0.0717 | 4.0G | hrv_uint16_blosc2_zstd_xbitinfo_t12_round0 |
| 0.0728 | 813M | hrv_8_zstd_xbitinfo_t4_c1 |
| 0.0794 | 3.9G | hrv_fp16_blosc2_zstd_xbitinfo_t12_round0 |
| 0.1094 | 6.5G | hrv_32_zstd_xbitinfo_round_13 |
| 0.1130 | 6.1G | hrv_32_blosc2_zstd_xbitinfo_t4_round13 |
| 0.1182 | 547M | hrv_jpeg_xl_xbitinfo_y |
| 0.1299 | 547M | hrv_jpeg_xl_xbitinfo_x |
| 0.1351 | 2.8G | hrv_32_zstd_xbitinfo_round_7 |
| 0.1437 | 8.9G | hrv_32_blosc2_xbitinfo_t4_round13 |
| 0.1541 | 6.1G | hrv_32_blosc2_zstd_xbitinfo_t12_round13 |
| 0.1674 | 4.0G | hrv_uint16_blosc2_zstd_xbitinfo_t4_round0 |
| 0.1840 | 354M | hrv_32_zfp_tol2_xbitinfo_round7 |
| 0.1863 | 4.0G | hrv_fp16_blosc2_zstd_xbitinfo_t4_round0 |
| 0.1982 | 8.9G | hrv_32_blosc2_xbitinfo_t12_round13 |
| 0.2236 | 354M | hrv_32_zfp_tol2_xbitinfo_round11 |
| 0.4651 | 63M | nonhrv_16_jpeg_xl_xbitinfo_variable_spaced_5 |
| 0.5069 | 4.4G | nonhrv_8_zstd_xbitinfo_t4_c11 |
| 0.6310 | 2.8G | nonhrv_8_zstd_xbitinfo_t12_c1 |
| 0.9104 | 95M | nonhrv_jpeg_xl_xbitinfo_variable |
| 0.9183 | 364M | nonhrv_jpeg_xl_xbitinfo_time_spaced_5 |
| 1.0254 | 5.5G | nonhrv_16_zstd_xbitinfo_round_7 |
| 1.1437 | 503M | nonhrv_jpeg_xl_xbitinfo_x_spaced_5 |
| 2.5827 | 4.4G | nonhrv_8_zstd_xbitinfo_t12_c11 |
| 2.6254 | 2.5G | hrv_32_jpeg-xl_xbitinfo_round_13 |
| 2.6738 | 2.8G | nonhrv_8_zstd_xbitinfo_t4_c1 |
| 2.7701 | 2.0G | hrv_32_jpeg-xl_xbitinfo_round_11 |
| 3.2815 | 1.4G | hrv_32_jpeg-xl_xbitinfo_round7 |
| 3.2975 | 5.8G | nonhrv_32_zstd_xbitinfo_round_7 |
| 3.4220 | 2.9G | hrv_32_jpeg-xl_xbitinfo_round11 |
| 3.7732 | 4.0G | hrv_32_jpeg-xl_xbitinfo_round13 |
| 4.6552 | 13G | nonhrv_32_zstd_xbitinfo_round_13 |
| 10.4749 | 1018M | hrv_16_jpeg-xl_xbitinfo_round_7 |
| 29.5189 | 5.7G | nonhrv_32_jpeg-xl_xbitinfo_round_13 |
| 30.1230 | 4.3G | nonhrv_32_jpeg-xl_xbitinfo_round_11 |
| 45.9138 | 1.9G | nonhrv_16_jpeg-xl_xbitinfo_round_7 |
| 48.5799 | 3.4G | nonhrv_16_jpeg-xl_xbitinfo_round_10 |
jacobbieker commented 1 year ago

The non-HRV is also being created again on Donatello, as the lossless data is quite large, so the non-HRV values might change a bit (probably a bit larger and slower to access, going by the differences between the ones created from lossless HRV and the lossy one)

peterdudfield commented 1 year ago

I'm struggling a bit with all the numbers here. Could we get something that summarizes the difference in speed and size between the different methods?

Something like this for me would be very useful

| Compression | Size | Type | Speed |
|-------------|------|------|-------|
| Current | TODO | TODO | TODO |
| JPEG-XL | 0.8 GB | TODO | TODO |
| BZ2 | 2 GB | int8 | TODO |
jacobbieker commented 1 year ago

Yep, can do that

peterdudfield commented 1 year ago

feel free to add other rows if needed

jacobbieker commented 1 year ago

Here it is for the HRV ones; seems like I might need to try ZFP a bit more. All of them are faster than the current compression at least.

| Compression | Speed | Size | Type | Bitround | Timestep Chunk | Channel Chunk |
|-------------|-------|------|------|----------|----------------|---------------|
| JPEG-XL | 3.28152 | 1.4GB | FP32 | 7 | 1 | 1 |
| JPEG-XL | 3.42203 | 2.9GB | FP32 | 11 | 1 | 1 |
| JPEG-XL | 3.77315 | 4.0GB | FP32 | 13 | 1 | 1 |
| Blosc2 ZSTD | 0.07168 | 4GB | Uint16 | None | 12 | 1 |
| Blosc2 ZSTD | 0.07943 | 3.9GB | FP16 | None | 12 | 1 |
| ZSTD | 0.10940 | 6.5GB | FP32 | 13 | 12 | 1 |
| Blosc2 ZSTD | 0.11301 | 6.1GB | FP32 | 13 | 4 | 1 |
| ZSTD | 0.13513 | 2.8GB | FP32 | 7 | 4 | 1 |
| Blosc2 | 0.143705 | 8.9GB | FP32 | 13 | 4 | 1 |
| Blosc2 | 0.154134 | 6.1GB | FP32 | 13 | 12 | 1 |
| Blosc2 ZSTD | 0.167381 | 4.0GB | Uint16 | None | 4 | 1 |
| Blosc2 ZSTD | 0.186322 | 4.0GB | FP16 | None | 4 | 1 |
| ZFP Tolerance=2 | 0.223568 | 354MB | FP32 | 11 | 4 | 1 |
| Current | 4.399742 | | int16 | None | 1 | 1 |
jacobbieker commented 1 year ago

The ZFP is the most lossy, so I would need to check more how much error there is, but these ones did all return non-NaN results, so they actually did return data

jacobbieker commented 1 year ago

ZFP actually seems to be making everything the same value, not really sure why, so disregarding that now then. In lossless mode, ZFP creates huge (>30GB) files for this single day, so space-wise it isn't really feasible. Edit: I think it's the tolerance; it can round things to the same value I guess?

jacobbieker commented 1 year ago

More ones for just Blosc2 ZSTD compression

| Compression | Speed | Size | Type | Bitround | Timestep Chunk | clevel |
|-------------|-------|------|------|----------|----------------|--------|
| Blosc Zstd | 0.072362 | 825MB | Uint8 | None | 12 | 5 |
| Blosc Zstd | 0.18395 | 13GB | FP32 | None | 4 | 5 |
| Blosc Zstd | 0.273139 | 12GB | FP32 | None | 4 | 9 |
| Blosc Zstd | 0.336419 | 5.9GB | FP32 | 13 | 4 | 9 |
| Blosc Zstd | 0.398397 | 4.0GB | FP16 | None | 12 | 5 |
| Blosc Zstd | 0.59300 | 827MB | Uint8 | None | 4 | 5 |
| Blosc Zstd | 0.81063 | 4.1GB | Uint16 | None | 12 | 9 |
| Blosc Zstd | 1.2086 | 6.2GB | FP32 | 13 | 4 | 5 |
| Blosc Zstd | 1.240698 | 6.2GB | FP32 | 13 | 12 | 5 |
| Blosc Zstd | 1.69135 | 4.1GB | Uint16 | None | 4 | 9 |
| Blosc Zstd | 1.86772 | 12GB | FP32 | None | 12 | 9 |
| Blosc Zstd | 1.970765 | 4.0GB | FP16 | None | 4 | 5 |
| Blosc Zstd | 2.0007 | 5.9GB | FP32 | 13 | 12 | 9 |
| Blosc Zstd | 2.53900 | 13GB | FP32 | None | 12 | 5 |
jacobbieker commented 1 year ago

Another set of outputs from Blosc2 ZSTD testing, this time removing any results where all the values are the same (indicating the compression is compressing too much and making the data useless) or where there are any NaNs in the data (as the loaded data shouldn't have any NaNs)

| Compression | Speed | Size | Type | Bitround | Timestep Chunk | clevel |
|-------------|-------|------|------|----------|----------------|--------|
| Blosc Zstd | 0.079500 | 4.0GB | FP16 | None | 12 | 5 |
| Blosc Zstd | 0.087661 | 3.9GB | Uint16 | None | 12 | 9 |
| Blosc Zstd | 0.091952 | 3.7GB | FP16 | None | 12 | 9 |
| Blosc Zstd | 0.13554 | 5.0GB | FP32 | 11 | 4 | 5 |
| Blosc Zstd | 0.136107 | 4.8GB | FP32 | 11 | 4 | 8 |
| Blosc Zstd | 0.145195 | 13GB | FP32 | None | 4 | 5 |
| Blosc Zstd | 0.146209 | 12GB | FP32 | None | 4 | 9 |
| Blosc Zstd | 0.160394 | 4.8GB | FP32 | 11 | 12 | 8 |
| Blosc Zstd | 0.167418 | 4.5GB | FP32 | 11 | 12 | 9 |
| Blosc Zstd | 0.16927 | 4.0GB | FP16 | None | 4 | 5 |
| Blosc Zstd | 0.18170 | 6.2GB | FP32 | 13 | 12 | 5 |
| Blosc Zstd | 0.20448 | 12GB | FP32 | None | 12 | 9 |
| Blosc Zstd | 0.20723 | 5.0GB | FP32 | 11 | 12 | 5 |
| Blosc Zstd | 0.21482 | 3.7GB | FP16 | None | 4 | 9 |
jacobbieker commented 1 year ago

I think the best option from this is the FP16, 12-timestep-chunk compression. clevel=5 would result in a 10.5TB HRV 8-year dataset, while clevel=9 would give 9.75TB. clevel=5 is a lot faster to write and roughly 14% faster to read, so the small tradeoff in size is probably worth it. The space difference might be more pronounced in the non-HRV data though, so I do need to check that when I can.

jacobbieker commented 1 year ago

The HRV and non-HRV Zarrs are being created now, using Donatello to create them on Leonardo. They are being built using this script: https://github.com/openclimatefix/Satip/blob/main/scripts/read_and_combine_satellite.py saving them out as FP16, clevel 5, 12 timesteps per chunk. Each chunk is between 70-140MB on disk for the non-HRV. Each Zarr will cover a single year of data.

Because there are missing timesteps, the most likely next step after creating these ones is to download and append the missing timesteps to the zarrs. This might mean, though, that the 12 timestep chunks contain timesteps that are not near each other, and so might require one more creation pass where, after the missing data is added and the timesteps sorted in order, a new zarr is created.