openclimatefix / ocf_datapipes

OCF's DataPipe based dataloader for training and inference
MIT License
13 stars 11 forks source link

check for zeros in loader #335

Open peterdudfield opened 1 week ago

peterdudfield commented 1 week ago

Detailed Description

Check for zeros when loading the data. This should only be used when running inference

Context

Possible Implementation

glitch401 commented 5 days ago

I'll take this up as well, I think it's correlated to https://github.com/openclimatefix/ocf_datapipes/issues/337

glitch401 commented 5 days ago

Apologies if this sounds trivial, but about the implementation, are we planning to log it or raise a ValueError?

peterdudfield commented 5 days ago

Raise an error please

glitch401 commented 3 days ago

I was trying to trigger positive and negative test cases, by cloning and replacing nwp data. but there seems to be something at large with the data that i'm missing to figure out.

Reference:

for i in store:
   zarr.copy(store[i], store_clone, name=i)
store_clone['UKV'][0, 0, 0]=0  

can you help me point to the right direction, please?

peterdudfield commented 3 days ago

That error can sometimes happen if the file you load isnt a zarr. Can you check it manually that is had the .zmetadata in tests/data/nwp_data/test_with_zeros.zarr

glitch401 commented 2 days ago

Yep, that worked :D Apparently the way I was coloning the zarr, didnt carry over the metadata at .zmetadata

Also, in reference to #337, can you help me understand the physical limits for nwp.variable UKV?

glitch401 commented 2 days ago

to solve the above, I've manually copied .zmetadata and .zarray which works as i'm manually testing the test functions at test\load\nwp\test_load_nwp.py:

def test_check_for_zeros():
    #positive test case
    nwp_datapipe = OpenNWP(
        zarr_path="tests/data/nwp_data/test_with_zeros_n_limits.zarrr",
        check_for_zeros=True)
    with pytest.raises(ValueError): # checks for Error raised if NWP DataArray contains zeros
        metadata = next(iter(nwp_datapipe))

    #negative test case
    nwp_datapipe = OpenNWP(
        zarr_path="tests/data/nwp_data/test.zarr",
        check_for_zeros=True)
    metadata = next(iter(nwp_datapipe))
    assert metadata is not None

there is something odd happening as I'm trying to run pytest on ocf_datapipeline. the files get automatically deleted, which ends up throwing the same error :/

KeyError: ".zmetadata\nThis exception is thrown by __iter__ of OpenNWPIterDataPipe(check_for_zeros=False, check_physical_limits=False, zarr_path='tests/data/nwp_data/test_with_zeros.zarr')"
peterdudfield commented 2 days ago

Which files get delete?

Also do you mean zarr_path="tests/data/nwp_data/test_with_zeros_n_limits.zarr" instead of zarr_path="tests/data/nwp_data/test_with_zeros_n_limits.zarrr"?

glitch401 commented 1 day ago

tests/data/nwp_data/test_with_zeros_n_limits.zarr/.zmetadata gets deleted as I'm running pytest

oh yes, my bad, it was a silly typo

peterdudfield commented 1 day ago

did the typo fix it? Or is it still getting deleted?

peterdudfield commented 1 day ago

did the typo fix it? Or is it still getting deleted?

Can you work out where the file gets deleted?

glitch401 commented 1 day ago

The typo dosent exist in the real code. I'm trying to figure the same. There seeem to be some correlation with pytest

peterdudfield commented 1 day ago

How odd, can you push code to a draft PR and I might be able to have a look at it?

glitch401 commented 1 day ago

sure thing I'll do that

glitch401 commented 1 day ago

If it helps, I've used this code to produce test_with_zeros_n_limits.zarr:

import zarr
import shutil
import numpy as np

original_store_path = 'tests/data/nwp_data/test.zarr'
original_store = zarr.open(original_store_path, mode='r')

new_store_path = 'tests/data/nwp_data/test_with_zeros_n_limits.zarr'
shutil.rmtree(new_store_path, ignore_errors=True)
with zarr.open(new_store_path, mode='w') as new_store:
    for item in original_store:
        zarr.copy(original_store[item], new_store, name=item)

    new_store['UKV'][0, 0, 0,0]=0
    new_store['UKV'][0, 0, 0,1]=np.random.uniform(240, 350, size=(548,))
peterdudfield commented 1 day ago

This script doesn't get run with pytest does it?

It should just be run once yea?

glitch401 commented 1 day ago

The script is supposed to run once, to generate the data. Do you think we'd be better off if it's run as part of test_load_nwp.py?

peterdudfield commented 1 day ago

Perhaps its worth a try. And that would mean less test data is committed to the repo

glitch401 commented 23 hours ago

alrighty, I'll make the necessary changes to the draft PR