xarray-contrib / xarray-simlab

Xarray extension and framework for computer model simulations
http://xarray-simlab.readthedocs.io
BSD 3-Clause "New" or "Revised" License
73 stars 9 forks source link

0 values set to nan in output #172

Closed feefladder closed 3 years ago

feefladder commented 3 years ago

Related to #170 : when accessing the step at runtime, the first value is nan. example:

@xs.process
class Foo:
    a = xs.variable(intent="out")

    @xs.runtime(args="step")
    def run_step(self, n):
        print(n)
        self.a = n

model = xs.Model({"foo": Foo})
ds_in = xs.create_setup(
    model=model,
    clocks={"clock": range(5)},
    output_vars={"foo__a": "clock"},
)
print(ds_in.xsimlab.run(model=model).foo__a.data)

prints:

0
1
2
3
[nan 1. 2. 3.]

similar to when accessing main_clock_values:

@xs.process
class Foo:
    a = xs.variable(intent="out", dims=xs.MAIN_CLOCK)

    @xs.runtime(args="main_clock_values")
    def initialize(self, clock):
        self.a = clock

returns [nan 1. 2. 3. 4.] even though internally the correct values are used

feefladder commented 3 years ago

possibly this is caused by the zarr dataset being created only in the second iteration (so when initialize has run and the first run_step etc.)

feefladder commented 3 years ago

actually, 0 values are set to nan:

@xs.process
class Foo:
    a = xs.variable(intent="out")

    def run_step(self):
        self.a = 0

returns nan

jvail commented 3 years ago

I think this is related to https://xarray-simlab.readthedocs.io/en/latest/whats_new.html#breaking-changes

When I switched to version 0.5 I was puzzled by the new behavior. Currently I am always passing decoding: { mask_and_scale: False } into the run function.

feefladder commented 3 years ago

Wow, that cost me a lot of headaches :') thanks a lot! But still... It is quite counter-intuitive behaviour IMHO. Now it is clear why it did work when setting it to 0.

feefladder commented 3 years ago

ah yes, as it seems, it is also in the docs. It seems to me that the values were arbitrarily chosen @benbovy ? However, 0 is a very likely value to be in the (output) data, even for basic users, that are then confronted with something that is under advanced encoding options.

wouldn't it make more sense to set the default fill value to something that is unlikely to be in the actual output?
like adding

    elif dtype.kind == "i":
        return np.iinfo(dtype).min

here?

benbovy commented 3 years ago

Yep sorry for this change in version 0.5. I actually was hesitating about this change and I'm still not very happy with it. One related issue is that if we set mask_and_scale=False when opening the output zarr store as a xarray Dataset, then we have annoying attributes that can't be serialized when saving the dataset to a netcdf file.

Those issues should be fixed upstream IMO in zarr (where 0 is the default filling value, I'm not sure exactly why) and xarray.

wouldn't it make more sense to set the default fill value to something that is unlikely to be in the actual output?

Yes why not.

feefladder commented 3 years ago

I made a PR #173 about it, the current settings: