Open ayoubft opened 1 year ago
Thanks @ayoubft! Could you provide a minimal code snippet and a link to the dataset you are using? This would be of great help. Thanks.
The error originates from here:

```python
mutual_info = (p * np.ma.log(p / (pr * ps))).sum(axis=(-1, -2)) / np.log(base)
```
which confuses me, because while `a`, `b` should scale with the size of the data (they should be the `[1:]` and `[:-1]` non-allocating views on the actual data array), `counts`, `p`, `pr`, `ps` shouldn't! They should be of size nbits x 4 (for every bit position, a 2x2 joint probability matrix). @ayoubft maybe you could check the size of these arrays? @observingClouds could you clarify whether there's a lazy evaluation triggered in this line?
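To illustrate the expected sizes (a minimal sketch, assuming the shapes described above, not xbitinfo's actual internals): with a 2x2 joint probability matrix per bit position, the mutual-information line only ever touches arrays of shape `(nbits, 2, 2)`, so its result should be tiny regardless of the data size.

```python
import numpy as np

# Hypothetical stand-in for the per-bit joint counts: shape (nbits, 2, 2).
nbits = 8
counts = np.array([[[3.0, 1.0], [1.0, 3.0]]] * nbits)

p = counts / counts.sum(axis=(-1, -2), keepdims=True)  # joint probabilities, (nbits, 2, 2)
pr = p.sum(axis=-1, keepdims=True)                     # marginal over one bit, (nbits, 2, 1)
ps = p.sum(axis=-2, keepdims=True)                     # marginal over the other, (nbits, 1, 2)

base = 2
# Same expression as the line quoted above; np.ma.log masks non-positive entries.
mutual_info = (p * np.ma.log(p / (pr * ps))).sum(axis=(-1, -2)) / np.log(base)
print(mutual_info.shape)  # (nbits,) — tiny, independent of the data size
```

Nothing here scales with the number of data elements, which is why an 8.58 GiB allocation at this point is surprising.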
Maybe related
this seems to have an outer loop over the number of bits and then an inner loop over all elements in the data, which means that I suspect `(a >> s).astype("u1")` allocates an entire copy of the array! In BitInformation.jl I therefore do it the other way around: an outer loop over every element pair in the data and an inner loop over the bits. This is non-allocating.
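The two loop orders can be sketched as follows (a hedged illustration in Python, not the actual xbitinfo or BitInformation.jl code; `np.add.at` and the element-pair loop are my stand-ins). The bit-outer version materializes a full-size temporary once per bit position, while the element-outer version only touches scalars per iteration:

```python
import numpy as np

a = np.arange(12, dtype=np.uint8)  # stand-in for the bit-reinterpreted data
nbits = 8

# Bit-outer order: each iteration allocates a temporary as large as the data.
counts_vec = np.zeros((nbits, 2, 2), dtype=np.int64)
for s in range(nbits):
    bits = (a >> s) & 1  # full-size temporary allocation per bit position
    # np.add.at accumulates repeated index pairs correctly (plain += would not).
    np.add.at(counts_vec, (s, bits[:-1], bits[1:]), 1)

# Element-outer order (the BitInformation.jl pattern, sketched in Python):
# loop over adjacent element pairs, then over bits — no array temporaries.
counts_el = np.zeros((nbits, 2, 2), dtype=np.int64)
for i in range(len(a) - 1):
    x, y = int(a[i]), int(a[i + 1])
    for s in range(nbits):
        counts_el[s, (x >> s) & 1, (y >> s) & 1] += 1

assert np.array_equal(counts_vec, counts_el)
```

In Python the element-outer loop is of course slow at the interpreter level, which is presumably why the vectorized per-bit form was chosen; in Julia the element-outer order is both fast and non-allocating.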
This issue sounds weirdly familiar, and indeed we discussed this already at the beginning of this year: https://github.com/observingClouds/xbitinfo/pull/156#issuecomment-1424618296
For the dataset, I am using this one (but I will need to check whether it can be shared):
The code snippet is the following:
```python
import xarray as xr
import xbitinfo as xb

path_to_data = 'data/netcdf/ecmwf_hs3g_20181101_msl.nc'
ds = xr.open_dataset(path_to_data)
info_per_bit = xb.get_bitinformation(ds, dim="latitude", implementation="python")
```
And it raises the error above.
Thanks @ayoubft! No worries with regards to sharing the dataset. I'll find one myself.
Working with a high-resolution dataset:
Dimensions: longitude: 24000; latitude: 12000; time: 1
When I try to `get_bitinformation` using the python implementation, it raises this error:

```
MemoryError: Unable to allocate 8.58 GiB for an array with shape (287976000, 8, 4) and data type bool
```
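The reported size is consistent with one boolean per adjacent element pair along `latitude`, per bit-position/outcome slot (simple arithmetic on the numbers in the error and the dataset dimensions; the interpretation of the axes is my assumption):

```python
# NumPy bools are one byte each, so the allocation is just the element count.
shape = (287976000, 8, 4)
nbytes = shape[0] * shape[1] * shape[2]

# First axis matches adjacent pairs along latitude for every longitude:
# (12000 - 1) * 24000 = 287,976,000.
pairs = (12000 - 1) * 24000

print(pairs == shape[0])          # True
print(round(nbytes / 2**30, 2))   # 8.58 (GiB), matching the error message
```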
PS: When reverting to the Julia implementation, it works without this error.