Closed bennahugo closed 1 year ago
Refer to the discussion in https://github.com/ratt-ru/xova/pull/28
I am not 100% sure that this is a bug in dask-ms; it may just be that a previously silent, non-fatal error is now noisy. I suspect that ROWID is no longer consistent with the averaged data.
I will try to reproduce and let you know if it is something more malign.
The xova PR is now passing tests on fake and real data again. You can go from that branch directly to reproduce. Thanks @JSKenyon. It is not critically urgent that we find a fix for this. The keywords exposure takes higher priority.
This is relatively simple, and it should probably be fixed in dask-ms. The cause of the problem is the fact that np.nan == np.nan returns False, i.e. the comparison fails in the case of unknown chunk sizes.
Great, thanks for finding it. You're much more familiar with the dask codebase than I am. We can probably punch a release and make xova do != 0.2.12 in the dependencies?
Hmm, I cannot remember everything that has gone in since the last release. @sjperkins may have a better idea. This bug is actually a little fiddlier than I thought. Turns out that the nan on the dask.array is not actually a np.nan, it is a float('nan'). Just need to figure out how resilient the logic is when comparing these things.
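One reason the distinction between the two nan objects can matter: Python containers check identity before equality when comparing elements, so a tuple holding the very same nan object compares equal to itself, while tuples holding two separately created nans do not. A minimal sketch of that behaviour, using plain floats as stand-ins (this is an illustration only, not the dask or dask-ms comparison code):

```python
# Two distinct NaN objects, standing in for a NaN created by one library
# and a NaN created by another (hypothetical stand-ins for illustration).
nan_a = float("nan")
nan_b = float("nan")

# Compared directly, NaN never equals NaN, not even itself.
self_equal = nan_a == nan_a

# Tuples test identity before equality, so the *same* object matches...
same_object_tuples = (nan_a,) == (nan_a,)

# ...but two different NaN objects inside tuples do not.
different_object_tuples = (nan_a,) == (nan_b,)

print(self_equal, same_object_tuples, different_object_tuples)
# False True False
```

So a chunk-tuple comparison can appear to work when both sides share one nan object and then fail once the nans come from different places.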
lol I love how each library defines its own types instead of just relying on well-defined de facto standard types in CPython....
Am I missing something here?
In [1]: float('nan')
Out[1]: nan
In [2]: type(float('nan'))
Out[2]: float
In [3]: import numpy as np
In [4]: np.nan
Out[4]: nan
In [5]: type(np.nan)
Out[5]: float
In [6]: float('nan') == np.nan
Out[6]: False
In [9]: from ctypes import *
In [12]: bin(cast(pointer(c_float(float('nan'))), POINTER(c_int32)).contents.value)
Out[12]: '0b1111111110000000000000000000000'
In [13]: bin(cast(pointer(c_float(np.nan)), POINTER(c_int32)).contents.value)
Out[13]: '0b1111111110000000000000000000000'
In [23]: bin(cast(pointer(c_float(np.nan)), POINTER(c_int32)).contents.value == cast(pointer(c_float(float('nan'))), POINTER(c_int32)).contents.value)
Out[23]: '0b1'
So the two are bitwise-equal IEEE 754 representations?!
...
result. QNaN is represented by 0 or 1 as the sign bit, all 1s as
exponent, and a 0 as the left-most bit of the significand and at least
one 1 in the rest of the significand. SNaN is represented by 0 or
1 as the sign bit, all 1s as exponent, and a 1 as the left-most bit of
the significand and any string of bits for the remaining 22 bits. We
give below the representations of QNaN and SNaN.
Wait, I'm being silly. nan never compares equal to anything, itself included, so == cannot be used here. You need something that explicitly tests for nan.
In [24]: np.nan == np.nan
Out[24]: False
In [25]: np.isnan(np.nan)
Out[25]: True
In [26]: np.isnan(float('nan'))
Out[26]: True
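The same point can be checked with the standard library alone. The sketch below decodes one fixed byte pattern twice (the hex string is the usual double-precision quiet NaN, 0x7FF8000000000000, chosen here purely for illustration) and shows that the two results still compare unequal even though they come from identical bits:

```python
import struct

# Little-endian bytes of a double-precision quiet NaN (0x7FF8000000000000).
NAN_BYTES = bytes.fromhex("000000000000f87f")

# Decode the identical byte pattern twice.
a = struct.unpack("<d", NAN_BYTES)[0]
b = struct.unpack("<d", NAN_BYTES)[0]

# Identical bits, yet the comparison is False.
equal = a == b

# A value that is not equal to itself is a NaN.
a_is_nan = a != a

print(equal, a_is_nan)
# False True
```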
Yeah, math.isnan works too.
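For what it's worth, a nan-aware comparison of chunk tuples could look roughly like the sketch below. The helper name chunks_equal is hypothetical, and treating two unknown (nan) chunk sizes as matching is an assumption made for illustration; the actual fix in dask-ms may well take a different approach.

```python
import math


def _sizes_match(x, y):
    """True if two chunk sizes are equal, or if both are NaN (unknown)."""
    return x == y or (math.isnan(x) and math.isnan(y))


def chunks_equal(a, b):
    """Compare dask-style chunks: a tuple with one tuple of sizes per dimension.

    Hypothetical helper: NaN entries (unknown chunk sizes) are treated as
    matching each other, which plain tuple equality does not guarantee.
    """
    if len(a) != len(b):
        return False
    for dim_a, dim_b in zip(a, b):
        if len(dim_a) != len(dim_b):
            return False
        if not all(_sizes_match(x, y) for x, y in zip(dim_a, dim_b)):
            return False
    return True


# Unknown row chunks built from separately created NaN objects.
left = ((float("nan"), float("nan")), (4,))
right = ((float("nan"), float("nan")), (4,))

print(left == right)              # False
print(chunks_equal(left, right))  # True
```

Whether unknown sizes should really be considered equal depends on what the caller does next, so this is only safe where the comparison is a sanity check rather than a guarantee about the data.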
Thanks for the investigation and report. Have a fix incoming in https://github.com/ratt-ru/dask-ms/pull/256, but the CI is a bit broken at the moment.
Closed by #255
Description
Upstream breakage caught by xova test cases
What I Did
Just force upgrade xova branch prepare-0.1.2 to use the latest dask-ms 0.2.12 and run the test suite