manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License
163 stars 50 forks source link

Bug when provided arrays have non native endiness #190

Closed Christopher-Bradshaw closed 4 years ago

Christopher-Bradshaw commented 4 years ago

General information

Issue description

I think that the changes around https://github.com/manodeep/Corrfunc/commit/91e40fcd0770babc6fb1687065d5fe4fcb35126a#diff-9cae30395e1cbc9ddac829068742ee1aR294 introduced a bug when the provided arrays are not of native endiness. I'm not an expert, but it looks like modifications to locals() may not be noticed by the interpreter (see https://docs.python.org/3.7/library/functions.html#locals). I've confirmed by stepping through with pdb that this is the issue.

Expected behavior

The usual blazing fast counts :)

Actual behavior

RuntimeError with the message Declination should not be more than 180

Minimal failing example

import numpy as np
import Corrfunc

# My system is LittleEndian so create BigEndian arrays
cat = np.zeros(10, dtype=[("ra", ">f4"), ("dec", ">f4")])
cat["ra"] = np.random.random(10)
cat["dec"] = np.random.random(10)

res = Corrfunc.mocks.DDtheta_mocks(
    autocorr=True, nthreads=1, binfile=[0, 0.2], RA1=cat["ra"], DEC1=cat["dec"]
)

I'm happy to submit a PR to fix this, should just be a couple line change at the location I linked to.

manodeep commented 4 years ago

@Christopher-Bradshaw Thank you for reporting this issue! A PR with a fix will be very much appreciated. Looks like ll the other pair-counters also suffer from the same issue. If possible, will you please correct the other instances of _locals?

Christopher-Bradshaw commented 4 years ago

Sure think - I'll send in a PR in the next day or so :+1:

manodeep commented 4 years ago

Thanks!

manodeep commented 4 years ago

@Christopher-Bradshaw Will you confirm that your PR in #191 fixes this bug? If so, I will close this issue.

Thanks again!

Christopher-Bradshaw commented 4 years ago

That change works for me, though I have only tested DDtheta_mocks.

Thanks for your help!

manodeep commented 4 years ago

Stands to reason that if DDtheta_mocks is fixed, then all others are too.

Thanks again!

lgarrison commented 4 years ago

@Christopher-Bradshaw, thanks for catching this! This was my bug. I probably tested the locals() usage (which I agree is hacky) inside a Jupyter notebook, where it seems to work. But I agree it doesn't work inside a function.

As you saw in the diff where we introduced the bug, we did use to write out all the arrays twice, e.g.

X1, Y1, Z1, weights1, X2, Y2, Z2, weights2 = [convert_to_native_endian(arr, warn=True) for arr in [X1, Y1, Z1, weights1, X2, Y2, Z2, weights2]]

but this felt verbose and fragile, doubly so since some version of it is repeated in every kernel. Hence the attempt to change to locals().

I don't think there's a good way around this right now. But I also note that our kernels are collecting a lot of arguments (24 for DD). If we did move to a model where we captured arguments via DD(*args, **kwargs), we could easily process variables by name. I think the main trade-off is readability of the code.

Christopher-Bradshaw commented 4 years ago

Hey @lgarrison no worries. And I definitely understand why you tried this. As you say, neither method looks particularly good, and the one I reverted to definitely has the chance to have things get out of order.

Interesting that there is different behaviour between jupyter/intreactive sessions and the normal interpreter.