strange bug that appears for our mock catalogs when upgraded to corrfunc v2

BenWibking commented 6 years ago

When I run it on my mocks, the code fails and says to report a bug:

mwe_corrfunc_screenshot

A minimal working example is contained in this tarball: http://www.astronomy.ohio-state.edu/~wibking.1/corrfunc/mwe.tar

The conditions needed to reproduce it are: compute xi(s) from any mock produced by my HOD code with Corrfunc v2.

BenWibking commented 6 years ago

@ANSalcedo

lgarrison commented 6 years ago

At first glance, the input does appear to all be in the expected range, so this does look like a Corrfunc bug of some kind. I'll try to look at this some more in the next few days.

manodeep commented 6 years ago

This is weird. Here are the flags associated with x/y/z

    C_CONTIGUOUS : True
    F_CONTIGUOUS : True
    OWNDATA : False
    WRITEABLE : True
    ALIGNED : True
    UPDATEIFCOPY : False

If the data are C_CONTIGUOUS, then any issue out of different strides should not occur. And I can see that the arrays values are all in [0.0, 720.0).

I can solve the issue by round-tripping

    x = np.array(x.tolist(), dtype=np.float32)

The only difference I see with the initial x.flags and this new array->list->array x.flags is the OWNDATA flag. However, neither x = np.ascontiguousarray(x) or by x = np.require(x, requirements=['C', 'O', 'W']) work.

I do not have a detailed enough understanding of numpy :(

manodeep commented 6 years ago

(Also, the requirements I specified spelled 'COW')

BenWibking commented 6 years ago

This is very weird. C_CONTIGUOUS should be all that is needed to treat the data like a C array. It's possible the flag is set incorrectly. Should be testable by iterating through x[i] via cython and passing x[0] as a pointer to the cython function. This has to be some kind of bug in h5py or (less likely) numpy...? I don't see how else x=np.array(x.tolist()) would fix the problem.

On 10/31/17 1:41 AM, Manodeep Sinha wrote:

This is weird. Here are the flags associated with |x/y/z|
C_CONTIGUOUS : True
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
If the data are |C_CONTIGUOUS|, then any issue out of different |strides| should not occur. And I can see that the arrays values are all in |[0.0, 720.0)|.

I can solve the issue by round-tripping

x = np.array(x.tolist(), dtype=np.float32)

The only difference I see with the initial |x.flags| and this new array->list->array |x.flags| is the OWNDATA flag. However, neither |x = np.ascontiguousarray(x)| or by |x = np.require(x, requirements=['C', 'O', 'W'])| work.

I do not have a detailed enough understanding of |numpy| :(

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/manodeep/Corrfunc/issues/140#issuecomment-340666015, or mute the thread https://github.com/notifications/unsubscribe-auth/ABswn_I0S1hIrQTT0njFzEJGxUfC945Cks5sxrMYgaJpZM4QL-RQ.

lgarrison commented 6 years ago

Think I figured it out. The position arrays are big-endian:

>>> x.dtype
dtype('>f4')

but Corrfunc is probably interpreting them as native-endian, which is little-endian in most cases. Converting the arrays to little-endian with x.byteswap() does the trick, but the correct solution is probably to teach Corrfunc to detect the endianness. This is easy to do in the Python wrappers, which is probably sufficient, since the only way we can ever detect endianness is if the arrays come from Python anyway (i.e. the C interfaces know nothing about endianness).

HDF5 is supposed to take care of some of these endianness issues, but h5py may simply be preserving the endianness of Numpy array inputs. In that case, another quick fix is to make sure the arrays are little-endian before they go into the HDF5 files.

BenWibking commented 6 years ago

Wow, that's amazing. I have no idea how I managed to create big-endian arrays! Thanks, Lehman. This is very helpful.

On 10/31/17 12:44 PM, Lehman Garrison wrote:

Think I figured it out. The position arrays are big-endian:

|>>> x.dtype() dtype('>f4') |

but Corrfunc is probably interpreting them as native-endian, which is little-endian in most cases. Converting the arrays to little-endian with |x.byteswap()| does the trick, but the correct solution is probably to teach Corrfunc to detect the endianness. This is easy to do in the Python wrappers, which is probably sufficient, since the only way we can ever detect endianness is if the arrays come from Python anyway (i.e. the C interfaces know nothing about endianness).

HDF5 is supposed to take care of some of these endianness issues, but h5py may simply be preserving the endianness of Numpy array inputs. In that case, another quick fix is to make sure the arrays are little-endian before they go into the HDF5 files.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/manodeep/Corrfunc/issues/140#issuecomment-340825002, or mute the thread https://github.com/notifications/unsubscribe-auth/ABswn2A7bhaFYmNkQ6Sql3Z8BbLy01ruks5sx05rgaJpZM4QL-RQ.

manodeep commented 6 years ago

It's not that Corrfunc is expecting little-endian, Corrfunc is expecting everything to be in machine-endian. And the array from the hdf5 file is big-endian while the computer itself is little-endian. If the computer were big-endian, then there would not be any problems.

The fix would be to ensure (in the python wrappers) that all arrays in machine-endian order.

manodeep commented 6 years ago

(The biggest reason I don't worry about this particular endian-ness mis-match bug is that Corrfunc will crash. Fixing the endian-ness handling is part of #101 )

@BenWibking Do you mind if we close this particular issue?

BenWibking commented 6 years ago

Fixing it and saving future graduate students from tearing their hair out should take just 10 additional lines of code in the python wrapper:

def convert_to_native_endian(array): import sys system_is_little_endian = (sys.byteorder == 'little') array_is_little_endian = (array.dtype.byteorder == '<') is_native_endian = (system_is_little_endian and array_is_little_endian) or (not system_is_little_endian and not array_is_little_endian) or (array.dtype.byteorder == '=') if not is_native_endian: return array.byteswap().newbyteorder() else: return array

x,y,z,weights = [convert_to_native_endian(arr) for arr in [x,y,z,weights]]

On 10/31/17 4:37 PM, Manodeep Sinha wrote:

(The biggest reason I don't worry about this particular endian-ness mis-match bug is that Corrfunc will crash. Fixing the endian-ness handling is part of #101 https://github.com/manodeep/Corrfunc/issues/101 )

@BenWibking https://github.com/benwibking Do you mind if we close this particular issue?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/manodeep/Corrfunc/issues/140#issuecomment-340899043, or mute the thread https://github.com/notifications/unsubscribe-auth/ABswn2fMJIlyRCN4O3J7IcYuFkRFis03ks5sx4T4gaJpZM4QL-RQ.

manodeep commented 6 years ago

@BenWibking You know what I am going to say right? ....PR... :)

I think you can make the if condition slightly concise:


def convert_to_native_endian(array):
    '''
    Returns the supplied array in native endian byte-order

    Parameters
    ------------

    array: A numpy array

    Returns
    --------

    array: A numpy array in native-endian byte-order

    Example
    ----------

    >>> import numpy as np
    >>> create a big-endian array
    >>> and then covert to little-endian
    >>> and make sure that the output dtype is little-endian
    >>> This serves as a doctest
    >>> create a native-endian array
    >>> run through function
    >>> ensure output is the same
    '''

    import sys
    system_is_little_endian = (sys.byteorder == 'little')   
    array_is_little_endian = (array.dtype.byteorder == '<')
    if (array_is_little_endian != system_is_little_endian) or (array.dtype.byteorder == '='):
        return array.byteswap().newbyteorder()
    else:
        return array

The doctests are left for you :)

manodeep commented 6 years ago

Plus, some tests to make sure that the array is indeed a numpy array or returning back None if the input is None

lgarrison commented 6 years ago

I can work on this.

manodeep / Corrfunc

strange bug that appears for our mock catalogs when upgraded to corrfunc v2 #140