numpy / numpy

The fundamental package for scientific computing with Python.
https://numpy.org

BUG: Spurious `TypeError` exception upon calling `asarray` on empty array buffer #26366

Open inducer opened 2 months ago

inducer commented 2 months ago

Describe the issue:

I create an object exposing the array interface for a size-zero array with a NULL pointer as storage. Calling `asarray` on that object makes NumPy raise a seemingly spurious `TypeError` about `float()`.

Reproduce the code example:

import numpy as np

class EmptyArray:
    def __init__(self):
        self.__array_interface__ = {'version': 3, 'shape': (0,), 'strides': (8,), 'typestr': '<f8', 'data': (0, False)} 

empt = EmptyArray()
np.asarray(empt)

Error message:

Traceback (most recent call last):
  File "/home/andreas/tmp/numpy-empty-ary-interface.py", line 8, in <module>
    np.asarray(empt)
TypeError: float() argument must be a string or a real number, not 'EmptyArray'

Python and NumPy Versions:

1.26.4 3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0]

Runtime Environment:

[{'numpy_version': '1.26.4',
  'python': '3.12.3 (main, Apr 10 2024, 05:33:47) [GCC 13.2.0]',
  'uname': uname_result(system='Linux', node='arc', release='6.7.9-amd64', version='#1 SMP PREEMPT_DYNAMIC Debian 6.7.9-2 (2024-03-13)', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_KNL',
                                    'AVX512_KNM',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'architecture': 'Prescott',
  'filepath': '/home/andreas/src/env-3.12/lib/python3.12/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so',
  'internal_api': 'openblas',
  'num_threads': 12,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23.dev'}]

Context for the issue:

I maintain PyOpenCL. The issue arose in the handling of empty shared-virtual-memory ("SVM") allocations, which PyOpenCL needs to support. PyOpenCL integrates tightly with NumPy; in particular, SVM allocations (i.e. memory that is accessible from both the host and the compute device/GPU) are exposed as NumPy arrays. NumPy seems to support empty arrays just fine; only creating them through the array interface appears to lead to this seemingly spurious error.

ngoldbaum commented 2 months ago

To help understand why this is happening, ultimately the error is being raised by this C API call:

https://github.com/numpy/numpy/blob/b2960879bbf90af0cc3e533d2721d3a30470b056/numpy/_core/src/multiarray/arraytypes.c.src#L96

And the error has this accompanying C traceback (starting from the asarray call and eliding CPython frames):

  * frame #0: 0x0000000100a4b728 libpython3.13t.dylib`PyFloat_FromString(v=0x00000528564d4210) at floatobject.c:237:13 [opt]
    frame #1: 0x0000000100fd4294 _multiarray_umath.cpython-313t-darwin.so`MyPyFloat_AsDouble(obj=<unavailable>) at arraytypes.c.src:96:11 [opt]
    frame #2: 0x0000000100fd4144 _multiarray_umath.cpython-313t-darwin.so`DOUBLE_setitem(op=0x00000528564d4210, ov=0x00006000012d8950, vap=0x0000052857247b60) at arraytypes.c.src:396:28 [opt]
    frame #3: 0x0000000101098910 _multiarray_umath.cpython-313t-darwin.so`PyArray_FromInterface [inlined] PyArray_SETITEM(arr=0x0000052857247b60, itemptr=<unavailable>, v=0x00000528564d4210) at dtypemeta.h:284:12 [opt]
    frame #4: 0x00000001010988f4 _multiarray_umath.cpython-313t-darwin.so`PyArray_FromInterface(origin=0x00000528564d4210) at ctors.c:2393:13 [opt]
    frame #5: 0x0000000101097bb0 _multiarray_umath.cpython-313t-darwin.so`_array_from_array_like(op=0x00000528564d4210, requested_dtype=0x0000000000000000, writeable='\0', context=<unavailable>, copy=-1, was_copied_by__array__=0x000000016fdfdcf0) at ctors.c:1483:15 [opt]
    frame #6: 0x000000010107abbc _multiarray_umath.cpython-313t-darwin.so`PyArray_DiscoverDTypeAndShape_Recursive(obj=0x00000528564d4210, curr_dims=0, max_dims=64, out_descr=0x000000016fdfde00, out_shape=0x000000016fdfde08, coercion_cache_tail_ptr=0x000000016fdfdd78, fixed_DType=0x0000000000000000, flags=0x000000016fdfdd74, copy=-1) at array_coercion.c:1041:32 [opt]
    frame #7: 0x000000010107b664 _multiarray_umath.cpython-313t-darwin.so`PyArray_DiscoverDTypeAndShape(obj=0x00000528564d4210, max_dims=64, out_shape=0x000000016fdfde08, coercion_cache=0x000000016fdfddf8, fixed_DType=0x0000000000000000, requested_descr=0x0000000000000000, out_descr=0x000000016fdfde00, copy=-1, was_copied_by__array__=0x000000016fdfddf4) at array_coercion.c:1303:16 [opt]
    frame #8: 0x0000000101099210 _multiarray_umath.cpython-313t-darwin.so`PyArray_FromAny_int(op=0x00000528564d4210, in_descr=0x0000000000000000, in_DType=0x0000000000000000, min_depth=0, max_depth=0, flags=80, context=0x0000000000000000, was_scalar=0x000000016fdfe07c) at ctors.c:1586:12 [opt]
    frame #9: 0x0000000101099db0 _multiarray_umath.cpython-313t-darwin.so`PyArray_CheckFromAny_int(op=0x00000528564d4210, in_descr=0x0000000000000000, in_DType=0x0000000000000000, min_depth=0, max_depth=0, requires=80, context=0x0000000000000000) at ctors.c:1856:11 [opt]
    frame #10: 0x00000001010e5ddc _multiarray_umath.cpython-313t-darwin.so`_array_fromobject_generic(op=0x00000528564d4210, in_descr=<unavailable>, in_DType=0x0000000000000000, copy=<unavailable>, order=NPY_KEEPORDER, subok='\0', ndmin=0) at multiarraymodule.c:1661:28 [opt]
    frame #11: 0x00000001010dff64 _multiarray_umath.cpython-313t-darwin.so`array_asarray(__NPY_UNUSED_TAGGEDignored=<unavailable>, args=0x0000000100010078, len_args=1, kwnames=0x0000000000000000) at multiarraymodule.c:1773:21 [opt]

No idea if there are other issues, but maybe we just need to have a check for empty shape here:

https://github.com/numpy/numpy/blob/b2960879bbf90af0cc3e533d2721d3a30470b056/numpy/_core/src/multiarray/ctors.c#L2386-L2397

And not just assume that the array is a scalar at that point.
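
Until such a check exists, a possible workaround on the caller's side might be to avoid the NULL pointer altogether. The sketch below assumes, based on the linked code, that the scalar path is only taken when the data pointer is NULL. `EmptyArrayWithDummyStorage` and its `_dummy` buffer are hypothetical names, not part of NumPy or PyOpenCL, and the 8-byte placeholder is an arbitrary choice:

```python
import ctypes

import numpy as np


class EmptyArrayWithDummyStorage:
    """Hypothetical variant of the EmptyArray reproducer from the report."""

    def __init__(self):
        # Assumption: any valid non-NULL address avoids the NULL-data branch.
        # The buffer is never read or written, because the shape is (0,).
        self._dummy = ctypes.create_string_buffer(8)
        self.__array_interface__ = {
            'version': 3,
            'shape': (0,),
            'strides': (8,),
            'typestr': '<f8',
            'data': (ctypes.addressof(self._dummy), False),
        }


arr = np.asarray(EmptyArrayWithDummyStorage())
print(arr.shape, arr.dtype)
```

The resulting array keeps a reference to the wrapper object, so the placeholder buffer stays alive as long as the array does. This only sidesteps the problem; it does not address the NULL-means-scalar assumption itself.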

seberg commented 2 months ago

Just to note, I think this is a duplicate of gh-26037. There is a strong assumption that NULL data means scalar, and I am not even sure that makes sense; that whole path seemed very shady when I looked at it then.

BilalWalker commented 3 weeks ago

Can I work on this issue? Also, a bit more guidance would be appreciated, if possible.