numpy / numpy

The fundamental package for scientific computing with Python.
https://numpy.org

BUG: astype changes the shape of structured arrays #24313

Open dentalfloss1 opened 1 year ago

dentalfloss1 commented 1 year ago

Describe the issue:

As of NumPy 1.25, the following deprecation warning appears when creating an array with an integer dtype that some of the input values overflow:

/home/sarah/dep.py:6: DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 248 to int8 will fail in the future.
For the old behavior, usually:
    np.array(value).astype(dtype)
will give the desired result (the cast overflows).
  arr = np.array(dat, dtype=mydtype)

However, if the dtype is a structured dtype with multiple fields, the suggested fix copies each value into every field of its own element, which changes the shape of the array, as can be seen in the example code below.
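
To make the shape change concrete, here is a minimal sketch of what the suggested `np.array(value).astype(dtype)` pattern appears to do with this input. The variable name `plain` is only illustrative; the explanation in the comments is my reading of the observed output, not a statement about NumPy internals.

```python
import numpy as np

mydtype = [('re', 'i1'), ('im', 'i1')]
dat = [(2, 158), (6, 856), (248, 35)]

# Without a dtype, the tuples are read as a nested sequence of integers,
# so the intermediate array is 2-D rather than a 1-D array of records.
plain = np.array(dat)
print(plain.shape)   # (3, 2)

# Casting a plain integer array to a structured dtype keeps the shape and
# writes each integer into every field of its element.
arr2 = plain.astype(mydtype)
print(arr2.shape)    # (3, 2)
print(np.array_equal(arr2['re'], arr2['im']))  # True
```

So each of the six integers becomes its own record with both fields set to the same (wrapped) value, instead of each tuple becoming one record.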

Reproduce the code example:

import numpy as np 
mydtype = [('re','i1'),('im','i1')]

dat = [(2,158),(6,856),(248,35)]

arr = np.array(dat, dtype=mydtype)
# works but is deprecated

arr2 = np.array(dat).astype(mydtype)
print(arr,arr2)

print("Is arr equal to arr2 ? ", np.array_equal(arr,arr2))

Error message:

sarah@sarah-optiplex-ubuntu:~$ python dep.py
/home/sarah/dep.py:6: DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 158 to int8 will fail in the future.
For the old behavior, usually:
    np.array(value).astype(dtype)
will give the desired result (the cast overflows).
  arr = np.array(dat, dtype=mydtype)
/home/sarah/dep.py:6: DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 856 to int8 will fail in the future.
For the old behavior, usually:
    np.array(value).astype(dtype)
will give the desired result (the cast overflows).
  arr = np.array(dat, dtype=mydtype)
/home/sarah/dep.py:6: DeprecationWarning: NumPy will stop allowing conversion of out-of-bound Python integers to integer arrays.  The conversion of 248 to int8 will fail in the future.
For the old behavior, usually:
    np.array(value).astype(dtype)
will give the desired result (the cast overflows).
  arr = np.array(dat, dtype=mydtype)
[( 2, -98) ( 6,  88) (-8,  35)] [[(  2,   2) (-98, -98)]
 [(  6,   6) ( 88,  88)]
 [( -8,  -8) ( 35,  35)]]
Is arr equal to arr2 ?  False 

Runtime information:

import sys, numpy; print(numpy.__version__); print(sys.version) : 1.25.1 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]

print(numpy.show_runtime()) :

WARNING: threadpoolctl not found in system! Install it by pip install threadpoolctl. Once installed, try np.show_runtime again for more detailed build information
[{'numpy_version': '1.25.1', 'python': '3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]', 'uname': uname_result(system='Linux', node='sarah-optiplex-ubuntu', release='5.15.0-78-generic', version='#85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023', machine='x86_64')}, {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'], 'found': ['SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2'], 'not_found': ['AVX512F', 'AVX512CD', 'AVX512_KNL', 'AVX512_KNM', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL']}}]
None

Context for the issue:

Working around this bug requires multiple difficult-to-read lines of code, when the suggested replacement should behave the same way the deprecated code did.

seberg commented 1 year ago

In this case you will have to provide a large enough integer to the structured dtype, i.e. go via arr.astype("i8,i8").astype("i1,i1") to use the astype trick.
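
A minimal sketch of that approach, assuming the data is first loaded into a structured dtype with 8-byte integer fields and only then cast down. The names `wide` and `narrow` are illustrative; the sketch reuses the field names `re` and `im` rather than the bare "i8,i8" string, which would create default field names instead.

```python
import numpy as np

narrow = [('re', 'i1'), ('im', 'i1')]
wide = [('re', 'i8'), ('im', 'i8')]   # assumption: same fields, wider integers
dat = [(2, 158), (6, 856), (248, 35)]

# Every value fits in int64, so building the wide array raises no warning.
# The structured-to-structured cast then wraps each field on the way down.
arr = np.array(dat, dtype=wide).astype(narrow)

print(arr.shape)     # (3,)
print(arr.tolist())  # [(2, -98), (6, 88), (-8, 35)]
```

This keeps the 1-D shape of one record per tuple and gives the same wrapped values as the deprecated `np.array(dat, dtype=mydtype)` call in the report.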