Open jorisvandenbossche opened 5 days ago
take
take
@jorisvandenbossche - I'm having a hard time getting np.array(ser, copy=False) to raise an error using the latest pandas release (2.2) or the 2.3.x branch with NumPy 2.0 or later. I'd like to reproduce it so I can debug it. Thanks!
Try this example with latest main:
In [1]: ser = pd.Series(["a", "b"], dtype="category")
In [2]: np.array(ser, copy=False)
...
ValueError: Unable to avoid copy while creating an array as requested.
You need to use a dtype that cannot be converted to numpy zero-copy, such as the category dtype I used above (integers, for example, will not raise an error).
You also need the latest main (or 2.3.x); this is not yet included in a released version.
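To illustrate why integers convert zero-copy while a categorical does not, here is a minimal NumPy-only sketch. The codes and categories arrays are a toy stand-in for categorical storage, assumed for illustration; they are not the actual pandas internals.

```python
import numpy as np

# Integer data already lives in a NumPy buffer, so it can be exposed as-is.
int_values = np.array([1, 2, 3], dtype="int64")
view = np.asarray(int_values)
int_is_zero_copy = np.shares_memory(int_values, view)

# Toy stand-in for categorical storage (an assumption, not pandas internals):
# small integer codes that point into a separate array of categories.
codes = np.array([0, 1], dtype="int8")
categories = np.array(["a", "b"], dtype=object)

# Producing the values "a", "b" requires building a brand-new array,
# so no existing buffer can be handed out without a copy.
materialized = categories[codes]
cat_is_zero_copy = np.shares_memory(materialized, codes) or np.shares_memory(
    materialized, categories
)

print(int_is_zero_copy, cat_is_zero_copy)
```

In the second case a strict copy=False request cannot be satisfied, which is the situation the ValueError above reports.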
Can't seem to replicate on my end, on either main or 2.3.x. Running:
import numpy as np
import pandas as pd
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
ser = pd.Series(["a", "b"], dtype="category")
x = np.array(ser, copy=False)
print(x)
Output:
(pandas-dev) kev@mac pandas % python test.py
+ /Users/kev/.pyenv/versions/3.10.14/bin/ninja
[1/1] Generating write_version_file with a custom command
NumPy version: 1.26.4
Pandas version: 3.0.0.dev0+1580.g68d9dcab5b
['a' 'b']
I'm running on macOS 15.1.1
Ah, you need numpy >= 2.0
Thanks, I'm now able to replicate it on my end. Working on the PR 🔧
Great!
@jorisvandenbossche - Does this warning message look good to you?
Numpy>=2.0 changed copy keyword's behavior, making copy=False raise an error when a zero-copy numpy array is not possible
NumPy 2.0 changed the behavior of the copy keyword in __array__, in particular making copy=False strict (raising an error when a zero-copy numpy array is not possible). We only now adjusted pandas to update its copy handling, in https://github.com/pandas-dev/pandas/pull/60046 (issue https://github.com/pandas-dev/pandas/issues/57739).
But that also introduced a breaking change for anyone doing np.array(ser, copy=False) (and who hasn't updated that when moving to numpy 2.0), which historically has always worked fine and could silently give a copy anyway.
The idea would be to first include a FutureWarning about this in pandas 3.0 before raising the error (as main does now).
See https://github.com/pandas-dev/pandas/pull/60046#issuecomment-2457749926 for more context
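As a rough sketch of the "warn first, raise later" idea, the toy class below implements __array__ with a copy keyword and emits a FutureWarning when copy=False cannot be honored. ToyCategorical and the warning text are hypothetical and only illustrate the pattern; they are not the pandas implementation or its final wording.

```python
import warnings

import numpy as np


class ToyCategorical:
    """Hypothetical container that can never be exposed zero-copy."""

    def __init__(self, codes, categories):
        self._codes = np.asarray(codes)
        self._categories = np.asarray(categories, dtype=object)

    def __array__(self, dtype=None, copy=None):
        if copy is False:
            # Transitional behavior: warn now instead of raising ValueError.
            warnings.warn(
                "copy=False was requested but a copy is required; "
                "this will raise an error in a future version.",
                FutureWarning,
                stacklevel=2,
            )
        # Materializing the values always allocates a new array.
        result = self._categories[self._codes]
        if dtype is not None:
            result = result.astype(dtype)
        return result


cat = ToyCategorical([0, 1], ["a", "b"])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Call __array__ directly so the result does not depend on the NumPy version.
    arr = cat.__array__(copy=False)

print(arr, [w.category.__name__ for w in caught])
```

Calling __array__ directly keeps the sketch independent of how a given NumPy version forwards the copy keyword from np.array.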