pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DEPR: deprecate / warn about raising an error in __array__ when copy=False cannot be honored #60340

Open jorisvandenbossche opened 5 days ago

jorisvandenbossche commented 5 days ago

NumPy 2.0 changed the behavior of the copy keyword in __array__, in particular making copy=False strict (raising an error when a zero-copy numpy array is not possible). We only now adjusted pandas to update the copy handling, in https://github.com/pandas-dev/pandas/pull/60046 (issue https://github.com/pandas-dev/pandas/issues/57739).

But that also introduced a breaking change for anyone still doing np.array(ser, copy=False) who hasn't updated that call when moving to numpy 2.0. Historically this has always worked fine and could silently give a copy anyway.
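To make the change in semantics concrete, here is a minimal sketch using plain NumPy only (no pandas). It assumes a Python list as the input, since a list can never be converted without allocating; the variable names are illustrative. It also shows np.asarray, which keeps the older "copy only if needed" meaning on both NumPy 1.x and 2.x.

```python
import numpy as np

NUMPY_MAJOR = int(np.__version__.split(".")[0])

# A Python list always has to be copied into a new ndarray.
data = [1, 2, 3]

# np.asarray means "copy only if needed" on both NumPy 1.x and 2.x.
arr = np.asarray(data)

# copy=False is strict on NumPy >= 2.0 (ValueError) and lenient before
# (a copy is made silently).
try:
    np.array(data, copy=False)
    raised = False
except ValueError:
    raised = True

# An existing ndarray never needs a copy, so this shares memory with arr.
view = np.asarray(arr)

print(f"NumPy {np.__version__}: copy=False raised -> {raised}")
```

Code that only wants to avoid unnecessary copies can usually switch to np.asarray; copy=False is then reserved for callers that really require zero-copy.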

The idea would be to still include a FutureWarning about this first before raising the error (as now in main) in pandas 3.0.
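As a rough illustration of the "warn first" idea, the sketch below uses a hypothetical toy class, CategoricalLike, whose values can only be materialized by allocating a new array. This is not the pandas implementation; the class name, the message text, and the stacklevel are assumptions made for the example.

```python
import warnings

import numpy as np


class CategoricalLike:
    """Hypothetical container: integer codes pointing into a categories array."""

    def __init__(self, codes, categories):
        self._codes = np.asarray(codes)
        self._categories = np.asarray(categories)

    def __array__(self, dtype=None, copy=None):
        if copy is False:
            # Deprecation step: warn now, raise ValueError in a later version.
            warnings.warn(
                "copy=False cannot be honored for this object; a copy is "
                "made for now, but this will raise an error in the future.",
                FutureWarning,
                stacklevel=2,
            )
        # Looking up categories by code always allocates a new array.
        values = self._categories[self._codes]
        if dtype is not None:
            values = values.astype(dtype, copy=False)
        return values


obj = CategoricalLike([0, 1, 0], ["a", "b"])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Called directly so the example behaves the same on NumPy 1.x and 2.x.
    result = obj.__array__(copy=False)

print(result, [w.category.__name__ for w in caught])
```

Calling __array__ directly sidesteps the fact that NumPy 1.x does not forward the copy keyword to __array__, so the warning path is exercised regardless of the installed NumPy version.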

See https://github.com/pandas-dev/pandas/pull/60046#issuecomment-2457749926 for more context

KevsterAmp commented 3 days ago

take

KevsterAmp commented 1 day ago

@jorisvandenbossche - I'm having a hard time getting np.array(ser, copy=False) to raise an error, using either the latest pandas release (2.2) or the 2.3.x branch, with NumPy v2.0 or newer. I'd like to reproduce it so I can use it for debugging. Thanks

jorisvandenbossche commented 1 day ago

Try this example with latest main:

In [1]: ser = pd.Series(["a", "b"], dtype="category")

In [2]: np.array(ser, copy=False)
...
ValueError: Unable to avoid copy while creating an array as requested.

You need to use a dtype that cannot be converted zero-copy to numpy, such as the category dtype I used above (if you use integers, for example, it will not error).

You also need latest main (or 2.3.x); this is not yet included in a released version.

KevsterAmp commented 1 day ago

Can't seem to replicate on my end, on both main and 2.3.x. Running:

import numpy as np
import pandas as pd

print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

ser = pd.Series(["a", "b"], dtype="category")
x = np.array(ser, copy=False)
print(x)

Output:

(pandas-dev) kev@mac pandas % python test.py
+ /Users/kev/.pyenv/versions/3.10.14/bin/ninja
[1/1] Generating write_version_file with a custom command
NumPy version: 1.26.4
Pandas version: 3.0.0.dev0+1580.g68d9dcab5b
['a' 'b']

I'm running on macOS 15.1.1

jorisvandenbossche commented 1 day ago

Ah, you need numpy >= 2.0 (your output shows NumPy 1.26.4)
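Since the strict copy=False behavior arrived with NumPy 2.0, a reproduction script can check the version up front. The helper below is a hypothetical convenience, not part of NumPy or pandas, and only looks at the major version number.

```python
def has_strict_copy_false(numpy_version: str) -> bool:
    """Return True if this NumPy version raises when copy=False cannot be honored.

    Hypothetical helper: it only inspects the major version component.
    """
    major = int(numpy_version.split(".")[0])
    return major >= 2


# The version reported in the script output above.
print(has_strict_copy_false("1.26.4"))
```

In a real script the argument would be np.__version__, and the reproduction could be skipped or flagged when the helper returns False.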

KevsterAmp commented 3 hours ago

Thanks, I'm now able to replicate it on my end. Working on the PR 🔧

jorisvandenbossche commented 3 hours ago

Great!

KevsterAmp commented 2 hours ago

@jorisvandenbossche - Does this warning message look good to you?

Numpy>=2.0 changed copy keyword's behavior, making copy=False raise an error when a zero-copy numpy array is not possible