Open coastalwhite opened 2 months ago
I wasn't at the issue triage, but this is not the right approach.
pl.from_numpy(array)
should not change the properties of array
. Just like df.to_numpy()
should not change the properties of df
.
We should add the parameter combo allow_copy
/ writable
or some other API to allow users control whether or not a copy is made. But changing the array
properties is a no-go in my opinion.
I strongly disagree here. If we set the flag and it is a problem for you, you got an error. Which is good, because you have a bug in your query. Data mutating on a distance. It causes all kinds of problems. Our statistics are then wrong and you get nonsense results.
The solution: copy the data on input. Then we don't set a flag. There is no valid reason you want to keep a write flag to data you send zero copy into another program.
So setting the flag will only cause problems if there are problems. Catching problems is good.
BTW: we can restore the flag once Polars doesn't memory map the data anymore.
If that type of memory safety is an issue, we can detect writable inputs and either raise ("set to non-writable first") or make a copy, with some parameter(s) to control this behavior.
Setting flags on user inputs and then resetting them later at a later point in the program are side effects that are going to lead to hard-to-detect bugs. That's spooky action at a distance that we really shouldn't want.
That means we always have to copy or always raise on input on a numpy array as numpy arrays are typically always writeable. I want first class numpy support and zero-copy without users requiring to set flags.
Setting a flag from writeable to readably goes from a less constraint condition to a more constraint condition. The only side effect is is that you cannot write for the duration that Polars holds the lock. It is a logical mutex. If you do you get an error raised, not invalid data, not invalid results.
In a general case, I agree with you, but you cannot write to data you borrow and I think this a case where we have to be practical.
Description
Following discussion in the Issue Triage. We would make numpy arrays immutable if they are passed as zero-copy to Polars.
An example can be seen here: https://stackoverflow.com/a/5541452