Closed scottstanie closed 1 year ago
As a temporary workout for my case, I'm just going to do
In [3]: pf = (da.copy(True)).polyfit("z", deg=2, w=np.arange(nz))
In [5]: da.max(), da.mean()
Out[5]:
(<xarray.DataArray ()>
array(0.99878237),
<xarray.DataArray ()>
array(0.50869358))
The thing I don't understand is that _to_temp_dataset
seems to be trying to do a deep copy (based on the argument names)
But it is acting like a shallow copy:
In [6]: pf = (da.copy(False)).polyfit("z", deg=2, w=np.arange(nz))
In [7]: da.max(), da.mean()
Out[7]:
(<xarray.DataArray ()>
array(8.92217147),
<xarray.DataArray ()>
array(2.29014397))
yikes thanks for the nice report!
Hi! I was about to open a new issue about this, but looks like it's a known issue and there's a stale PR... Let me know if I can help to get this fixed!
You can always reach out to the creator of the original PR and comment in the PR.
But it looks like this particular PR was reaching a dead end and should be completely rewritten. But anyway the reviewers left helpful remarks on how to proceed.
Do you know where the in-place modification is happening? We could just copy there and fix this particular issue.
Do you know where the in-place modification is happening? We could just copy there and fix this particular issue.
Not sure, but I'll take a look!
What happened:
After running
da.polyfit
on a DataArray with weights, the data has been overwrittenWhat you expected to happen:
I didn't see this documented anywhere, but I did not expect that creating a polyfit dataset would clobber the original data that I'm fitting to. The data isn't altered in the case of unweighted fitting, only weighted.
Minimal Complete Verifiable Example:
Anything else we need to know?:
I assume it's happening here https://github.com/pydata/xarray/blob/da99a5664df4f5013c2f6b0e758394bec5e0bc80/xarray/core/dataset.py#L6805
My question is whether this is supposed to be the case to avoid copies? Or if it's accidental?
Environment:
Output of xr.show_versions()
xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 18:42:56) [Clang 10.0.1 ] python-bits: 64 OS: Darwin OS-release: 18.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.1.2 numpy: 1.20.2 scipy: 1.5.2 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 3.2.1 Nio: None zarr: 2.6.1 cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: 1.2.1 cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.14.0 distributed: 2.20.0 matplotlib: 3.3.0 cartopy: 0.18.0 seaborn: 0.10.1 numbagg: None pint: 0.16.1 setuptools: 49.6.0.post20200814 pip: 21.1.2 conda: 4.8.4 pytest: 6.2.4 IPython: 7.18.1 sphinx: 3.5.1