pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Needs performance check / improvements in value assignment of DataArray #1771

Open fujiisoup opened 6 years ago

fujiisoup commented 6 years ago

https://github.com/pydata/xarray/blob/5e801894886b2060efa8b28798780a91561a29fd/xarray/core/dataarray.py#L482-L489

In #1746, we added validation to xr.DataArray.__setitem__ that checks whether the coordinates of the array, the key, and the value are consistent. The current implementation calls xr.DataArray.__getitem__ to reuse the existing coordinate validation logic, but this performs unnecessary indexing and may slow down __setitem__ when the array is multidimensional.

We may need to optimize the logic here.
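A rough way to see the overhead is to time assignment through the DataArray against assignment directly on its underlying Variable, which skips the coordinate check. This is only a sketch: the array size, the slice, and the repeat count are arbitrary choices, and the result depends on the installed xarray version (later releases may implement the check differently).

```python
import timeit

import numpy as np
import xarray as xr

# Arbitrary 2-D test array with explicit coordinates (sizes are assumptions).
da = xr.DataArray(
    np.zeros((1000, 1000)),
    dims=("x", "y"),
    coords={"x": np.arange(1000), "y": np.arange(1000)},
)


def assign_via_dataarray():
    # Goes through DataArray.__setitem__, including the coordinate check.
    da[:500, :500] = 1.0


def assign_via_variable():
    # Writes to the underlying Variable directly, bypassing the check.
    da.variable[:500, :500] = 2.0


t_dataarray = timeit.timeit(assign_via_dataarray, number=200)
t_variable = timeit.timeit(assign_via_variable, number=200)

print(f"DataArray.__setitem__: {t_dataarray:.4f} s for 200 calls")
print(f"Variable.__setitem__:  {t_variable:.4f} s for 200 calls")
```

The difference between the two timings gives an upper bound on what the validation step costs for this particular shape and key.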

Would it be reasonable to continuously monitor the performance of basic operations such as Dataset construction, alignment, indexing, and assignment? (Or are these operations too lightweight to be worth monitoring?)

cc @jhamman @shoyer

jhamman commented 6 years ago

@fujiisoup In #1457, we added a framework (airspeed velocity, asv) for benchmarking xarray operations. Indexing performance benchmarks are certainly within the scope of that framework. So far I have only implemented a few I/O-related benchmarks, with the expectation that more benchmarks, for issues like this one, would be added later on.
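As a sketch of what such a benchmark could look like, the class below follows the asv convention of a setup method plus time_* methods. The class name, array sizes, and the file it would live in are hypothetical and not taken from the xarray repository.

```python
import numpy as np
import xarray as xr


class AssignmentSuite:
    """Hypothetical asv-style suite; asv runs setup() before timing each time_* method."""

    def setup(self):
        nx, ny = 500, 400  # arbitrary sizes chosen for illustration
        self.da = xr.DataArray(
            np.zeros((nx, ny)),
            dims=("x", "y"),
            coords={"x": np.arange(nx), "y": np.arange(ny)},
        )
        # A value whose coordinates match the target slice exactly.
        self.block = xr.DataArray(
            np.ones((100, 100)),
            dims=("x", "y"),
            coords={"x": np.arange(100), "y": np.arange(100)},
        )

    def time_setitem_scalar(self):
        # Assignment of a plain scalar to a 2-D slice.
        self.da[:100, :100] = 1.0

    def time_setitem_dataarray(self):
        # Assignment of a DataArray value, which exercises the coordinate check.
        self.da[:100, :100] = self.block

    def time_getitem_slice(self):
        # Baseline: the indexing that __setitem__ performs for validation.
        self.da[:100, :100]
```

Comparing time_getitem_slice with the two time_setitem_* results over successive commits would show how much of the assignment cost comes from the indexing step.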