Closed jbusecke closed 4 years ago
But I suspect that something triggers the computation of the array, when assigning a coordinate.
This sounds correct to me.
I'm a little surprised that there is a significant performance difference between these two ways of adding coordinates. The implementations of these methods differ slightly, but they should be pretty similar.
Could you think of a way I would be able to diagnose this further? Sorry for these broad questions, but I am not very familiar with the xarray internals.
I believe this was fixed in a recent version. Closing.
It may be. It would be good to check if you have the time.
I think this issue was actually a dupe. I remember you pointing me to changes in 14.x that improved the performance, but I can't find the other issue right now. I will have an opportunity to test this in the coming days on some huge GFDL data.
I can confirm that this issue is resolved for my project. It no longer seems to make a difference in speed whether I assign the DataArray as a coordinate or as a data variable. Thanks for the fix!
Great! Thanks for checking.
I am trying to reconstruct vertical cell depth from a z-star ocean model. This involves a few operations involving both dimensions and coordinates of a dataset like this:
The problematic step is when I assign the calculated dask.arrays to the original dataset. This happens in a function like this.
This takes very long compared to a version where I assign the values as data variables:
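The original functions are not preserved here, so the following is only a minimal sketch of the two variants being compared, not the code from my project. The dataset, the variable name `eta`, and the derived array `dz` are hypothetical stand-ins; the only xarray calls used are `Dataset.assign_coords` and `Dataset.assign`.

```python
import dask.array as dsa
import numpy as np
import xarray as xr

# Hypothetical stand-in for the model output: a small, lazily evaluated field.
ds = xr.Dataset(
    {"eta": (("time", "y", "x"), dsa.zeros((4, 6, 8), chunks=(1, 6, 8)))},
    coords={"time": np.arange(4), "y": np.arange(6), "x": np.arange(8)},
)

# Hypothetical derived quantity (stands in for the reconstructed cell depth).
# It is still a dask-backed DataArray at this point.
dz = ds["eta"] + 1.0

# Variant 1: attach the result as a (non-index) coordinate.
ds_coord = ds.assign_coords(dz=dz)

# Variant 2: attach the same result as a data variable.
ds_var = ds.assign(dz=dz)


def is_lazy(da):
    """Return True if the DataArray is still chunked, i.e. backed by dask."""
    return da.chunks is not None


print("coordinate still lazy:   ", is_lazy(ds_coord["dz"]))
print("data variable still lazy:", is_lazy(ds_var["dz"]))
```

Timing the two assignment lines separately on the real data (for example with `%timeit` in a notebook) would show whether the slowdown comes from the assignment itself or from a later step.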
I am not able to reproduce this problem in a smaller example yet and realize that my example is quite complex (e.g. has functions that are not shown). But I suspect that something triggers the computation of the array, when assigning a coordinate.
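One way to test that suspicion directly is to run the assignment under a custom dask scheduler that counts how often a computation is actually started. This is only a sketch: the `CountingScheduler` class, the toy dataset, and the names `eta` and `dz` are made up for illustration, and the sketch assumes that passing a callable to `dask.config.set(scheduler=...)` routes all computes through it.

```python
import dask
import dask.array as dsa
import xarray as xr


class CountingScheduler:
    """Synchronous scheduler wrapper that counts triggered computations."""

    def __init__(self):
        self.total_computes = 0

    def __call__(self, dsk, keys, **kwargs):
        self.total_computes += 1
        # Delegate the actual work to dask's synchronous scheduler.
        return dask.get(dsk, keys, **kwargs)


# Hypothetical toy dataset with one lazy variable.
ds = xr.Dataset({"eta": (("y", "x"), dsa.ones((4, 5), chunks=(2, 5)))})
dz = ds["eta"] * 2.0  # hypothetical lazy result

counter = CountingScheduler()
with dask.config.set(scheduler=counter):
    ds_coord = ds.assign_coords(dz=dz)
    computes_after_assign = counter.total_computes

    # Loading the values explicitly should register as a compute.
    ds_coord["dz"].load()
    computes_after_load = counter.total_computes

print("computes triggered by assign_coords:", computes_after_assign)
print("computes after explicit load:       ", computes_after_load)
```

If the counter is already non-zero right after the assignment, something in that step is evaluating the array eagerly.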
I have profiled my more complex code involving this function, and it seems like there is a substantial increase in calls to {method 'acquire' of '_thread.lock' objects}.
Profile output of the first version (assigning coordinates):
For the second version (assigning data variables):
Does anyone have a feel for why this could happen or how I could refine my testing to get to the bottom of this?
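For narrowing this down, a profile of each variant can be collected with the standard library alone. The sketch below uses a hypothetical helper `profile_calls` and a placeholder workload; in practice the function passed in would be the one that does the assignment. As far as I understand it, a large number of `acquire` calls on `_thread.lock` usually means the main thread is waiting on dask's threaded scheduler, which would be consistent with a computation being triggered, but that interpretation is an assumption.

```python
import cProfile
import io
import pstats


def profile_calls(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report.

    The report lists the ten entries with the highest total time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(10)
    return result, buffer.getvalue()


def example_workload():
    # Placeholder for the function under test (hypothetical).
    return sum(i * i for i in range(10_000))


result, report = profile_calls(example_workload)
print(report)
```

Comparing the reports of the two variants side by side, and looking specifically for the lock `acquire` entry, should show whether only one of them blocks on the scheduler.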
Output of xr.show_versions()