Open Illviljan opened 3 days ago
FAILED xarray/tests/test_treenode.py::TestSetNodes::test_set_child_node - TypeError: _set() got an unexpected keyword argument 'copy'
FAILED xarray/tests/test_treenode.py::TestSetNodes::test_set_grandchild - TypeError: _set() got an unexpected keyword argument 'copy'
FAILED xarray/tests/test_treenode.py::TestSetNodes::test_create_intermediate_child - TypeError: _set() got an unexpected keyword argument 'copy'
FAILED xarray/tests/test_treenode.py::TestSetNodes::test_overwrite_child - TypeError: _set() got an unexpected keyword argument 'copy'
FAILED xarray/tests/test_treenode.py::TestPruning::test_del_child - TypeError: _set() got an unexpected keyword argument 'copy'
Is treenode used anywhere else besides as a mixin-class for DataTree? DataTree overrides TreeNode._set anyway.
FAILED xarray/tests/test_datatree.py::TestSetItem::test_grafted_subtree_retains_name - AssertionError: assert 'new_subtree_name' == 'original_subtree_name'
- original_subtree_name
? ----- ^^
+ new_subtree_name
? ^^
This test is a mutability test and a typo caused this error. This test used root["new_subtree_name"] = subtree # noqa
which uses _set_item under the hood and should use copy=True by default.
Elsewhere in Xarray (e.g., on Dataset.copy()
), the copy
keyword argument refers specifically to whether copying array data is copied, and metadata fields are always copied. This seems potentially inconsistent with that?
Maybe _fastpath
is better?
Maybe
_fastpath
is better?
👍 for a private constructor, if we want to use this internally inside Xarray.
Well it wouldn't be for internal xarray usage for me, it would be for a backend/mfdatatree function. But it is a typical way we use to speed things up and implying checks are being bypassed, do we have any other public examples?
My real problem is pretty similar to the intro example but I have 100+ datatrees instead, such a simple operation takes 1.5 minutes because of all the copies, it goes down to around 19ms with this pr.
My real problem is pretty similar to the intro example but I have 100+ datatrees instead, such a simple operation takes 1.5 minutes because of all the copies, it goes down to around 19ms with this pr.
Yikes, that is pretty bad!
Do you have a slightly more realistic benchmark you could share, and/or add to the asv tests?
I am sure we can do much better for performance in from_dict
, but there are a few related issues here that I would like get right first, including a major refactor to DataTree internals (https://github.com/pydata/xarray/pull/9063) and changing the DataTree constructor (https://github.com/pydata/xarray/issues/9196).
Once we have the right behavior, it makes sense to start working on performance. I am optimistic that we can figure out how to improve performance without needing to expand the public API.
Significantly speed up
from_dict
by removing unwanted copying.whats-new.rst