rust-scraper / ego-tree

Vec-backed ID-tree
https://docs.rs/ego-tree
ISC License
45 stars 16 forks source link

Provide consuming movement methods so that NodeMut can act as a cursor. #33

Closed adamreichold closed 1 month ago

adamreichold commented 1 month ago

Still a draft because I added only the into_next_sibling method instead of handling all axes until we agreed that this is a helpful approach. Added the missing methods after trying out a bit deduplication of internals.

LoZack19 commented 1 month ago

This is perfect. As I said, into_* functions are very much needed in NodeMut, I like this approach

cfvescovo commented 1 month ago

This is an excellent approach! Huge thanks @adamreichold, your contributions are invaluable!

LoZack19 commented 1 month ago

Do you think that we could also take advantage of this to implement mutable and owned iterators for NodeMut?

adamreichold commented 1 month ago

Do you think that we could also take advantage of this to implement mutable and owned iterators for NodeMut?

You mean something like

impl<'a, T: 'a> NodeMut<'a, T> {
  pub fn children(&self) -> ChildrenMut<'a, T>;
}

which yields distinct instances of NodeMut<'a, T> as items?

If so, then I don't think these methods help to implement this safely as while these methods can "advance" a NodeMut through the children of a given node, I cannot hand out that current NodeMut by returning it from fn next(&mut self) -> Option<NodeMut<'a, T>>; because I then have no basis to compute the next child node when next is called again.

This would only work with a lending iterator which yields a &'b mut NodeMut<'a, T> borrowing from &'b self which coincidentally is exactly the pattern that std's CursorMut::next uses.

adamreichold commented 1 month ago

Realising how important it is to hold on to an existing mutable node reference, I added a second commit which extends the signature of these methods to

fn into_axis<F>(mut self, f: F) -> Result<Self, Self>;

return the current reference as Err(self) if navigation is not possible so that information is never lost if the calling code wants to recover from such a situation.

LoZack19 commented 1 month ago

If so, then I don't think these methods help to implement this safely as while these methods can "advance" a NodeMut through the children of a given node, I cannot hand out that current NodeMut by returning it from fn next(&mut self) -> Option<NodeMut<'a, T>>; because I then have no basis to compute the next child node when next is called again.

This is true. NodeMut needs to remain inside the state of the iterator in order to compute the next one. This excludes the prospect of an owned iterator. It could still be useful, though, to have a mutable iterator which follows the CursorMut::next pattern, as you suggested.

adamreichold commented 1 month ago

It could still be useful, though, to have a mutable iterator which follows the CursorMut::next pattern, as you suggested.

Note that this would not be an iterator though, i.e. not an implementation of the Iterator trait, because that trait cannot support lending (items can borrow from the container, but not from the iterator itself).

NodeMut basically is our equivalent to CursorMut, and our equivalent to CursorMut::next is

node = match node.into_next_sibling() {
  Ok(node) => node,
  Err(node) => node,
};

We could of course provide methods like

pub fn to_next_sibling(&mut self) -> Result<(), ()>;

which modify self in-place (or not if movement is not possible) instead of moving out of self. We could then implement the into_* methods on top of those.

Would you prefer that? If so, just the to_* variants or both to_* and into_*?

cfvescovo commented 1 month ago

Realising how important it is to hold on to an existing mutable node reference, I added a second commit which extends the signature of these methods to

fn into_axis<F>(mut self, f: F) -> Result<Self, Self>;

return the current reference as Err(self) if navigation is not possible so that information is never lost if the calling code wants to recover from such a situation.

This is a good idea which would immensely benefit from better documentation

cfvescovo commented 1 month ago

It could still be useful, though, to have a mutable iterator which follows the CursorMut::next pattern, as you suggested.

Agreed

adamreichold commented 1 month ago

This is a good idea which would immensely benefit from better documentation

Will amend the documentation if we decide to stick to the into_* methods instead of switching over to to_*/in-place modification.

LoZack19 commented 1 month ago

Would you prefer that? If so, just the to* variants or both to and into_?

Personally I would love to have both implementations. I'm just a bit concerned about code duplication. I think that, to avoid overcomplicating the NodeMut API, it is convenient to abandon in-place modification through the to_* functions and stick to into_*

When chosing, I prefer the alternative which hides the least state possible.

LoZack19 commented 1 month ago

If you all agree, I approve to merge this PR into master

adamreichold commented 1 month ago

When chosing, I prefer the alternative which hides the least state possible.

While trying, I also realized that the implement of to_axis is requires additional unsafe code to massage the lifetimes which another argument against it. So I guess we should indeed stick to the into_* methods and I will amend the doc comments w.r.t. the Result<Self, Self> pattern.

adamreichold commented 1 month ago

If you all agree, I approve to merge this PR into master

I amended the docs as requested, so from my point of view, this is good to go.

LoZack19 commented 1 month ago

Totally agree. Let's finish documenting this code and merge. I just want to point out that I have created a readme in which I try to summarize the design choises made in creating ego-tree, so possibly if there are choices to point out in this PR I would suggest doing so in the readme.md

cfvescovo commented 1 month ago

Totally agree. Let's finish documenting this code and merge. I just want to point out that I have created a readme in which I try to summarize the design choises made in creating ego-tree, so possibly if there are choices to point out in this PR I would suggest doing so in the readme.md

@adamreichold rebase your branch on top of master

cfvescovo commented 1 month ago

I have rechecked, everything LGTM

adamreichold commented 1 month ago

Thanks. I would be glad if we could make a point release (for the FusedIterator impls) so that I could eventually recover the dependent FusedIterator impls in scraper.

cfvescovo commented 1 month ago

Releasing 0.8.0 ASAP