rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.72k stars 309 forks source link

`try_len` method #723

Closed Easyoakland closed 1 year ago

Easyoakland commented 1 year ago

Add method to get the length of an iterator if the size_hint is consistent. Reasons why this would be useful:

  1. It makes it more obvious when getting the length with it.try_len().unwrap() than with the match statement.
  2. Iterator adapters that increase the length of iteration are not supposed to implement ExactSizeIter as indicated here. This could be used in those cases.
Philippe-Cholet commented 1 year ago

Reading this, I remembered the size_hint() doc saying it should be fully trusted (bold below is mine):

size_hint() is primarily intended to be used for optimizations such as reserving space for the elements of the iterator, but must not be trusted to e.g., omit bounds checks in unsafe code. An incorrect implementation of size_hint() should not lead to memory safety violations.

So maybe the documentation of try_len should warn about an incorrect implementation of size_hint.

And to have a similar signatures to other functions in size_hint.rs (and not have &impl Iterator), I would do

fn try_len(sh: SizeHint) -> Option<usize> { ... }

then

size_hint::try_len(self.size_hint())
Easyoakland commented 1 year ago

@Philippe-Cholet

So maybe the documentation of try_len should warn about an incorrect implementation of size_hint. And to have a similar signatures to other functions in size_hint.rs (and not have &impl Iterator), I would do ...

Fixed in 952d8c4. If a length is needed for unsafe operations I believe one is supposed to bound by https://doc.rust-lang.org/std/iter/trait.TrustedLen.html

phimuemue commented 1 year ago

Hi there. I see this might be handy in some circumstances.

Easyoakland commented 1 year ago

Let's return Result<usize, (usize, Option)> so that the callers do not need to call size_hint a second time in the failure case.

Sure. I suppose it would save a re-computation if its useful in the error case.

Please inline size_hint::try_len.

Either you meant Itertools::try_len() or I've already done that. If you meant the former that's now done as well. How does one determine when to inline a function? I've seen the #[inline] annotation used before but I haven't been able to figure out when to apply it. I've generally just used it when similar functions nearby also have it or if benchmarks show an improvement.

I am unsure about the name. I guess that most users looking for something like this start out by searching for size_hint. Maybe unique_size_hint?

I chose this name to mirror the naming as in ExactSizeIterator and here. The primary use case is for getting the length from chaining/adapting two iterators known to not overflow a usize or a specialization-like behavior on ExactSizeIterator in a function implementation.

In the docs, I did not directly understand what "relies on" means. Maybe "inherits guarantees and restrictions from size_hint"?

Sure.

Do we need the Sized trait bound?

Good catch. Originally it was needed because it incorrectly took impl Iterator. Now that it takes the sizeHint we don't need the bound.

phimuemue commented 1 year ago

thank you.

Either you meant Itertools::try_len() or I've already done that.

I meant to literally eliminate size_hint::try_len by doing the computation directly in Itertools::try_len. As it stands, it only creates an indirection without real need.

Regarding ‘[inline]‘: you’re right, there’s no clear pattern in the code base, so let’s just not use it here.

Easyoakland commented 1 year ago

I meant to literally eliminate size_hint::try_len by doing the computation directly in Itertools::try_len. As it stands, it only creates an indirection without real need.

Regarding ‘[inline]‘: you’re right, there’s no clear pattern in the code base, so let’s just not use it here.

fixed in 068de35

phimuemue commented 1 year ago

Thanks.

bors r+

bors[bot] commented 1 year ago

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here. For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.