Open LukasKalbertodt opened 4 years ago
Should ExactSize
panic if the underlying iterator fails to yield exactly size
values? Should SizeHint
panic if the underlying iterator fails to yield at least lower
values?
I recall suggesting this to @scottmcm, who was not a fan, in a private setting at one point. cc also @jswrenn re. https://github.com/rust-itertools/itertools/pull/341.
Should
ExactSize
panic if the underlying iterator fails to yield exactlysize
values? ShouldSizeHint
panic if the underlying iterator fails to yield at leastlower
values?
I initially would have said: no, they should not panic. len()
says it has the same safety guarantees as size_hint
and size_hint
stresses that it's basically just for optimization and nothing bad should happen if the returned size is incorrect.
But thinking about it again... maybe checking this might not actually incur any notable performance overhead and be useful. Lets consider the implementation for ExactSize
:
struct ExactSize<I> {
inner: I,
size: usize,
}
impl<I: Iterator> Iterator for ExactSize<I> {
type Item = I::Item;
fn next(&mut self) -> Option<Self::Item> {
match self.inner.next() {
None => None,
Some(x) => {
self.size -= 1;
Some(x)
}
}
}
fn size_hint(&self) -> (usize, Option<usize>) {
(self.size, Some(self.size))
}
}
This implementation does not contain those checks. Well, in debug mode, the integer underflow is checked. Which is good, as an underflow would usually result in "no you can't allocate a gazillion GB". This would be next()
with those checks:
match self.inner.next() {
None => {
assert_eq!(self.size, 0);
None
}
Some(x) => {
self.size = self.size.checked_sub(1).unwrap();
Some(x)
}
}
The None
case is a cold path, so the additional check there probably doesn't matter. The checked_sub
in the Some
branch might be fine, too.
Well, this is certainly something we should benchmark and test before stabilization.
For maximal optimization you'd need some sort of unsafe fn set_trusted_len()
that provides a TrustedLen wrapper.
size_hint
is pretty situational, and it turns out that it's half-unused anyway. And while .with_size_hint(low, high)
is of course safe, it's also really easy to get into a garbage-in-garbage-out situation with because an iterator doesn't need to work if the size_hint is wrong -- which is easy to happen if you do something like "the predicate in my filter is almost always true so I'll just hint that it's the same size as the input". Additionally, SizeHint<I>
cannot propagate TrustedLen
, so something like (0..100).with_exact_size(100)
looks like you're being helpful but actually makes things worse.
So I've been more and more thinking that what's really needed is a different iterator API that's allowed to be wrong, but where being wrong just over- or under-allocates, rather than permitting incorrect behaviour. Then things like ResultShunt can say "you should allocate enough for everything", even though it might return fewer things if there's an Err
. And if it's just one number, it could just be usize::saturating_add
ed for Chain
or whatever, rather than the current more-complicated dance. It has a useful default implementation of self.size_hint().0
as well, so we could move collect
to using it immediately without regressing third-party iterators.
(I also still like .collect_into(Vec::with_capacity(100))
, or similar, from https://github.com/rust-lang/rust/issues/45840.)
Usually iterator adaptors do a fine job at keeping information about the length of an iterator. But once you throw something like
filter
in your iterator chain, the lower bound is 0, meaning that collecting into a collection can't properly pre-allocate. Also, some iterator adaptors (likechain
) cannot implementedExactSizeIterator
although in many real world cases, the size is perfectly known.Currently, working around those limitations is often tedious and requires quite a bit of additional code. I think it would be a lot nicer to just add a
.with_exact_size(27)
somewhere into your chain. So I'd like to propose adding the following toIterator
:fn with_size_hint(self, lower: usize, upper: Option<usize>) -> SizeHint
SizeHint
would implementIterator::size_hint
with the given values, forwarding all other methods to the inner iterator (and implementing the same traits)fn with_exact_size(self, size: usize) -> ExactSize
ExactSize
would implementExactSizeIterator
with the given value and overrideIterator::size_hint
. All other methods are forwarded to the inner iterator (and implementing the same traits, plusExactSizeIterator
)I would have created this as a PR, but I'm short on time and wanted to hear some opinions first.