jmillikin commented 9 months ago

Proposal

Problem statement

core::iter::Sum is implemented for types that can be created by summing a sequence of values. It's implemented for various integer types, and also the core::num::Wrapping wrapper type (which forces wrapped arithmetic). I propose that it should also be implemented for core::num::Saturating, which would force saturating arithmetic.

Per discussion (below), I'm only proposing to cover unsigned integer types. Summing over signed integers doesn't have an obviously correct behavior, and I don't want to get tangled up in that in my quest for saturating unsigned summation.

Motivating examples or use cases

I sometimes want to sum a set of integers that might cumulatively exceed some bound, without the risk of overflow (wrapping and crashing would both be bad).

For example, when assembling a record with max size isize::MAX from a set of smaller fields, I want to sum all the usize field sizes and then if summed_size > (isize::MAX as usize) { return Err(...); }. On 32-bit targets (e.g. WASM) this summation can easily overflow, so code that does field_sizes().iter().sum() is riskier than it looks.

Solution sketch

Adjust the integer_sum_product!() macro to add Saturating<T>, following the existing example of Wrapping<T> . This macro implements both iter::Sum and iter::Product for the relevant types.

For example, the macro-expanded impl of iter::Sum ends up looking like this:

impl core::iter::Sum for Saturating<$t> {
    fn sum<I: Iterator<Item = $t>>(iter: I) -> Self {
        iter.fold(Saturating(0), |a, b| a + b)  // a + b == a.saturating_add(b)
    }
}

Alternatives

The only alternative is "do nothing" -- both of the types are in the standard library, so users can't implement the glue trait themselves.

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

We think this problem seems worth solving, and the standard library might be the right place to solve it.
We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.

shepmaster commented 9 months ago

the macro-expanded impl of iter::Sum ends up looking like this

Should it short-circuit?

pitaj commented 9 months ago

Should it short-circuit?

Good idea. Given saturation should usually be rare, it may not be worth the extra check. Unless there is no extra check because the saturation is implemented by a branch anyways.

scottmcm commented 9 months ago

Hmm, for signed it can't short-circuit, right? Dunno if that means it shouldn't for unsigned either.

shepmaster commented 9 months ago

Hmm, for signed it can't short-circuit

what should u8::MAX + 2 - 1 yield?

pitaj commented 9 months ago

Hmm yeah how should signed saturating sum be implemented? Saturating in from an unbounded integer type feels more intuitive to me.

jmillikin commented 9 months ago

I wouldn't mind implementing it only for unsigned types.

My intuition is that [i8::MAX, 2, -1].iter().sum() == Saturating(i8::MAX - 1), but if that isn't universal then I'd rather implement only unsigned saturation.

the8472 commented 9 months ago

Promising exact short-circuiting may inhibit optimizations because we have to stop taking items the moment it saturates. Without or with sloppy short-circuiting we can take items in batches.

jmillikin commented 9 months ago

Regarding short-circuiting, I think having iter.sum() consume different amounts of items depending on the output type would be surprising to most users. There's no other iter::Sum implementations that behave like that, right?

The existing short-circuiting impls (Option and Result) require the inputs to also be of those types, which raises the idea of having two impls:

Iterator::<Item = u32>::sum::<Saturating<u32>>() will consume the entire iterator, without short-circuiting.
Iterator::<Item = Saturating<u32>>::sum::<Saturating<u32>>() will consume up to the entire iterator, returning immediately after reaching saturation ("exact short-circuiting").

the8472 commented 9 months ago

That leaves the fastest option on the table.

jmillikin commented 9 months ago

That leaves the fastest option on the table.

Yep, that's true. An method on Iterator that might consume more or fewer items depending on internal compiler optimizations has too many sharp edges for me to be comfortable with, given the wide variety of impl Iterator types.

fn sum_and_next<I: Iterator<Item = u32>>(iter: &mut I) {
    let sum: Saturating<u32> = iter.sum();
    println!("sum: {sum:?}");

    // What does this print? Can the output change depending on compiler
    // optimization level? Use of gccrs / rustc_codegen_gcc ?
    println!("next: {:?}", iter.next());
}

There's probably room somewhere for a function shaped like (for example) &[u32] -> Saturating<u32> if people need to sum up some numbers really really fast and don't need to worry about iterator item counting.

the8472 commented 9 months ago

It wouldn't be a compiler optimization, it would be in the library. E.g. we could call next_chunk to get a bunch of items and have that autovectorized.

jmillikin commented 9 months ago

It wouldn't be a compiler optimization, it would be in the library. E.g. we could call next_chunk to get a bunch of items and have that autovectorized.

Sorry, there must be something I'm missing, because the API you describe seems very difficult to use correctly.

Given each of the following arrays:

[1, u32::MAX, 2, 3, 4, ...]
[1, 1, u32::MAX, 2, 3, 4, ...]
[1, 1, 1, u32::MAX, 2, 3, 4, ...]
[1, 1, 1, 1, u32::MAX, 2, 3, 4, ...]

I would expect the number of items consumed to be identical for all of them -- either it shouldn't short circuit and print None, or it should short-circuit and print Some(2).

The idea of an implementation of Sum that leaves the underlying iterator in an indeterminate state is not appealing.

the8472 commented 9 months ago

I would expect the number of items consumed to be identical for all of them

And what is that expectation derived from? If the documentation doesn't say whether it's short-circuiting or not then where in the API contract did we make any promises? The fact that u32 and Option<u32> process differently already means that behavior is type-dependent and not a fixed property of sum(). Which means if we introduce a new impl then you wouldn't know its behavior anyway and we can choose a new one.

because the API you describe seems very difficult to use correctly.

Pass the iterator by value, don't try to do things with the leftover state is quite easy. Most iterator uses are consuming anyway, not by &mut.

jmillikin commented 9 months ago

I would expect the number of items consumed to be identical for all of them

And what is that expectation derived from? If the documentation doesn't say whether it's short-circuiting or not then where in the API contract did we make any promises?

I understand that the contract for iter.sum::<Saturating<T>>() could leave the state of the iterator undefined after a short-circuiting saturating if we choose to write the API docs in that way.

My argument is that we should not do that. I would not propose an ACP that added such a hard-to-use API, nor would I implement it, nor use it if someone else implemented it.

The fact that u32 and Option<u32> process differently already means that behavior is type-dependent and not a fixed property of sum(). Which means if we introduce a new impl then you wouldn't know its behavior anyway and we can choose a new one.

But both of those behave predictably. Option<u32> will not continue reading past the first None it encounters.

Pass the iterator by value, don't try to do things with the leftover state is quite easy. Most iterator uses are consuming anyway, not by &mut.

Standard library functions without an unsafe should not have unspecified behavior under normal operating conditions.

the8472 commented 9 months ago

I would not propose an ACP that added such a hard-to-use API

I am confused what's so hard to use about it. You pass in an iterator. You get out a saturated result and it's as fast as a hand-optimized loop over a slice.

Standard library functions without an unsafe should not have unspecified behavior under normal operating conditions.

That already isn't true for fairly central std features like HashMap iteration order or float results. Or the order in which predicates are evaluated for sorting. There are plenty more unspecified implementation details.

jmillikin commented 9 months ago

I am confused what's so hard to use about it. You pass in an iterator. You get out a saturated result and it's as fast as a hand-optimized loop over a slice.

I would expect auto-vectorization and batching in a function like u32::saturating_sum_slice(xs: &[u32]) -> u32, since the number of items read from the input isn't observable.

For Iterator::sum(), having the number of items consumed depend on both the content and (invisible) chunk size means that I have to be extra-careful around using it, which defeats the purpose of having it in the first place (vs copy-pasting a helper function).

the8472 commented 9 months ago

Exhaustion behavior already is not universal. Some iterators short-circuit, others don't. So you need to read the documentation for each thing you use. sum doesn't make a guarantee in either direction and we already established that it's type-dependent. So you already have to be "careful" if you're trying to look at the exhaustion state. Which ime is not that common. Consuming the iterator is way more common. If you're writing generic code over Sum that depends on exhaustion behaviror then your code is already wrong and you weren't careful enough. ~Arguably your code would be wrong anyway because we're not guaranteeing short-circuiting for Option, so relying on that would also be wrong.~

So I don't understand how that adds an "extra" to the careful.

You really just shouldn't be looking at the exhaustion state, even without this ACP.

jmillikin commented 9 months ago

I'll leave the discussion about whether consuming items from an Iterator should be predictable, because it doesn't seem like it will be possible for you and me to reach consensus on that topic.

Regarding this:

Arguably your code would be wrong anyway because we're not guaranteeing short-circuiting for Option, so relying on that would also be wrong.

Note that sum for Option does in fact guarantee exact short-circuiting: https://doc.rust-lang.org/std/option/enum.Option.html#method.sum

Takes each element in the Iterator: if it is a None, no further elements are taken, and the None is returned. Should no None occur, the sum of all elements is returned.

the8472 commented 9 months ago

Ah, that didn't show up on the Sum trait's doc page.

boomshroom commented 3 months ago

Iterator::sum takes self by value. It shouldn't be possible for the caller to tell if every element was consumed or not without side effects happening from the iteration. If you actually rely on every element getting evaluated, then sum doesn't seem like the right function and you might be better off with an explicit fold where you can add explicit side effects to the summation.

jmillikin commented 3 months ago

Iterator::sum takes self by value. It shouldn't be possible for the caller to tell if every element was consumed or not without side effects happening from the iteration. If you actually rely on every element getting evaluated, then sum doesn't seem like the right function and you might be better off with an explicit fold where you can add explicit side effects to the summation.

As discussed in this thread, the side effects from iteration are observable.

fn main() {
    let xs: &[Option<u32>] = &[Some(1), Some(2), Some(3), None, Some(5), Some(6)];
    let mut i = xs.iter();
    let s: Option<u32> = (&mut i).map(|x| *x).sum();
    let extra: Vec<&Option<u32>> = i.collect();

    println!("{extra:?}");
}

The options for behavior are:

Do not short-circuit -- consume the entire iterator, even if saturation is reached. This enables batching of additions, but might perform more addition operations than necessary.
Exact short circuiting --sum() must stop iterating after saturation is reached. This provides predictable and deterministic behavior, but inhibits batching of additions.
- The existing behavior of sum() for Option<T> is exact short-circuiting.
Ambiguous short circuiting -- sum() may stop iterating after saturation is reached, but is not required to, and the number of items consumed is unspecified. This would be the most efficient implementation, at the cost of potentially non-deterministic behavior (varying between compilation modes, platforms, etc).

I'm comfortable with either (1) or (2). I would not want (3) to be the default behavior in the standard library's sum() because it would be extremely surprising for such a basic operation to have unspecified behavior.

JustusFluegel commented 3 months ago

fwiw I personally am most comfortable with (1) - I wouldn't expect a change from i64 to Saturating<i64> to change the behavior of my code, other than that all numeric operations saturate.

JustusFluegel commented 3 months ago

My intuition is that [i8::MAX, 2, -1].iter().sum() == Saturating(i8::MAX - 1), but if that isn't universal then I'd rather implement only unsigned saturation.

That would be my intuition as well - might be worth to make a forum post and get some more input on how others would intuitively think about this?

the8472 commented 3 months ago

(1) and (3) are the options that enable vectorization in general. (2) would only allow it in special cases where an implementation is certain that no side-effects exist and the iterator could be rewound.

(3) would result in the highest performance.

For option (1) an implementation could at least try to exhaust the iterator more cheaply by calling advance_by(usize::MAX) (repeatedly if necessary) when it has been saturated. Though it'll require some balancing to make that branch not too expensive in the case where it doesn't saturate. And of course it would only help for adapters where advance_by actually is faster.

boomshroom commented 3 months ago

Actually, I think saturation short circuiting is less surprising than Option short circuiting, because the former only affects the internal state of the iterator after the sum, which shouldn't be relied upon anyways, while the latter has a reasonable non-short circuiting implementation that would change the direct output. That implementation being to treat None values as the identity element (0 for sum, 1 for product). Either implementation however is trivial to implement manually, either precede the sum call with map_while(id) for short circuiting, or filter_map(id) for non short circuiting.

rust-lang / libs-team

[ACP] Impl `iter::Sum` (and `iter::Product`) for `num::Saturating<u*>` #303

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

Links and related work

What happens now?

Possible responses