rust-lang / libs-team

The home of the library team
Apache License 2.0
123 stars 19 forks source link

Add `is_multiple_of` #404

Closed folkertdev closed 2 months ago

folkertdev commented 3 months ago

Proposal

Problem statement

Provide a convenient and panic-free method to check whether one numeric value is a multiple of another.

Motivating examples or use cases

Checking for "is x a multiple of y" is usually done with the expression x % y == 0. However, there are two subtle cases in which this expression does not work as intended:

These cases are unlikely to occur in practice, because determining whether a value is a multiple of 0 or -1 is kind of silly. Nonetheless, the potential panic because of division by zero or overflow is always there.

I'd also argue that x % y == 0 isn't intuitive. It's a pattern we learn as programmers but is actually kind of hard to explain to newcomers.

Though I've wanted this functionality before, it came up just now in a PR for miri. Miri's codebase is (rightfully!) averse to operations that can panic. However I know that because the right-hand side in my case is not 0 or -1, the panic is actually unreachable. Nonetheless, I have to pollute my code with unwraps.

Solution sketch

A version of this for all the {i, u}* types

fn is_multiple_of(lhs: i64, rhs: i64) -> bool {
    match rhs {
        // prevent division by zero
        0 => lhs == 0,
        // prevent overflow when lhs == i64::MIN
        -1 => true, 
        _ => lhs % rhs == 0,
    }
}

There might be some slight variations that produce better assembly. I think the name aligns well with next_multiple_of.

In some real code:

// before
if cpusetsize.checked_rem(chunk_size).unwrap() == 0 { // can this panic? who knows!
    ...
}

// after
if cpusetsize.is_multiple_of(chunk_size) { // no panics here
    ...
}

Alternatives

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

Second, if there's a concrete solution:

scottmcm commented 3 months ago

Given that we have https://doc.rust-lang.org/std/primitive.u32.html#method.next_multiple_of, having is_multiple_of certainly seems reasonable, like how we have next_power_of_two and is_power_of_two.

next_multiple_of panics for rhs:0, and I wonder what exactly should happen for that here. What's 0.is_multiple_of(0), for example? I might say it should panic anyway, because calling it with an rhs of zero seems like a bug.

folkertdev commented 3 months ago

To me one of the upsides of the function as currently proposed is that it just never panics. While the next multiple of zero does not make sense, 0 is absolutely a multiple of 0.

joshtriplett commented 2 months ago

I would love to see this.

@folkertdev Do you actually need this for signed types, or would your use cases be covered if this just supported unsigned types (like div_ceil and next_multiple_of)?

folkertdev commented 2 months ago

@joshtriplett I believe I've personally only ever wanted/used this function for unsigned types, but also I think the behavior of is_multiple_of for signed types only has one implementation that makes sense (as opposed to e.g. (-4).next_multiple_of(4) which could be 0 or -8)

So at the moment I see no reason not to just include signed types in one fell swoop, but if that is easier/faster then they could be split up of course.

joshtriplett commented 2 months ago

Mostly asking for bikeshed avoidance reasons: If we know that unsigned is the primary use case, when we go to discuss this if we have any disagreements over the signed versions we can focus on consensus on the unsigned versions.

scottmcm commented 2 months ago

as currently proposed is that it just never panics

That's not necessarily better, if it makes the post-condition less clear.

Can you show more about the situations in which it's used, and what people are doing based on the answer this gives?

folkertdev commented 2 months ago

To me, the postcondition is very clear: the function answers whether for its input lhs and rhs there exists an integer c such that lhs == c * rhs. The case where rhs == 0 follows naturally from the math.

I'll agree that it is weird, in that with the standard lhs % rhs == 0 pattern the case rhs == 0 is nonsensical, but in my opinion that is because using % for performing the is_multiple_of check is clunky.


So I guess the question is whether there are cases where the current lhs % rhs == 0 is used, where the panic on rhs == 0 prevents bugs (ignoring that panics are DOS vectors in the sort of code that I usually write). Below are some examples from the rust/rust codebase found with grepping for % ..... == 0.

Sometimes, a check for whether the rhs is zero is already required for correct logic. In those cases, the change in panicking behavior has no impact on the result of the program

    // If src type size is an integer multiple of the destination type size then
    // the caller will have calculated a `dst_cap` that is an integer multiple of
    // `src_cap` without remainder.
    if const {
        let src_sz = mem::size_of::<SRC>();
        let dest_sz = mem::size_of::<DEST>();
        // would become dest_sz != 0 && src_sz.is_multiple_of(dest_sz)
        dest_sz != 0 && src_sz % dest_sz == 0
    } {
        return false;
    }

Other times, the panic is actually a bug: here i believe the behavior is correct when using i.is_multiple_of(group_size). The hidden panic might be a bug here (or is just unreachable in practice).

    for (i, ch) in s.chars().filter(|&ch| ch != '_').rev().enumerate() {
        // would become: i > 0 && i.is_multiple_of(group_size)
        if i > 0 && i % group_size == 0 {
            chars.push('_');
        }
        chars.push(ch);
    }

In other cases I think the bug would be obvious even without the panic, or in any case just as obvious as any typo in the constant values.

                    // would become: self.index.is_multiple_of(CHUNK_BITS)
                    if self.index % CHUNK_BITS == 0 {
                        break;
                    }
    const CHUNK_SIZE: usize = 192;

    // Check the properties of `CHUNK_SIZE` and `UNROLL_INNER` that are required
    // for correctness.
    const _: () = assert!(CHUNK_SIZE < 256);
    // would become: assert!(CHUNK_SIZE.is_multiple_of(UNROLL_INNER));
    const _: () = assert!(CHUNK_SIZE % UNROLL_INNER == 0);

in summary, I think the change in behavior is unlikely to come up, and even more unlikely to cause issues in practice, and actually quite likely to eliminate hidden bugs in current code.

shepmaster commented 2 months ago

I actually tried to use if foo.is_multiple_of(bar) just yesterday because this idea wormed into my brain 😉

As a semi-silly, semi-serious idea, you could have a fn is_multiple_of_alt<const N: TheIntType>(self) -> bool and then be in the same problematic bucket as methods like slice::as_chunks.

bluebear94 commented 2 months ago

For more prior art, Raku has the %% operator, but it fails for 0 %% 0.

joshtriplett commented 2 months ago

Option 1: We approve this for unsigned types only, and dodge the question. Odds are, most people won't actually care about the missing methods.

Option 2: We implement this for signed types as well, with the semantic @folkertdev suggests.

These seem like the two most reasonable options for us to decide between. I'm fine with either of these options, with a mild preference for the former.

Amanieu commented 2 months ago

We discussed this in the @rust-lang/libs-api meeting and would like to add this only for unsigned integers for now. The signed variant can be deferred until there is a convincing use case.