mpusz / mp-units

The quantities and units library for C++
https://mpusz.github.io/mp-units/
MIT License
1.07k stars 85 forks source link

Modulo operations #448

Closed mpusz closed 1 year ago

mpusz commented 1 year ago

What is your gut feeling about modulo operation for quantities? Which of the following should be valid, and what should be the result (value and unit):

  1. 7m % 2m
  2. 7km % 2000m
  3. 1300m % 1km
  4. 7m % 2
  5. 7km % 2000
  6. 1300m % 1*10^3
  7. 1km % 1%

Which of them would you provide in a physical units library?

FYI, std::chrono::duration supports all of them.

JohelEGP commented 1 year ago

The result type of 1-3 should be the same as operator/. Then, it can be read as

As for the rest, considering #412, I'm not sure if it makes sense to support all of them. But if they are to be supported, they should behave like the built-in operator (https://eel.is/c++draft/expr.mul#4). Let x be a quantity and y a dimensionless quantity. If x / y works, (x / y) * y + x % y should equal x (in value, not in type).

i-ky commented 1 year ago

If we are talking about x = y * n + z, where n is an integer (and has therefore dimension "one"), then [x] = [y] = [z]. So [x / y] must also have dimension "one" and [x % y] must have dimension [x]. Anything beyond that (4-7) will not make sense from dimensional analysis standpoint. 1-3 are OK, but probably won't worth the effort if you are not planning to provide % 2pi rad for angles. For me it would also make sense to consistently treat all these cases with fmod() and remainder().

JohelEGP commented 1 year ago

You're right. Now I remember that std::chrono's operator%s return a duration [1] [2].

mpusz commented 1 year ago

Thanks, taking into account https://eel.is/c++draft/expr.mul#4, I think that we need to support both versions (like std::chrono::duration does) as both of them are correct in value and type for this equation:

  1. 7m / 3m * 3m + 7m % 3m == 7m
  2. 7m / 3 * 3 + 7m % 3 == 7m The result of the modulo operation in both cases should be a quantity of the same dimension as the dividend (not necessarily of the same unit).
mpusz commented 1 year ago

What about the unit of the dimensionless divisor. Should it be constrained to have a unit of:

  1. one
  2. any ratio (i.e. percent, or 3/5)
  3. any integral ratio (i.e. 10^3)
mpusz commented 1 year ago

I just realized that the below provided by @JohelEGP are wrong:

2km % 1999m == 1m, "2 km modulo 1999 m is 1 m", and 1001m % 1km == 1m, "1001 m modulo 1 km is 1 m".

Again to satisfy https://eel.is/c++draft/expr.mul#4:

Does it have any sense? 😕

JohelEGP commented 1 year ago

Aren't they converted to the common unit before the operation? Just like operator/, and all others. 2km % 1999m performs 2000m % 1999m that equals 1m. 1001m % 1km performs 1001m % 1000m that equals 1m.

mpusz commented 1 year ago

Multiplication and division never converted to a common unit. The ratio (sometimes really huge) is stored in a type. Also, what is a common unit for 40 * km / (1 * h)? Common unit is used only in addition and subtraction operators.

JohelEGP commented 1 year ago

I shouldn't have referred to operator/, besides / for the equivalence explained at https://eel.is/c++draft/expr.mul#4. operator% behaves much like operator+, actually.

mpusz commented 1 year ago

Exactly, but the equation https://eel.is/c++draft/expr.mul#4 should be preserved, right? Any ideas on how to make it correct and avoid surprises?

JohelEGP commented 1 year ago

Where do we stand with the original examples?

mpusz commented 1 year ago

We can implement anything 😉

Modulo arithmetic, as you described in https://github.com/mpusz/units/issues/448#issuecomment-1496268966, makes sense but is inconsistent with the division, which means https://eel.is/c++draft/expr.mul#4 will not hold.

https://github.com/mpusz/units/issues/448#issuecomment-1497107194 is consistent with division and will make https://eel.is/c++draft/expr.mul#4 happy, but such results may be really confusing to the users.

mpusz commented 1 year ago

This is what we have on the mainline compared to chrono: https://godbolt.org/z/7q9o8dWr4.

mpusz commented 1 year ago

I also tried on Python's Pint:

>>> from pint import UnitRegistry
>>> ureg = UnitRegistry()
>>> print(4 * ureg.h / (100 * ureg.min))
0.04 hour / minute

They are doing the same as we do (but decay to floating-point).

mpusz commented 1 year ago

We could consider using a common unit if we are dealing with the division of the same quantities and integer types, but it would :

mpusz commented 1 year ago

Any ideas on how to proceed?

JohelEGP commented 1 year ago

Right. mp_units' operator/ is more general.

std::chrono only ever deals with dimensionless quantities of unit one.

https://godbolt.org/z/66Gvcs954

Any ideas on how to proceed?

For starters, there seems to be something wrong with operator%. Its unit is always that of the lhs, rather than a common unit. https://github.com/mpusz/units/blob/b40c7b2fcd8ebb45204ce610ba7b96fb83b80e2d/src/core/include/units/quantity.h#L495-L505

mpusz commented 1 year ago

For starters, there seems to be something wrong with operator%. Its unit is always that of the lhs, rather than a common unit.

Yes, I noticed that and started to investigate. And the more I thought about that, the more I was confused about what is the expected behavior. Let's agree on some reasonable behavior and I will implement it.

JohelEGP commented 1 year ago

It seems you applied https://github.com/mpusz/units/issues/231#issuecomment-838077002, which goes against:

If x / y works, (x / y) * y + x % y should equal x (in value, not in type). -- https://github.com/mpusz/units/issues/448#issuecomment-1496268966

Although operator% isn't the one in position to know that. When you do z + x % y, the lhs of +, z, may not be (x / y) * y, so % shouldn't dictate the return type expecting that to happen.

mpusz commented 1 year ago

Right, but it would be good to have a consistent behavior between division and its reminder for integral types.

JohelEGP commented 1 year ago

Right now, it's wrong: https://godbolt.org/z/66Gvcs954.

5h % 120min = 5 h
300min % 2h = 0 min
300min % 120min = 60 min

They should all output 60 min.

mpusz commented 1 year ago

So, should we implement modulo in terms of the common unit and ignore the fact that the division will work otherwise?

chiphogg commented 1 year ago

Really interesting question!

First, I can say what Au does. We support % for integral Rep inputs only. We convert both inputs to their common unit, and this is the unit we use for the result. Note that this implies we only support % for inputs of the same dimension. I wrote up a couple quick tests to confirm this --- one with dimensionless denominator, and one with some other unit --- and verified that yep, we don't allow this.

I had never actually considered a case such as 7m % 3 before. I can see how in some ways it makes sense, mathematically. However, on reflection, I think it takes us into territory similar to #432. If it's nonsense to ask about "the ceiling of my height", then it's probably also nonsense to ask about "the remainder of this length divided by 3" (or any other raw integer). We want to encourage users to think of 7m as, in many ways, a notional abstract quantity, but the result of length % 3 depends on the units: you get 1m if it's expressed in meters, and 1cm if it's expressed in cm, which is a very different result.

All of which is to say: I now think that I accidentally made the right choice in forbidding this operation. :grin:


I think the main worry with this approach is that it's inconsistent with 7.6.5. However, I think this is fine. IMO, that part of the standard only applies to raw numeric types. Quantities are different in a variety of ways, and break all kinds of otherwise reasonable assumptions (e.g., T * T -> T). I think we can just add this to the list.


Another worry is that the defining relationship of division won't hold exactly. That relationship is:

(n / d) * d + (n % d) == n

It will produce the correct quantity --- so, the == here will return true --- but it will often be expressed in different units.

I think it would be nice if it reproduced the units as well, but I don't see a pathway to get there with simple rules, and complicated rules would really worry me. So I think this is actually fine as well.


I guess I shouldn't neglect the original quiz. :slightly_smiling_face:

  1. 7m % 2m -> 1m
  2. 7km % 2000m -> 1000m
  3. 1300m % 1km -> 300m
  4. 7m % 2 FORBID
  5. 7km % 2000 FORBID
  6. 1300m % 1*10^3 FORBID
  7. 1km % 1% FORBID
chiphogg commented 1 year ago

Right now, it's wrong: https://godbolt.org/z/66Gvcs954.

5h % 120min = 5 h
300min % 2h = 0 min
300min % 120min = 60 min

They should all output 60 min.

I just cooked up this test (in Au). It passes! :grin:

TEST(Modulo, ReturnsResultInCommonUnits) {
    EXPECT_THAT(hours(5) % minutes(120), SameTypeAndValue(minutes(60)));
    EXPECT_THAT(minutes(300) % hours(2), SameTypeAndValue(minutes(60)));
    EXPECT_THAT(minutes(300) % minutes(120), SameTypeAndValue(minutes(60)));
}
JohelEGP commented 1 year ago

I think the main worry with this approach is that it's inconsistent with 7.6.5. However, I think this is fine. IMO, that part of the standard only applies to raw numeric types. Quantities are different in a variety of ways, and break all kinds of otherwise reasonable assumptions (e.g., T * T -> T). I think we can just add this to the list.

Another worry is that the defining relationship of division won't hold exactly. That relationship is:

(n / d) * d + (n % d) == n

It will produce the correct quantity --- so, the == here will return true --- but it will often be expressed in different units.

I agree with this. It should be sufficient that it holds when n and d are quantities of the same dimension, regardless of the units of operands or results.

mpusz commented 1 year ago

Another worry is that the defining relationship of division won't hold exactly. That relationship is:

(n / d) * d + (n % d) == n

It will produce the correct quantity --- so, the == here will return true --- but it will often be expressed in different units.

I am not sure how it should work if the below are true?

5h / 120min = 0 h/min
5h % 120min = 60 min
JohelEGP commented 1 year ago

Seems it's because it returns a quantity of number 5 / 120, with the references of the input quantities divided, which divides their magnitudes. That is different from the old behavior of bringing them to their common unit before dividing.

mpusz commented 1 year ago

The library never brought two units into a common unit before dividing. In general, for quantities of different dimensions it is impossible to do so. The same is done in Pint and JSR 385 (Java library). See the list of drawbacks here: https://github.com/mpusz/units/issues/448#issuecomment-1497583180.

mpusz commented 1 year ago

Also, see the https://github.com/mpusz/units/issues/231 discusion.

JohelEGP commented 1 year ago

I think that's what it effectively did in 0.6.0: https://godbolt.org/z/c5hjo1hf8.

5h % 120min = 60 min
300min % 2h = 60 min
300min % 120min = 60 min

https://github.com/mpusz/units/blob/3f1eb80aaf090a95e84b67648521b8402c0db53f/src/include/units/quantity.h#L281-L292

mpusz commented 1 year ago

Yes, in 0.6.0 the modulo worked as we agreed above, but the division was always truncating the value and never used a common unit before division.

JohelEGP commented 1 year ago

Ah, you're

Right. mp_units' operator/ is more general.

operator% could be limited to quantities of the same unit.

chiphogg commented 1 year ago

For what it's worth, I don't think we need to restrict to the same unit. If we have auto angle = degrees(5678);, I think it's nice to be able to write angle % revolutions(1).

chiphogg commented 1 year ago

Another worry is that the defining relationship of division won't hold exactly. That relationship is:

(n / d) * d + (n % d) == n

It will produce the correct quantity --- so, the == here will return true --- but it will often be expressed in different units.

I am not sure how it should work if the below are true?

5h / 120min = 0 h/min
5h % 120min = 60 min

I think you're right: I goofed here. If the numerator has a larger unit than the denominator --- and thus, a smaller value --- then (n / d) can truncate a nonzero-in-principle result to zero, and * d can't recover the lost information.

That's a really nicely chosen example.

mpusz commented 1 year ago

Yes, this is why the modulo operator was changed in 0.7.0 to complement such a "broken" division according to https://github.com/mpusz/units/issues/231#issuecomment-838077002.

However, as we agreed, the current solution is surprising to the users, to say the least. I could provide the rule like "if both operands are integral and of the same dimension then a division of them should use a common unit" but I am not sure if that is a good idea?

chiphogg commented 1 year ago

However, as we agreed, the current solution is surprising to the users, to say the least. I could provide the rule like "if both operands are integral and of the same dimension then a division of them should use a common unit" but I am not sure if that is a good idea?

I'd be very skeptical of a special-cased rule like that. For multiplication and division generally, users can reason independently about the unit, and the result. So, for 5h / 120min, the value should be 5 / 120, and the unit should be h / min.

I currently think the crux of the issue is this. It is crystal clear how we should handle division (in isolation). And it's crystal clear how we should handle modulus (in isolation). The problem is that when we combine those operations, they don't cooperate the way we expect.

It's true that https://github.com/mpusz/units/issues/231#issuecomment-838077002 draws a compelling line in the sand: with integral rep, the quotient-remainder theorem must hold. The best idea I currently have for making that possible is to provide a separate function for quotient/remainder decomposition. This gives you an API boundary which can be a "trigger point" for that common unit conversion. You might call it like this:

const auto [q, r] = quotient_and_remainder(5h, 120min);

If we did this, I would expect q to be 2, and r to be 60min. Now, granted, q * d + r won't be the same type as n. But if we're going to let users mix units --- a situation that couldn't even occur in the use cases typically contemplated by the Q-R theorem --- then I think that's a reasonable property to sacrifice.

And by using a separate function, we wouldn't interfere with the standard use cases for division or modulus, when either of them is used independently of the other.

i-ky commented 1 year ago

You might call it like this:

const auto [q, r] = quotient_and_remainder(5h, 120min);

I've already mentioned standard function names (fmod() and remainder()) in https://github.com/mpusz/units/issues/448#issuecomment-1496358990 that implement the same thing for float and double. Let's stick to them.

mpusz commented 1 year ago

I've already mentioned standard function names (fmod() and remainder()) in https://github.com/mpusz/units/issues/448#issuecomment-1496358990 that implement the same thing for float and double. Let's stick to them.

Should we use fmod() for integrals, or maybe an overload of mod() for both representation types should be provided?

JohelEGP commented 1 year ago

And it's crystal clear how we should handle modulus (in isolation).

How would you define it? I created a concept for it, and defined it like the C++ standard, in terms of /. It works for integral types and std::chrono::duration. But it seems like it never worked for mp_units.

1680722716 1680723270 1680723301

[P1673]'s 11.6 describes problems with number concepts. I suspect most problems go away if the concepts separate the set an operation is conceptually carried on vs. the mapping of the operation's value to the C++ type. Similar to how 1 / 4 is carried out in the set of real numbers, but 0.25 has its fraction part discarded (https://eel.is/c++draft/expr.mul#4.sentence-3) before being mapped to int as 0.

chiphogg commented 1 year ago

And it's crystal clear how we should handle modulus (in isolation).

How would you define it? I created a concept for it, and defined it like the C++ standard, in terms of /. It works for integral types and std::chrono::duration. But it seems like it never worked for mp_units.

Yeah... sorry I glossed over this. But thanks for prompting me to expand!

What I meant here is that modulus is a sensible "pure-quantity" operation: it doesn't depend on the units.

Let's momentarily restrict to positive inputs, just for simplicity. Then we could define a % b to mean "keep subtracting b from a until the latter is smaller than b, and take this final value of a". The resulting quantity is perfectly well defined, no matter what units we choose to express it.

This "subtraction-based" viewpoint also explains why the result is expressed in the common unit of the inputs. (Of course, we wouldn't actually implement it by repeated subtraction, but I think it's a useful way to define it.)

chiphogg commented 1 year ago

You might call it like this:

const auto [q, r] = quotient_and_remainder(5h, 120min);

I've already mentioned standard function names (fmod() and remainder()) in #448 (comment) that implement the same thing for float and double. Let's stick to them.

I like your dimensional analysis in that comment, and I think your conclusions about which cases are valid (and which ones aren't) are spot-on.

As to the naming, though, I don't think fmod and remainder are relevant here, because those are different functions than the one I'm proposing. However, I did a little digging, and I did find one that was much more like this one: std::div(). However, I think this has a design flaw. It was written before structured bindings, so even though the following line would work, you don't actually know which is the quotient and which is the remainder!

const auto [q, r] = div(5h, 120min);

I think that argues in favor of a name like quotient_and_remainder, with a correspondingly structured return type, because it can be robustly used with structured bindings.

chiphogg commented 1 year ago

Should we use fmod() for integrals, or maybe an overload of mod() for both representation types should be provided?

My understanding of the situation on the ground is that fmod is always used for floating point types (never integrals), and % is always used for integral types (never floating point). If we use these names, we should keep it that way.

I could, however, see a case for adding a function with a new name that could work with both floating point types and integral types. It strikes me as unobjectionable, although I wouldn't bother adding it unless people ask for it.