How to name a number accessor in `quantity`?

mpusz commented 1 year ago

In the discussion in #476, it was suggested that the library should only contain unit-safe interfaces. That is an excellent recommendation. The only problem we have is with naming two alternative interfaces of quantity.

The first interface (possibly) converts the underlying number to a required unit:

static_assert(quantity<isq::length[km]>(2. * km).number_in(m) == 2000.);

This is why this interface always returns a value (not a reference to underlying data).

The second interface should return such a reference so users do not have to pay for a copy (if the representation type is expensive to copy) or can even change the underlying storage with external means (see https://github.com/mpusz/mp-units/issues/476#issuecomment-1660447474 and https://github.com/mpusz/mp-units/issues/476#issue-1830314533) for rationale. To make it unit safe it also should take the unit as an argument. As this function never changes underlying storage such an interface should be available only if the requested unit is equivalent (does not have to be exactly the same) to the current one.

static_assert(quantity<isq::pressure[Pa]>(20 * Pa).raw_number_in(N / m2) == 20);

raw_number_in is just a placeholder name above. Do you have a better idea of how to name such two interfaces so everyone understands what they do?

mpusz commented 1 year ago

.number_ref_in()?

chiphogg commented 1 year ago

The reason I picked the "data" word for Au's .data_in() was by inspiration from std::string::data and std::array::data. So, I think using the "data" word is idiomatic with respect to the rest of the standard library. That said, I'm certainly not married to it, and I'm open to alternative ideas. For example, I think the "ref" word in .number_ref_in() does a better job conveying the lvalue-ness than the "data" word does.

I don't have any new ideas myself, but I'll be interested to see what people come up with!

JohelEGP commented 1 year ago

Adding ref feels like it does the job.

As @mpusz mentioned at https://github.com/mpusz/mp-units/issues/476#issuecomment-1660455244, the C++ standard library has many "unsafe" observers.

The data for containers only returns part of the value, as you need size to be able to traverse the data. Except for std::string, which is null-terminated. That's why I'm against #423. A quantity value is already a value, so quantity::value returning the number/numerical quantity value feels wrong.

mpusz commented 1 year ago

So, I think using the "data" word is idiomatic with respect to the rest of the standard library.

data() returns a contiguous sequence of elements and it comes in pair with size(). This is why data() is not a good fit here.

mpusz commented 1 year ago

I just implemented quantity::value_ref_in(U). However, before I commit, I would like to share one potential issue that we might want to discuss.

It seems that often we see the following pattern value_cast<To>(q).value_ref_in(To). This is repetitive, and I am not the biggest fan of it. Please note that q.value_in(To) is allowed only for value-preserving conversions so this will not be a direct replacement of the above pattern.

JohelEGP commented 1 year ago

value_cast<To>(q).value_ref_in(To)

That wouldn't work if value_ref_in was &-qualified.

mpusz commented 1 year ago

Yes, but why make it in the way it does not work? Do you suggest that we should also worry about the lifetime issues here?

What initially bothered me was the mandatory repetition of the same unit. Are we OK with that?

JohelEGP commented 1 year ago

I think that expression is potentially bogus. You're getting a reference to a data member of a temporary.

mpusz commented 1 year ago

Yes, but this is what we have done for many years now in the C++ Standard Library. For example https://en.cppreference.com/w/cpp/utility/optional/value.

mpusz commented 1 year ago

Of course, considering your use case with passing the reference to an underlying API for updating the wrapped value, I see your point. In such case, &-qualification will make it illegal, but it will make obtaining a number of a quantity with a proper unit even harder because now someone will have to write:

const auto temp = value_cast<si::metre / si::second>(q)
std::court << temp.value_ref_in(si::metre / si::second);

That is a lot of typing to ensure that the unit is correct, and still, we have this repetition that I meant initially.

Previously, it was enough to type:

std::cout << value_cast<si::metre / si::second>(q).value();

JohelEGP commented 1 year ago

If you really need an non-const lvalue reference to the underlying number, your input type would already be in its correct type.

Are you sure you shouldn't be using value_in instead of value_ref_in? This issue is about naming value_in, not about the reference accessor (although it'd be appropriately renamed, too). Is that right?

mpusz commented 1 year ago

To make sure that everyone is on the same page, here are the important conversions we have in the library for similar cases:

q.in(Unit) -> quantity performs a value-preserving conversion of a quantity to a new unit
value_cast<Unit>(q) -> quantity forces a unit conversion (even truncating) to a new unit
q.value_in(Unit) -> Representation performs a value-preserving conversion of a value to a required unit, potentially involving runtime operations of rescaling the value (thus, a copy is being returned)
q.value_ref_in(Unit) -> Representation & -> has no runtime overhead, never changes an underlying value, returns a reference

mpusz commented 1 year ago

Are you sure you shouldn't be using value_in instead of value_ref_in?

I did one copy in the value_cast already and I do not need to pay for another one in value_in (or maybe I really need a reference - like in your API-related case).

mpusz commented 1 year ago

This issue is about naming value_in, not about the reference accessor (although it'd be appropriately renamed, too).

I actually think otherwise. This issue is how we should name the accessor (so the reference is implied) member function and not the one that potentially provides a scaled value that is not stored inside the current quantity.

JohelEGP commented 1 year ago

Oh, right, this issue is about value_ref_in. I got confused along the way.

mpusz commented 1 year ago

I pushed the changes to #484 so everyone can see how it changes the existing code. Let's review that and decide if this is what we want. Maybe there is a way to improve it even more?

chiphogg commented 1 year ago

To make sure that everyone is on the same page, here are the important conversions we have in the library for similar cases:

q.in(Unit) -> quantity performs a value-preserving conversion of a quantity to a new unit

value_cast<Unit>(q) -> quantity forces a unit conversion (even truncating) to a new unit

q.value_in(Unit) -> Representation performs a value-preserving conversion of a value to a required unit, potentially involving runtime operations of rescaling the value (thus, a copy is being returned)

q.value_ref_in(Unit) -> Representation & -> has no runtime overhead, never changes an underlying value, returns a reference

This list is really helpful. I think one thing missing from here is a way to "force" or "coerce" a meaningful but potentially lossy conversion, and get the value out, all in one step.

Using Au as a reference point, and assuming the new APIs coming in aurora-opensource/au#122, I thought it might be illustrative to make a few tables. This will give us a systematic accounting of the different operations. Note that I'm explicitly using Au's likely future APIs. In particular, here are the differences from what we actually have today:

q.coerce_as and q.coerce_in haven't been added yet (but are likely coming this week).
q.in<T>(u) and q.as<T>(u) are currently forcing/"coercing", but sometime in the medium-term future we may change that.

The reason I'm using future APIs should become clear when we see the tables. These semantics would be more logically consistent and easier to learn, because they'd form a composable grammar, where vocabulary words such as "as", "in", and "coerce" have consistent meaning.

Now for the tables.

Comparison tables

Returning a quantity:

Forcing?	Rep	Au	mp-units
No	Same	`q.as(u)`	`q.in(u)`
No	Explicit `T`	`q.as<T>(u)` (someday?)	??
Yes	Same	`q.coerce_as(u)`	`value_cast<u>(q)`
Yes	Explicit `T`	`q.coerce_as<T>(u)`	??

Note that in reality, q.as<T>(u) currently lives in row 4, not row 2. This will be true for a long time as it's a breaking change.

Returning a raw number:

Forcing?	Rep	Au	mp-units
No	Same	`q.in(u)`	`q.value_in(u)`
No	Explicit `T`	`q.in<T>(u)` (someday?)	??
Yes	Same	`q.coerce_in(u)`	`value_cast<u>(q).value_in(u)` or `value_cast<u>(q).value_ref_in(u)`
Yes	Explicit `T`	`q.coerce_in<T>(u)`	??

Note that in reality, q.in<T>(u) currently lives in row 4, not row 2. This will be true for a long time as it's a breaking change.

Returning a raw number reference:

Au	mp-units
`q.data_in(u)`	`q.value_ref_in(u)`

Takeaways / discussion

Maybe mp-units should consider explicit-rep versions. For example, q.in<T>(u) could "fill in" row 2 in the first table ("Returning a quantity"). The advantage (besides clarity) is that we can perform the unit conversion in the widest rep involved if we know the destination unit and rep at the same time. This could be a low-hanging fruit.
The root cause of the repetition is the lack of consistent vocabulary. In Au, we go from the non-forcing to forcing versions by adding the word "coerce". In mp-units, in the first table, going from non-forcing to forcing changes everything: free function instead of member function, and totally new words "value" and "cast" instead of "in". I think "cast" is the closest candidate to a vocabulary word for "forcing". But when we go down to the second table, I'm really at a loss as to how to apply it there!

I don't know what set of APIs I would propose for mp-units, or even whether I would say that we should definitely strive for APIs that use vocabulary words consistently. But I found that writing out these tables helped me understand why this was easy for Au but currently hard for mp-units.

mpusz commented 1 year ago

We can add q.in<T>(u) and q.numerical_value_in<T>(u) for safe conversions, and value_cast<U, T>(q) for non-safe ones if they turn up to be useful. I just didn't want to provide too many functions at the start without a good reason for them. We can always add things, but it is harder to remove them (even if we decide they are not useful, someone probably used them already somewhere).

mpusz commented 1 year ago

I agree that we lack the functionality to force a cast and get a value in one step. And I also do not have a good idea how to do it. Before the change in #484 it was straightforward value_cast<u>(q).numeric_value(), but if we decide to remove this function and replace it with the one that takes a unit, things go sideways, as I stated in the PR. Also, in many cases I refactored, we want to get the value in the current unit of a quantity without knowing what the actual unit is. Such an API makes it much harder, and we end up with something like: q.numerical_value_ref_in(q.unit). There is also this issue: https://github.com/mpusz/mp-units/pull/484#issuecomment-1694751404.

mpusz commented 1 year ago

Regarding Au's approach, I am afraid that it would not find an agreement in the ISO C++ Committee room. The C++ community is used to see xxx_cast() non-member function to represent an unsafe operation. I also find Au's q.in(u) too cute for something that should be used with caution and, in fact, at best avoided 😉

mpusz commented 1 year ago

Please remember that we also have a quantity_cast<QS>(q) to force an unsafe conversion between quantity types.

chiphogg commented 1 year ago

Also, in many cases I refactored, we want to get the value in the current unit of a quantity without knowing what the actual unit is. Such an API makes it much harder, and we end up with something like: q.numerical_value_ref_in(q.unit).

I find this very surprising: I think wanting the "value in the current unit" without caring what the unit is should be rare, and should be treated as code smell. There are legitimate use cases, but I think every such case should be examined carefully.

A good rule of thumb in my experience is that any line of code that gets the underlying value without caring about the unit should occur right next to code that handles the unit. (For example, when you print a quantity, you get the underlying value, but then you also print the unit symbol.)

I also wonder what fraction of these "don't care about the unit" use cases are obviated by zero.

Regarding Au's approach, I am afraid that it would not find an agreement in the ISO C++ Committee room. The C++ community is used to see xxx_cast() non-member function to represent an unsafe operation. I also find Au's q.in(u) too cute for something that should be used with caution and, in fact, at best avoided 😉

Yep, that's another reason that I'm thankful not to have the burden of striving for the C++ standard! (That is, not in Au --- I like helping out with mp-units which does have that burden. :slightly_smiling_face:) I think the composable vocabulary of in, as, and coerce gives us a concise and learnable interface.

It's true in some sense that (Au's) q.in(u) should be used with caution: we'd rather people stay within the quantity types, and we try to provide all the operations they could want to help them do so. However, what should be used with far more caution is something like q.numeric_value(), as mentioned above! The conciseness of in buys us some room to compensate for its un-conciseness: namely, that it forces users to name the unit at the callsite. I'm really happy with this tradeoff.

I do want to mention before I forget, though, that I think mp-units' names are more fundamentally correct. Expressing a quantity "in" some units still gives a quantity, not a raw number! Au's use of "in" bends the vocabulary --- it's essentially a short form of "value in" --- to compensate for the way the explicit unit lengthens callsites. It's a concession to keep users from rebelling. When contrasted with "as," which has the right kind of "casting" connotations in a programming context, I think it's clear enough which is which. But the "in" and "value in" of mp-units are more fundamentally correct in terms of quantity calculus.

mpusz commented 1 year ago

I do not like the word coerce, I think it is not intentional for many (especially not native speakers).

However, I do see benefits of using member functions over casts here. This is why I would like to entertain checking what we can do (at least theoretically), and then we can decide if we want to progress with this or not.

Returning a quantity:

Forcing?	Rep	Unit	Spec	Au	Current	Proposed
No	`T`	Same	Same	???	???	`q.in<T>()`
No	Same	`u`	Same	`q.as(u)`	`q.in(u)`	`q.in(u)`
No	Same	Same	`qs`	not-supported	`qs(q)`	`qs(q)`
No	`T`	`u`	Same	`q.as<T>(u)` (someday?)	???	`q.in<T>(u)`
No	`T`	Same	`qs`	not-supported	???	`qs(q.in<T>())`
No	Same	`u`	`qs`	not-supported	`qs(q.in(u))`	`qs(q.in(u))`
No	`T`	`u`	`qs`	not-supported	???	`qs(q.in<T>(u))`
Yes	`T`	Same	Same	???	`value_cast<T>(q)`	`q.force_in<T>()`
Yes	Same	`u`	Same	`q.coerce_as(u)`	`value_cast<u>(q)`	`q.force_in(u)`
Yes	Same	Same	`qs`	not-supported	`quantity_cast<qs>(q)`	`quantity_cast<qs>(q)`
Yes	`T`	`u`	Same	`q.coerce_as<T>(u)`	`value_cast<u>(value_cast<T>(q))`	`q.force_in<T>(u)`
Yes	`T`	Same	`qs`	not-supported	`quantity_cast<qs>(value_cast<T>(q))`	`quantity_cast<qs>(q.force_in<T>())`
Yes	Same	`u`	`qs`	not-supported	`quantity_cast<qs>(value_cast<u>(q))`	`quantity_cast<qs>(q.force_in(u))`
Yes	`T`	`u`	`qs`	not-supported	`quantity_cast<qs>(value_cast<u>(value_cast<T>(q)))`	`quantity_cast<qs>(q.force_in<T>(u))`

_Note 1: The above presents pessimistic (and longer) versions of some casts. For example, if we change T to a floating point value and want to change the unit as well, it is enough to type value_cast<T>(q).in(u) today._

Note 2: Continuing "Note 1", the above table does not cover all of the cases as we might have some mixed-forced versions (i.e. forced unit change but an unforced change of the type), but I think that based on the above, it is clear enough to understand how it would look like?

Returning a raw number:

Forcing?	Rep	Au	Current	Proposed
No	Same	`q.in(u)`	`q.numerical_value_in(u)`	`q.numerical_value_in(u)`
No	Explicit `T`	`q.in<T>(u)` (someday?)	`value_cast<T>(q).numerical_value_in(u)`	`q.numerical_value_in<T>(u)`
Yes	Same	`q.coerce_in(u)`	`value_cast<u>(q).numerical_value_ref_in(u)`	`q.force_numerical_value_in(u)`
Yes	Explicit `T`	`q.coerce_in<T>(u)`	`value_cast<u>(value_cast<T>(q)).numerical_value_ref_in(u)`	`q.force_numerical_value_in<T>(u)`

_Note 3: Current assumes numerical_value_ref_in(u) (#484) is merged already._

Returning a raw number reference:

Au	mp-units
`q.coerce_as(u).data_in(u)`	`q.force_in(u).numerical_value_ref_in(u)`

How do we like the "Proposed" interface? Should we also replace quantity_cast() with a member function?

chiphogg commented 1 year ago

I love the proposed APIs! The system is so expressive and so easy to learn. This systematic chart is a great advertisement for going this route.

in means "convert the quantity"
- numerical_value_in means "convert, but give me the raw number, not a quantity"
force means "ignore safety checks"
explicit <T> means "use this rep in the result"
Quantity spec is always applied by passing a quantity to it

Combined with the knowledge that the unit is optional for in, but not numeric_value_in, and that this lets you cast the rep... I think I could fill in the whole chart very easily. Nice!

Well, on looking more closely, I guess there's nowhere to put the "force" word when we want to force a quantity spec cast. So we have to use quantity_cast<qs>(...). This seems OK to me, though.

On force vs. coerce: I like force better in many ways, but in the context of a units library, I think it's too likely to be confusing with... well, forces. Maybe I'm wrong, and the context will make it clear enough that we can get away with it!

Also, in Au, we have rep_cast<T>(q), and it is ~~forcing~~ coercing :sweat_smile:. It had never occurred to me that I could make the unit optional for as, and replace rep_cast with a member function form. Nice! I think I would need to fix aurora-opensource/au#139 before trying this, though, so we couldn't even try this out before the next next release.

JohelEGP commented 1 year ago

FYI, for changing the arguments of quantity<name, unit, number>, what I'm using is reparameter for safe ones, and unsafe_reparameter otherwise. For example:

x.unsafe_rename(width()); // From length to width.
y.rename(length()); // From width to length.

I have them take values, except for (unsafe_)renumber. For origins, I also have reinterpret_origin to change the origin, since I have an use case for that. It's gone beyond unsafe and reached reinterpret.

I share the above interfaces with my geometry types (mostly Cartesian vector/points ATM). At the moment:

I'm still thinking about safe number access, so I still use x.number.
I don't have anything to change various arguments at the same time, like quantity_cast.

mpusz commented 1 year ago

Is q.unsafe_XXX() a better name than q.force_XXX()?

mpusz commented 1 year ago

I am not the biggest fan of the q.in<T>() interfaces, as those in a generic context will require the template disambiguator if q is a dependent name. This is why we had non-member value_cast() before.

JohelEGP commented 1 year ago

Is q.unsafe_XXX() a better name than q.force_XXX()?

In mp_units' case, I think you need to look how the C++ standard library would do it. In my case, I'm writing it in Cpp2, and it currently has cpp2::unsafe_narrow, which I'm basing it on.

I am not the biggest fan of the q.in<T>() interfaces, as those in a generic context will require the template disambiguator if q is a dependent name. This is why we had non-member value_cast() before.

You're right. I think that's why I wrote it as a non-member (as I'm using Cpp2, UFCS means it works anyways). But that's verbose as I have to repeat the parameters in the template parameter list and in the parameter type. But https://github.com/hsutter/cppfront/pull/533 should mean that Cpp2 shouldn't need the disambiguating template. So I can just rewrite it as a member anyways.

I'm going to do a write-up requested at https://github.com/hsutter/cppfront/discussions/620#discussioncomment-6858137 about this quantity<name, unit, number> of mine which I've recently shown, and the other parts of it, so I'll let you know about that (including why I do this even though there's mp_units).

mpusz commented 1 year ago

I just noticed that Au also has the same issue of q.coerce_as(u).data_in(u) that we have in #484 now.

mpusz commented 1 year ago

To realize how bad the template disambiguator is, here is a short example of when we want to convert the representation type to double before the division of arguments.

constexpr QuantityOf<isq::speed> auto avg_speed(QuantityOf<isq::length> auto d, QuantityOf<isq::time> auto t)
{
  return d.template in<double>() / t.template in<double>();
}

while currently, we would need something like this:

constexpr QuantityOf<isq::speed> auto avg_speed(QuantityOf<isq::length> auto d, QuantityOf<isq::time> auto t)
{
  return value_cast<double>(d) / value_cast<double>(t);
}

With this in mind, do we prefer this new interface, or should we stay with the current value_cast approach?

chiphogg commented 1 year ago

I am not the biggest fan of the q.in<T>() interfaces, as those in a generic context will require the template disambiguator if q is a dependent name. This is why we had non-member value_cast() before.

Yep --- this is definitely a downside. It was a huge improvement going from Aurora's old units library, with its q.in<U>() interface (as seen at CppCon 2021! :wink:), to Au and its q.in(u) approach, for exactly this reason: it got rid of most of the .template clutter.

But hsutter/cppfront#533 should mean that Cpp2 shouldn't need the disambiguating template. So I can just rewrite it as a member anyways.

I'd love to see .template disappear! I hope cpp2 succeeds.

I just noticed that Au also has the same issue of q.coerce_as(u).data_in(u) that we have in #484 now.

Why would someone write this, instead of q.coerce_in(u)?

To realize how bad the template disambiguator is, here is a short example of when we want to convert the representation type to double before the division of arguments.
constexpr QuantityOf<isq::speed> auto avg_speed(QuantityOf<isq::length> auto d, QuantityOf<isq::time> auto t)
{
  return d.template in<double>() / t.template in<double>();
}
while currently, we would need something like this:
constexpr QuantityOf<isq::speed> auto avg_speed(QuantityOf<isq::length> auto d, QuantityOf<isq::time> auto t)
{
  return value_cast<double>(d) / value_cast<double>(t);
}
With this in mind, do we prefer this new interface, or should we stay with the current value_cast approach?

I think this is a really tough question. Some worthy design goals:

q.in<T>() is nice because it fits in to the coherent grammar in this table, so it should be easier to learn... but it's vulnerable to becoming q.template in<T>() if q is a dependent name.
value_cast<T>(q) is more consistent with C++ standard idioms, and avoids the .template nuisance... but there's no clear way to fit it into the grammar.
It'd be nice if there were only one way to do it.

One of these three is going to have to go, though.

Maybe one factor could be figuring out how often q turns out to be a dependent name, in practice, in the situations where we're motivated to do a rep cast. If it's pretty common, then that makes value_cast more attractive, and if it's pretty rare then the member function becomes more attractive.

Hmm... how bad would it be to have both?

mpusz commented 1 year ago

Assuming that we will add:

.in<T>()
.in<T>(u)
.force_in(u)
.force_in<T>()
.force_in<T>(u)

and we will leave value_cast<T>(q) for the .template case, do we still need or want to preserve value_cast<Unit>(q)? Let's remember that according to the ISO and BIPM the value of a quantity is defined as:

number and reference together expressing magnitude of a quantity

mpusz commented 1 year ago

Another question is about the force_ prefix. Pick your favorite:

force_in(u)
coerce_in(u)
cast_in(u)
unsafe_in(u)
truncating_in(u)

Any other ideas?

JohelEGP commented 1 year ago

There's little chance of getting it wrong, whatever we choose, if we're not basing it on existing L(E)WG guidelines. WG21 will later do their own bike-shedding, and get it "right".

mpusz commented 1 year ago

Sure, but we still may vote on the thing that we like the most 😉

nebkat commented 1 year ago

A "safe truncating cast" would be very convenient for integer reps where there is a risk of the multiplication overflowing before the division can occur. This could be implemented as a (rep) ((double) v * num / den) or (rep) ((double) num / den * v) while "unsafe" would be without the cast to double first. See also https://stackoverflow.com/a/36892495/336983.

Currently this requires a double value_cast.

JohelEGP commented 1 year ago

You can probably do that with a representation type with that behavior that is convertible to the integer rep.

nebkat commented 1 year ago

Actually this is only a problem for uint64_t https://godbolt.org/z/GqY7cfsGW

mpusz commented 1 year ago

This could be implemented as a (rep) ((double) v num / den) or (rep) ((double) num / den v) while "unsafe" would be without the cast to double first.

I do not think it is a good idea to do this in a generic code for several reasons: 1 . There are users (especially in the embedded domain) for whom double is a no-go.

Some custom representation types are not castable to/from double. For example: linear algebra vectors, uncertainty, etc.

JohelEGP commented 1 year ago

If your use case allows it, you can trap overflows with https://www.boost.org/doc/libs/release/libs/safe_numerics/doc/html/index.html.

chiphogg commented 1 year ago

Thanks @mpusz for pointing out the importance of staying within the integer domain in embedded use cases.

For every unit conversion of this type, there's some smallest value which would overflow in the multiplication step (i.e., an exclusive upper bound on the "friendly range" described here). We can compute this number at compile time. A safe truncating cast could reject values outside of the threshold.

If users wanted a version that would accept anything in the "full range" --- i.e., values whose final result wouldn't overflow --- then we could perform the computations in the widest version of a given type: for signed or unsigned integers, this would be intmax_t or uintmax_t, respectively. We should also be able to compute the threshold for this operation at compile time.

I don't know if we want both versions, but those are some ideas!

nebkat commented 1 year ago

I encountered this while converting a quantity<mag_power<2, -31> * si::milli<si::second> * si::si2019::speed_of_light_in_vacuum, int64_t> (of which only 39 bits are actually used) to si::metre - an effective mag<ratio {139'601'742, 1'000'000'000'000}>.

The overflow makes sense once we know that 39 + log2(139601742) = ~67, but it is still surprising considering the conversion actually decreases the value - and the fact that it only happens for std::(u)intmax_t.

In traditional code this would be more obvious as the numerator would be used directly (or wouldn't be a problem at all since the ratio could be defined as a floating point to begin with). Since the library calculates the ratio on the fly, it is easy to miss that the numerator has gone beyond the acceptable range if you are not paying attention.

Luckily my quantity values were always close to 2^39 so I could see something was clearly wrong, but it could very easily go unnoticed if most values don't overflow. I (wrongly) expected the only risk of value_cast to be a loss in precision, but it's not difficult to fall into this trap when smaller data types don't have the same problem.

Thus at a minimum I think something similar to what @chiphogg suggested should be included, distinguishing between precision loss only (truncate_cast) and faster but potentially unexpected behavior (unsafe_cast).

1 . There are users (especially in the embedded domain) for whom double is a no-go.

I somewhat fall under this category in that my application is embedded - and while double is available to me, I would like to minimize the amount of floating point operations that happen in further calculations after the conversion.

It is hard to say how common this situation is, but I suspect there may be other applications where integers are preferable either for performance or storage reasons (the weird original quantity comes from a binary message format). So far this is the only friction I have encountered using the library, as I must do the intermediate cast and/or pay special attention to integer conversions.

Could this behavior be enabled/disabled with a compiler flag, if an embedded application strictly doesn't want to use it? Perhaps with a choice to use (!in_safe_range(i) ? ((i / B) * A + ((i % B) * A) / B) : (i * A / B))?

You can probably do that with a representation type with that behavior that is convertible to the integer rep.

Some custom representation types are not castable to/from double. For example: linear algebra vectors, uncertainty, etc.

Wouldn't these types also fail the treat_as_floating_point tests anyway (even if they are floating point based)? The concept of truncation is related to integers so it seems reasonable to limit such a safe conversion function to std::integral types. They will also presumably be used much more commonly than custom types.

JohelEGP commented 1 year ago

I'm going to do a write-up requested at https://github.com/hsutter/cppfront/discussions/620#discussioncomment-6858137 about this quantity<name, unit, number> of mine which I've recently shown, and the other parts of it, so I'll let you know about that (including why I do this even though there's mp_units).

I did it, but didn't include the "why", since that'd be off-topic: https://github.com/hsutter/cppfront/discussions/658#discussioncomment-6899974. I use the library to experiment with improvements. Doing the same using mp_units would be too expensive for each experiment.

JohelEGP commented 1 year ago

Perhaps with a choice to use (!in_safe_range(i) ? ((i / B) * A + ((i % B) * A) / B) : (i * A / B))?

That's just exchanging overflow for underflow.

nebkat commented 1 year ago

I should also add the main reason I started using this library is because the protocol in question only uses integer values but with very strange quantities. Most other applications give up immediately and convert everything to double as soon as the values are parsed, but it has been fantastic being able to preserve the original values until the later processing steps.

That's just exchanging overflow for underflow.

Only if the ratio is greater than 1 right?

mpusz commented 1 year ago

I encountered this while converting a quantity<mag_power<2, -31> si::milli si::si2019::speed_of_light_in_vacuum, int64_t> (of which only 39 bits are actually used) to si::metre - an effective mag<ratio {139'601'742, 1'000'000'000'000}>.

Please note that you can convert from a quantity of in64_t to a quantity in double, do your conversion, and then convert back to int64_t if you need better precision.

Wouldn't these types also fail the treat_as_floating_point tests anyway (even if they are floating point based)?

No, we are providing specialization for them:

https://github.com/mpusz/mp-units/blob/507d5bc446dc0921e6b77a96f33c9f9d895c7182/example/measurement.cpp#L120-L121

https://github.com/mpusz/mp-units/blob/507d5bc446dc0921e6b77a96f33c9f9d895c7182/test/unit_test/runtime/linear_algebra_test.cpp#L37-L38

nebkat commented 1 year ago

Please note that you can convert from a quantity of in64_t to a quantity in double, do your conversion, and then convert back to int64_t if you need better precision.

This is how I am working around it currently, but that is a serious hindrance if your application makes use of int64_t quantities. Every single conversion (except with numerator=1 or input range guarantee) must be done with .in<double>(unit).in<int64_t>() to be safe.

chiphogg commented 1 year ago

You could write a non-intrusive free function template. Something like this (untested):

template<class NewUnits, class U, class R>
constexpr auto convert_via_double(const mp_units::quantity<U, R> &q, NewUnits new_units) {
    return q.in<double>(new_units).in<R>();
}

(Please pardon my old school, pre-Concepts coding habits.)

Of course, it's only a stopgap solution, but it shows the intent better at the callsite, and it'll be easily greppable in case we come up with a better mp-units-native idiom for this use case.

JohelEGP commented 1 year ago

If users wanted a version that would accept anything in the "full range" --- i.e., values whose final result wouldn't overflow --- then we could perform the computations in the widest version of a given type: for signed or unsigned integers, this would be intmax_t or uintmax_t, respectively. We should also be able to compute the threshold for this operation at compile time.

At one point mp_units did something like that, just like std::chrono does with https://eel.is/c++draft/time.duration.cast#2. If that's still supported, perhaps we can add it as an extra type argument to value_cast. Then users can choose double instead of std::intmax_t, and perhaps even some bignum type.

mpusz commented 1 year ago

Right now mp-units is using a common type of the:

source Rep
destination Rep
magnitude

For scaling quantity values to a new unit. If source, destination, and ratio are expressed in integral types then indeed a truncation may happen.

https://github.com/mpusz/mp-units/blob/507d5bc446dc0921e6b77a96f33c9f9d895c7182/src/core/include/mp-units/bits/sudo_cast.h#L76-L83

mpusz / mp-units