mpusz / mp-units

The quantities and units library for C++
https://mpusz.github.io/mp-units/
MIT License
1.07k stars 85 forks source link

The behavior of raw data accessors #412

Closed mpusz closed 1 year ago

mpusz commented 1 year ago

As @JohelEGP noticed in https://github.com/mpusz/units/issues/411#issuecomment-1362809717, current accessors remove all encapsulation. Should we remove lvalue overload so the only remaining would be:

  [[nodiscard]] constexpr const rep& number() const& noexcept { return number_; }
  [[nodiscard]] constexpr rep&& number() && noexcept { return std::move(number_); }
  [[nodiscard]] constexpr const rep&& number() const&& noexcept { return std::move(number_); }
burnpanck commented 1 year ago

As mentioned elsewhere, I'm actually not a big fan for .number() anyway. Can we have a number_in<units>() accessor instead/additionally? That said, the lvalue accessor is definitely the worst of the pack, so you have my blessing to make it go (first) :-).

mpusz commented 1 year ago

First, please let me thank you for a lot of great feedback you provided in the last few days! That helps a lot and shines the light on interesting problems. Thanks!!!


As I just stated in #239, representation-type conversion may introduce data truncation. If number_in<units> for that specific unit was truncating the data, it probably should not be allowed without an explicit cast. As the quantity_cast is still needed here, I am not sure if we are winning much:

quantity_of<isq::distance[m]> auto q = get();
auto num1 = q.number_in<q.unit>();   // give me what you have
auto num2 = q.number_in<mm>();       // convert in a safe direction
// auto num3 = q.number_in<km>();    // compile-time error if not a floating-point representation
auto num3 = quantity_cast<km>(q).number_in<km>();  // now OK

Now the question is if we want such a feature and if it should replace the current number() or if it should be an additional overload set.

burnpanck commented 1 year ago

I like that overload set you provided, with exactly the semantics you describe. To me, q.number_in<mm>() is a gain over quantity_cast<mm>(q).number(), because nothing prevents you from moving the .number() call away from the cast.

The fact that the third example requires a "double cast" is really a plus to me: I can get assurance that q.number_in<km>() without a cast does not truncate, even if I do not know the exact unit that q was specified in. In generic code, a compile-time error may be what I want. I can then deal with the situation with a double cast, which in many cases may well be rather quantity_cast<float>(q).number_in<km>() than quantity_cast<km>(q).number_in<km>(). After all, if my generic code is supposed to simultaneously deal with quantities spanning a wide dynamic range, I should probably really use floating point arithmetic.

Given that q.number_in<q.unit>() is guaranteed to be equivalent to .number() I believe this could indeed replace the latter completely, but I don't care too much. (with #411, we currently anyway have public access to number_).

mpusz commented 1 year ago

(with https://github.com/mpusz/units/issues/411, we currently anyway have public access to number_

That is right, but it should be considered an implementation detail rather than an official interface. This is the same as for std::array. You have a public member, but you probably never used it, right? If the definition of a structural type will change to be less restrictive we will use a private member again.

I even consider renaming the member object to something like __private__implementation__number__ so no one dares to use it directly 😉 The only problem is that it would be undefined behavior, but who cares... 😜

burnpanck commented 1 year ago

I agree. Because of q.number_in<q.unit>(), I still tend to vote for the removal of .number(), even if number_ is inaccessible or an implementation detail. Though as I said, no strong feelings here.

mpusz commented 1 year ago

OK, so the last thing to note here. number() returns a (const) reference, whereas number_in<>() will always require returning by value which may involve a copy/move operation. Some people may be unhappy because of that. Should we care about that?

burnpanck commented 1 year ago

If the number type is heavy, then we may care. Numbers usually shouldn't be, but who knows, there could be a use-case. So you convinced me. Let's keep it.

mpusz commented 1 year ago

I just realized one more thing. number_in<>() is a member function template whose template parameter is not being deduced. This means that in generic code, for dependent names, we will have to always provide a template disambiguator(imaginestd::get<>()being a member function ofstd::tuple). Do we want such an interface? Or maybe we should provide a non-member function (likestd::get<>()`)?

burnpanck commented 1 year ago

Ah, good point. As we seem to move towards preferring values over types (see boost::hana), the "unit specification" (cast specification restricted to units) that we supply could actually be a value rather than a type (though I don't remember off the top of my head if that is the case in current V2 for units). Then, q.number_in(unit_spec) could be an alternative? That would be my preferred form, but I would be willing to accept a freestanding function or even typename everywhere too.

mpusz commented 1 year ago

the "unit specification" (cast specification restricted to units) that we supply could actually be a value rather than a type (though I don't remember off the top of my head if that is the case in current V2 for units).

Yes, it is. Users typically do not notice because of the fact that we name a type and an instance with the same identifier.

To make it explicit:

quantity_cast<quantity<isq::speed[km/h]>>(q);  // takes type
quantity_cast<km/h>>(q);  // takes value
quantity_cast<isq::speed>>(q);  // takes value
quantity_cast<isq::speed[km/h]>>(q);  // takes value
quantity_cast<int>(q);  // takes type

In general, units, quantity_spec, and references are always passed as NTTP objects.

mpusz commented 1 year ago

Then, q.number_in(unit_spec) could be an alternative?

Yes, I like this idea!

mpusz commented 1 year ago

To summarize:

  1. I remove non-const lvalue overload of number()
  2. I add number_in(Unit auto) member funciton
mpusz commented 1 year ago

Done! :-)

However, this might be a good place to discuss one more issue. Currently, quantity and quantity_point behaves like std::optional and std::variant. Namely, they return an rvalue reference for rvalue reference specified member accessor functions. Recently many people say that this is a bad design and may lead to dangling references (i.e. see the range-for safety discussion https://wg21.link/p2644).

Alternatively, we could return by values/copies for those, which, on the other hand, could be perceived as a broken overload set as sometimes you get a reference (for const lvalues) or an rvalue (for rvalues).

To make it clear. Here is what we have now:

https://github.com/mpusz/units/blob/da64f1789583f7b2a32ce04e12731f6c1e123758/src/core/include/mp_units/quantity.h#L142-L144

And here is a possible alternative:

  [[nodiscard]] constexpr const rep& number() const& noexcept { return number_; }
  [[nodiscard]] constexpr rep number() && noexcept { return std::move(number_); }
  [[nodiscard]] constexpr rep number() const&& noexcept { return std::move(number_); }

What is the right way to do here?

BenFrantzDale commented 1 year ago

I'm using this library in production, but I've avoided constructors and accessors entirely, wrapping everything in a pair of templated functions. Their impelmentations are icky, but the user-facing part is

millimeter_t foo = fromCount<millimeter_t>(42.0); // 42.0 mm
fTakingMillimeters(getCount<millimeter_t>(foo)); // Calls fTakingMillimeters(42.0).

Being strict about only using fromCount<Unit> and getCount<Unit> protects me against things like this: If we have

 using Dist_mm = millimeter_t;
 using Vol_mm3 = decltype(pow<3>(std::declval<Dist_mm>()));
 using Vol_mL = millileter_t;

then

-Vol_mm3 vol;
+Vol_mL vol;
 ...
 fTakingMM3(static_cast<double>(vol)); // The change to Vol_mL silently breaks this line.

So instead I have

-Vol_mm3 vol;
+Vol_mL vol;
 ...
 fTakingMM3(getCount<Vol_mm3>(vol)); //< Compile error: vol is not a Vol_mm3.
mpusz commented 1 year ago

@BenFrantzDale, thanks for sharing!

I do not believe that fTakingMM3(static_cast<double>(vol)); actually compiles as we do not provide a conversion operator to the representation type, but I see your point. I think that q.number_in(Unit) should be a good solution for you in this case?

BenFrantzDale commented 1 year ago

Yes. number_in<units>() sounds great.

I guess it's the other way around. I'd encourage you to consider dropping the explicit c'tor and replace it with a factory function. It's so easy to explicitly construct things (std::vector<millimeters_t> sizes; sizes.emplace_back(42.0);).

BenFrantzDale commented 1 year ago

Related: right now getting the count of a percentage gives the scalar value. So if you add that, be sure that (25_pct).number_in<pct_t>() == 25.0 not == 0.25.

Closely related (maybe I should make a separate ticket): I think ceil is a hole in the unit safety and should be removed for dimensioned types. After all, what's the ceiling of my height?

I could see ceil_in<units>(x) as shorthand for fromCount<units>(ceil(x.number_in<units>())).

mpusz commented 1 year ago

Yeah, I also heard this in the ISO C++ Committee from one person. On the other hand, the rest of the room would like to have a traditional way of doing things (via constructors).

BenFrantzDale commented 1 year ago

Yeah, I also heard this in the ISO C++ Committee from one person. On the other hand, the rest of the room would like to have a traditional way of doing things (via constructors).

I believe it, but so far, I've found that any holes in unit safety let bugs through. With a safe getter and factory function and literals, it's pretty darn solid.

mpusz commented 1 year ago

maybe I should make a separate ticket

Please do

mpusz commented 1 year ago

In the V2 framework you will be able to do something like that std::vector<quantity<si::milli<si::metre>>> sizes; sizes.emplace_back(42.0 * mm); which is terse and safe. If that will work, I think we can consider removing the constructor taking a raw number. But this should be a separate issue as well 😉

BenFrantzDale commented 1 year ago

https://github.com/mpusz/units/issues/432 https://github.com/mpusz/units/issues/433 https://github.com/mpusz/units/issues/434

mpusz commented 1 year ago

Related: right now getting the count of a percentage gives the scalar value. So if you add that, be sure that (25_pct).number_in() == 25.0 not == 0.25.

I am not sure what you mean by that. Could you please provide more info? The following works:

constexpr dimensionless<one> q{0.25};
static_assert(quantity_cast<percent>(q).number() == 25);

see https://godbolt.org/z/vqs4d6GxW.

chiphogg commented 1 year ago

@BenFrantzDale --- I think your getters/setters use the correct unit-safe pattern, but how do you prevent people from calling the constructors?

chiphogg commented 1 year ago

OK, so the last thing to note here. number() returns a (const) reference, whereas number_in<>() will always require returning by value which may involve a copy/move operation. Some people may be unhappy because of that. Should we care about that?

An alternative would be to give the "direct access" version of the function a different name. In Au, our value-semantic copying accessor (to get the value of a quantity q in units units) is spelled q.in(units). This will perform a conversion if units is not equivalent to the units of q, and if it's safe to do so. And the underlying-number accessor is spelled q.data_in(units). This will not compile if the units are inequivalent, but it gives a (const or mutable, as appropriate) handle to the underlying stored variable if they are equivalent. See aurora-opensource/au#28 for more details.

This means it's OK to do the following, for example:

auto x = inches(3);
++(x.data_in(inches));
EXPECT_EQ(x, inches(4));
chiphogg commented 1 year ago

Stepping back, and giving the bigger picture on this issue: I can share some real-world implementation experience.

When we were first writing Aurora's units library, folks were worried that it would be too burdensome to have to name the unit at the callsite. I said I'd consider adding a "just give me the underlying value!" function if I got a compelling use case. Since then, we have had years of experience with widespread use in production, and I have yet to hear a single complaint. The pattern works, and it makes code both safer and more readable.

So I think it'll be great to remove all instances of .number(), and replace them with .number_in(units). (You can add something like .stored_number_in(units) if you want a safe handle, similar to Au's .data_in(units).)


By the way: mp-units is the very first units library I have seen (other than Au) which takes these unit-safe interfaces seriously. This is thrilling! I am heartily encouraged by this change.

mpusz commented 1 year ago

I said I'd consider adding a "just give me the underlying value!" function if I got a compelling use case. Since then, we have had years of experience with widespread use in production, and I have yet to hear a single complaint.

Am I correct that Aurora works only with C++ arithmetic (fundamental) types? Those are cheap to copy. For me, the most important reason to leave a raw accessor is to be able to access the underlying storage by const reference. It might be important if someone uses some untrivial representation type.

chiphogg commented 1 year ago

Yep --- only arithmetic (fundamental) types for now. Expanding this is in-scope for us, however; we're tracking it in aurora-opensource/au#52.

(BTW, nice work in mp-units on explicitly specifying a concept for the representation type!)

mpusz commented 1 year ago

nice work in mp-units on explicitly specifying a concept for the representation type

Thanks, but still, we need to do a better job here. See: https://github.com/mpusz/units/blob/622b3e3cbd21506d53fba4d97204fac6ab108aef/src/core/include/mp_units/bits/quantity_concepts.h#L105-L108.

ka-ba commented 1 year ago

Coming back to accessing raw data (I just asked myself how to do it in code using the library). I'm not able to oversee the consequences, but the most natural thing from the users perspective would seem (for me), using operator/ for this: I use operator* to form the bond between scalar and unit, e.g:

constexpr auto tencm = 10 * cm;

Consequently, removing the unit would be the reverse operation:

static_assert( 10 == tencm / cm);

... which, when written on paper this way, would read aloud as "value of variable tencm in cm". Seeing it this way, I'd also intuitively expect:

static_assert( 100 == tencm / mm);

Interestingly enough, I can already do something quite similar:

static_assert( 10 == tencm / (1*cm));

, then again

static_assert( 100 == tencm / (1*mm));

will not work.

As already stated, the consequences this would have are completely in the mist for me, but I leave you with this food for thought: IMO code would be easily readable when it looks like this:

// altitude being a mp-units type using e.g. units::isq::si::fps::foot
double alt_in_feet = altitude / ft;
BenFrantzDale commented 1 year ago

Coming back to accessing raw data (I just asked myself how to do it in code using the library). I'm not able to oversee the consequences, but the most natural thing from the users perspective would seem (for me), using operator/ for this: I use operator* to form the bond between scalar and unit, e.g:

constexpr auto tencm = 10 * cm;

Consequently, removing the unit would be the reverse operation:

static_assert( 10 == tencm / cm);

... which, when written on paper this way, would read aloud as "value of variable tencm in cm". Seeing it this way, I'd also intuitively expect:

static_assert( 100 == tencm / mm);

Interestingly enough, I can already do something quite similar:

static_assert( 10 == tencm / (1*cm));

, then again

static_assert( 100 == tencm / (1*mm));

will not work.

As already stated, the consequences this would have are completely in the mist for me, but I leave you with this food for thought: IMO code would be easily readable when it looks like this:

// altitude being a mp-units type using e.g. units::isq::si::fps::foot
double alt_in_feet = altitude / ft;

On the one hand I agree, and I do see the elegance of 10.0_mm/s. On the other hand, when it’s not literals, I really like having “I have a double that is in mm” and “I have strong mm, please give me the double” be distinct from “Take x and multiply it by mm” and “Take y and divide it by mm.” I really like the error messages that distinction produce. I started using / 1.0s to unpack chrono durations and found it to be unintuitive. Especially if the thing I’m trying to unpack isn’t what I think it is, then I could get foo/s as the result type not an immediate error.

Food for thought.

chiphogg commented 1 year ago

Expanding on @BenFrantzDale's comment: what is (72 * cm) / m? Many people's intuition is that this should be 0.72, but I think that's wrong. It should be 72 * (cm / m), which is equivalent to percent(72). 0.72 has two contributions: the original value of 72, and a division by 100. The "divide by 100" information is carried in the type, and we should leave it there until somebody makes an explicit request to perform a conversion. (For one thing, we might perform other operations later, and keeping it in the type prevents us from losing information.)

What's really interesting is that the notion of "value" becomes ambiguous for every dimensionless unit, except the "unit one". There are two perfectly reasonable candidates for "the" value of such a quantity. A consequence which follows is that rather than try to guess which one the user meant, this kind of implicit conversion should be forbidden, as I explained here: https://github.com/nholthaus/units/issues/301#issuecomment-1381297235

The upshot is that while the "dividing method" really is creative, elegant, and intuitively appealing, I don't think it works out in practice as an API... whereas .number_in(units) doesn't suffer from this problem.

mpusz commented 1 year ago

Yes, this is exactly what I wanted to write here. The fact that the std::chrono::duration / std::chrono::duration returns std::common_type_t<Rep1,Rep2> is a huge problem with value precision. Consider dividing a floating-point duration in picoseconds by one in seconds. There will be a huge data truncation there. mp-units behaves differently. In such a case, it does not return a raw representation type but a dimensionless quantity with a proper unit assigned.

Also, to extend on the @chiphogg comment. mp-units provides implicit conversions for dimensionless quantities only if they are expressed in a unit one:

https://github.com/mpusz/units/blob/8e612865480394277cac56ef2ad9836e0c4d87bd/src/core/include/units/quantity.h#L143-L147