sharkdp / numbat

A statically typed programming language for scientific computations with first class support for physical dimensions and units
https://numbat.dev
Apache License 2.0
995 stars 40 forks source link

Scalars on units on right-hand of -> are silently ignored #305

Open Quelklef opened 5 months ago

Quelklef commented 5 months ago

Hi! If I write a scalar multiple on the right-hand side of a unit conversion, such as in 1 MB -> (B * 100), the 100 seems to be silently ignored. See below:

>>> 1 MB -> B

  1 megabyte ➞ byte

      = 1_000_000 B    [DigitalInformation]
>>> 1 MB -> (B * 37)

  1 megabyte ➞ byte × 37

      = 1_000_000 B    [DigitalInformation]
>>> 1 MB -> B * 37

  1 megabyte ➞ byte × 37

      = 1_000_000 B    [DigitalInformation]

I would expect 1 MB -> B * 37 to be evaluated as (1 MB -> B) * 37

Apologies if this has already been reported; I wasn't able to find an existing issue

Quelklef commented 5 months ago

Observed on numbat.dev as of 2024-01-30, FWIW

Quelklef commented 5 months ago

Also, thanks for the lovely tool :-). It eases the pain on many day-to-day computations for me!

eminence commented 5 months ago

Hi, can you expand a bit on what your expectations are?

Is 1 MB -> B * 37 the same thing as 1 MB * 37 -> B ?

Quelklef commented 5 months ago

Hi! I would expect that one of three things would happen:

  1. 1 MB -> B * 37 would be equivalent to (1 MB -> B) * 37, which current seems to evaluate as (1 MB * 37) -> B
  2. 1 MB -> B * 37 would be parsed as 1 MB -> (B * 37), where B * 37 is the unit "37 bytes", so the output would be something like27027.027 (37 B) [DigitalInformation]
  3. 1 MB -> B * 37 would be an error of some kind
eminence commented 5 months ago

For me personally, I would expect 1 MB -> B * 37 to be an error. It's valid syntax, but semantically I don't think it makes much sense.

Option 2 is interesting, as it's a way to define anonymous adhoc units. Today you'd have to do something like:

>>> unit ThirtySevenBytes = 37 B

  unit ThirtySevenBytes: DigitalInformation = 37 byte

>>> 1 MB -> ThirtySevenBytes

  1 megabyte ➞ ThirtySevenBytes

    = 27027.0 ThirtySevenBytes    [DigitalInformation]

Edit: If we wanted to allow option 2, then maybe a more explicit syntax would be better: 1 MB -> unit 37 B

sharkdp commented 5 months ago

I have thought about this multiple times in the past.

In my opinion, there are two approaches. The first approach is how Numbat currently behaves. The second is what you propose (if I understand correctly):

  1. a -> b converts the expression a to the unit of expression b (ignore the value of b)
  2. a -> b converts the expression a to value_of(a/b) × b (take the value of b into account as well)

To illustrate this, imagine you want to know how many 45 min slots (b) you can fit into 6 hours (a). Yes, of course: you can just divide a / b and get your answer (8), but I think it's reasonable to want to phrase this as a conversion problem.

In approach (1), we have something like

  6 hours ➞ 45 min

    = 360 min    # the '45' is ignored completely, we simply convert to minutes

In approach (2), we would probably print something like

  6 hours ➞ 45 min

    = 8 × 45 min

unless b has a value of 1, in which case we would skip the ... × 1 ... part and just print the same thing that we had in approach 1.

I think there is an advantage here compared to just typing 6 hours / 45 min and getting a plain 8 as a result. It's pretty clear in this example, but in the division case, someone might accidentally use the wrong order, type b / a when they actually wanted a / b, and get the inverse result without any error (still results in a scalar). However, if you accidentally type 45 min -> 6 hours, you would get 0.125 × 6 hours and immediately realize the mistake.

The other advantages are:

Just a curiosity: we can implement something similar to approach in Numbat already:

>>> fn convert<T>(a: T, b: T) = if value_of(b) == 1 then "{a -> b}" else "{value_of(a/b->1)} × {b}"
>>> convert(6 hours, min)

    = 360 min    [String]

>>> convert(6 hours, 45 min)

    = 8 × 45 min    [String]
bsidhom commented 4 months ago

I think the use case for actually using scalars on the right is more obvious if you consider rate units. For example, 1/(30 mpg) -> L/(100 km). It would be really nice to have the correct units pop out (in L/100 km). I assume the reason for this behavior is so that you can put a variable on the right hand side and have units of the left hand side match that (I recall reading about this somewhere in the docs). Would it be possible and reasonable to support both use cases? IMO, it would make more sense to respect scales in the units (those do semantically change the units scale just line SI prefixes) and have some auxiliary mechanism to extract just the base units from a variable if that’s what’s desired. Then you can use this base unit on the right hand side.

On the other hand, rendering of the scalar might be confusing without a proper rational type in the units. For example, it wouldn’t be ideal to display the above example in 0.01 L/km. Is there any support for rationals at the moment?

bsidhom commented 4 months ago

(Note that the above is what GNU Units does when producing output units.)

sharkdp commented 4 months ago

I think the use case for actually using scalars on the right is more obvious if you consider rate units. For example, 1/(30 mpg) -> L/(100 km).

That is a good example, yes!

I assume the reason for this behavior is so that you can put a variable on the right hand side and have units of the left hand side match that (I recall reading about this somewhere in the docs).

Kind of the other way around. The current implementation treats lhs -> rhs as a binary operator between two expressions. And it simply takes the unit of the rhs expression and converts to that. Ignoring whatever value rhs has.

It was easy to implement it like this and I kind of liked that both sides are simply expressions. No special parsing rules, no additional AST nodes, etc.

And that implementation happened to fulfill this … -> variable use case.. which I found by accident and then put into the documentation. But it's not like this was intentionally designed this way.

It has been increasingly clear to me that we want -> to be something more powerful. And in fact, in the latest release, we already have a special feature where x -> f is a synonym for f(x), if f is of function type Fn((type x) -> _). We do this to support things like 2^31 -> hex or 120 hours -> human.

Would it be possible and reasonable to support both use cases? IMO, it would make more sense to respect scales in the units (those do semantically change the units scale just line SI prefixes) and have some auxiliary mechanism to extract just the base units from a variable if that’s what’s desired.

Yes! I think with the feature proposed here, we could still fulfill the "convert to unit of variable" use case, if someone really needs that. We already have that mechanism to extract the unit of a quantity. And it would even look much clearer, because we would call expr -> unit_of(variable).

And the new expr -> variable use case is even more important, I think. Imagine someone wants to see an electric charge in units of the electron_charge. They would simply do my_charge -> electron_charge. Or convert a speed to fractions of the speed of light with my_velocity -> speed_of_light.

So let's implement this!

On the other hand, rendering of the scalar might be confusing without a proper rational type in the units. For example, it wouldn’t be ideal to display the above example in 0.01 L/km. Is there any support for rationals at the moment?

There is no support for rationals, but I'm thinking about it (https://github.com/sharkdp/numbat/issues/4#issuecomment-1242516737)