sharkdp / numbat

A statically typed programming language for scientific computations with first class support for physical dimensions and units
https://numbat.dev
Apache License 2.0

Introduce unit systems #116

Open sharkdp opened 1 year ago

sharkdp commented 1 year ago

It would be helpful to include systems of units (SI, Imperial, US Customary, …) as a new concept. Units could then be assigned to those systems, e.g. with a decorator:

@system(SI)
unit meter: Length

@system(Imperial)
unit foot: Length = …

We could think about whether or not we want to allow multiple units of the same dimension per system (e.g. both foot and mile belong to Imperial), or if we want to allow just one "canonical" unit per physical dimension.

This would help us when performing automatic simplification. For example, we could ask: what is the canonical unit to represent lengths in system XY? (Caveat: there is not always a 1:1 relationship. For example, torque has the same dimension as energy, and frequency has the same dimension as activity.)

If I have thought this through correctly, this would also completely solve https://github.com/sharkdp/insect/issues/112. We could do something like

@system(Hartree)
unit bohr: Length = a0

@system(Hartree)
unit m_e: Mass = electron_mass

@system(Hartree)
@aliases(hartrees)
unit hartree: Energy = ℏ^2 / (electron_mass a0^2)

# …

and then we could, for example, add a new conversion syntax expr -> System, where we would take the type of expr and then look up the canonical unit for that type in System. This would allow us to do:

10 miles per gallon -> SI
30 cm -> Imperial
hydrogen_energy(2, 4, 0) -> Hartree

(see also: https://github.com/sharkdp/insect/issues/184)
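To make the lookup step concrete, here is a minimal Python sketch of what `expr -> System` might do for a single named unit. Everything in it is an assumption for illustration: the `UNITS` and `SYSTEMS` tables, the `convert_to_system` helper, and the choice of one canonical unit per dimension are hypothetical and not part of Numbat. Compound expressions such as miles per gallon would need more machinery than this.

```python
# Dimensions as exponent tuples over (length, mass, time).
LENGTH = (1, 0, 0)
ENERGY = (2, 1, -2)

# Hypothetical unit table: name -> (dimension, factor to SI base units).
UNITS = {
    "meter": (LENGTH, 1.0),
    "centimeter": (LENGTH, 0.01),
    "foot": (LENGTH, 0.3048),
    "joule": (ENERGY, 1.0),
}

# Hypothetical systems: dimension -> name of the canonical unit.
SYSTEMS = {
    "SI": {LENGTH: "meter", ENERGY: "joule"},
    "Imperial": {LENGTH: "foot"},
}


def convert_to_system(value, unit, system):
    """Convert `value unit` to the canonical unit of the same dimension in `system`."""
    dimension, factor = UNITS[unit]
    target = SYSTEMS[system].get(dimension)
    if target is None:
        raise LookupError(f"{system} has no canonical unit for {unit}'s dimension")
    _, target_factor = UNITS[target]
    return value * factor / target_factor, target


# Roughly what `30 cm -> Imperial` would evaluate to in this sketch.
value, target = convert_to_system(30, "centimeter", "Imperial")
print(value, target)
```

The sketch raises an error when the system has no entry for the dimension; falling back to a default unit would be an equally plausible design.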

bsidhom commented 8 months ago

I'm not a fan of the decorator idea just because it implies that a unit belongs to a single system. For example, the mile is used in both the British Imperial and US Customary systems.

I think it makes more sense for a system to be composed of units and to have their definitions be decoupled. Also, in the case of using a "unit system" for simplification, it probably makes sense to have only a single, canonical unit for a given dimension. By the example above, it's unclear whether a distance should simplify to a mile or a foot if they're both part of the system. I'm not sure how Numbat deals with derived unit simplification, but the same argument likely applies in that case.

bsidhom commented 8 months ago

I should also point out that decoupling unit definitions from systems does not address the dimension collision issue described above (e.g., between torque and energy). A system might need to arbitrarily choose exactly one representation for a given dimension.

bsidhom commented 8 months ago

I guess you could allow the decorator to link to multiple systems, but the main thing I'd like to consider is allowing this to be extensible to user-specified unit systems that consist of arbitrary sets of units, potentially mixing and matching from others.

sharkdp commented 8 months ago

> I'm not a fan of the decorator idea just because it implies that a unit belongs to a single system. For example, the mile is used in both the British Imperial and US Customary systems.

I mean, we could say

@system(Imperial, USCustomary)
unit mile: Length = …

to associate a unit with multiple systems.

> I think it makes more sense for a system to be composed of units and to have their definitions be decoupled.

Not sure I am following. Would you rather list all units when defining the system, instead of listing the system each time a unit is introduced? Or add a separate "associate unit U with system S" statement to decouple it completely?

> in the case of using a "unit system" for simplification, it probably makes sense to have only a single, canonical unit for a given dimension.

That is true. We could have an additional @canonical decorator, or similar. But the choice of which "length" unit is the canonical one should maybe not be hard-coded. In one session, a user might prefer to see all lengths in meters (by default). In another session, they might prefer lengths to be in miles. Or in parsecs.

But I think that is the next-level problem. Right now, the problem we are facing is that we want to simplify something like joule / newton to a length unit. For this particular example, you might say: just look which length unit shows up in the expanded expression (kg m² / s²) / (kg m / s²), and use that. But for other examples, it's not possible. We want to simplify pascal × meter² to newton. But newton does not appear in the base unit expansion of Pa × m². So it would be great to have a way to say: what is the (canonical) unit of force in the system that is currently being used?
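The lookup described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tables and the `dimension_of` and `simplify` helpers are hypothetical, dimensions are modelled as exponent tuples over (length, mass, time), and the system is assumed to be coherent so that no conversion factors are needed.

```python
# Dimensions as exponent tuples over (length, mass, time).
LENGTH = (1, 0, 0)
FORCE = (1, 1, -2)
ENERGY = (2, 1, -2)
PRESSURE = (-1, 1, -2)

# Hypothetical unit table for a coherent system: unit name -> dimension.
UNIT_DIMENSIONS = {
    "meter": LENGTH,
    "newton": FORCE,
    "joule": ENERGY,
    "pascal": PRESSURE,
}

# Hypothetical canonical table: dimension -> canonical unit of the active system.
SI_CANONICAL = {dim: name for name, dim in UNIT_DIMENSIONS.items()}


def dimension_of(factors):
    """Sum the exponents of a product such as [("pascal", 1), ("meter", 2)]."""
    total = (0, 0, 0)
    for unit, exponent in factors:
        dim = UNIT_DIMENSIONS[unit]
        total = tuple(t + exponent * d for t, d in zip(total, dim))
    return total


def simplify(factors, canonical=SI_CANONICAL):
    """Return the canonical unit for the product's dimension, or None if there is none."""
    return canonical.get(dimension_of(factors))


print(simplify([("pascal", 1), ("meter", 2)]))   # newton
print(simplify([("joule", 1), ("newton", -1)]))  # meter
```

Because the table is keyed by dimension, it can hold only one unit per dimension. In this sketch newton × meter always comes back as joule, which is the torque/energy ambiguity mentioned earlier in the thread.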

> the main thing I'd like to consider is allowing this to be extensible to user-specified unit systems that consist of arbitrary sets of units, potentially mixing and matching from others.

That is an interesting point I had not thought about. In this context, it might make more sense to introduce systems by listing all units they contain.

Do you have a particular use case in mind?

Do you have an idea for a syntax?

bsidhom commented 8 months ago

> I think it makes more sense for a system to be composed of units and to have their definitions be decoupled.

> Not sure I am following. Would you rather list all units when defining the system, instead of listing the system each time a unit is introduced? Or add a separate "associate unit U with system S" statement to decouple it completely?

The latter is what I was thinking. Mostly with the idea of letting users define their own systems without causing any conflicts, and being able to tie units together at arbitrary times.
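As a rough illustration of the decoupled approach, the following Python sketch keeps unit definitions and system membership in separate tables, so a unit can be associated with any number of systems at any time. The `UnitRegistry` class and its method names are hypothetical and only model the idea; they do not correspond to anything in Numbat.

```python
from collections import defaultdict


class UnitRegistry:
    """Hypothetical registry: units are defined once, systems reference them later."""

    def __init__(self):
        self.units = {}                  # unit name -> dimension name
        self.systems = defaultdict(set)  # system name -> set of unit names

    def define_unit(self, name, dimension):
        self.units[name] = dimension

    def associate(self, unit, system):
        """Separate 'associate unit U with system S' statement."""
        if unit not in self.units:
            raise KeyError(f"unknown unit: {unit}")
        self.systems[system].add(unit)

    def systems_of(self, unit):
        return sorted(s for s, members in self.systems.items() if unit in members)


registry = UnitRegistry()
registry.define_unit("mile", "Length")
registry.define_unit("meter", "Length")

# Associations are made independently of the definitions above.
registry.associate("mile", "Imperial")
registry.associate("mile", "USCustomary")
registry.associate("meter", "SI")

print(registry.systems_of("mile"))
```

A user-defined system is then just another key in `systems`, and adding to it never touches an existing unit definition.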

> But I think that is the next-level problem. Right now, the problem we are facing is that we want to simplify something like joule / newton to a length unit. For this particular example, you might say: just look which length unit shows up in the expanded expression (kg m² / s²) / (kg m / s²), and use that. But for other examples, it's not possible. We want to simplify pascal × meter² to newton. But newton does not appear in the base unit expansion of Pa × m². So it would be great to have a way to say: what is the (canonical) unit of force in the system that is currently being used?

Agreed.

> Do you have a particular use case in mind?

The use case is basically being able to use an arbitrary system as the basis for simplification. Sometimes it's better to use cgs, sometimes mks, sometimes Planck. We can bake as many of these as we like into the unit definitions, but without users being able to mix and match them post hoc, we'll always be missing some useful system for somebody. Similarly, you can imagine a situation where you might want areas in acres, volume in gallons, and distance in feet (for example, if you're working on a vineyard/winery). Perhaps this is a bad example (even in the US), but this is the basic idea.
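One way to picture the mix-and-match idea is to treat a system as a mapping from dimension to preferred unit and to build new systems by layering such mappings. The Python sketch below assumes exactly that; the `compose` helper, the dimension names, and the contents of `US_CUSTOMARY` are hypothetical placeholders, not definitions from Numbat.

```python
def compose(*systems, overrides=None):
    """Merge systems left to right; later systems and `overrides` win on conflicts."""
    merged = {}
    for system in systems:
        merged.update(system)
    merged.update(overrides or {})
    return merged


# Hypothetical base system: dimension name -> preferred unit.
US_CUSTOMARY = {"Length": "foot", "Mass": "pound", "Area": "square_foot"}

# A user-defined system for the vineyard example: reuse the base system,
# but prefer acres for area and gallons for volume.
vineyard = compose(US_CUSTOMARY, overrides={"Area": "acre", "Volume": "gallon"})

print(vineyard)
```

The base system is left unchanged, so several derived systems can share it without conflicts.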

> Do you have an idea for a syntax?

It's hard to say without existing list or collection types right now. Especially if we don't demand that every unit system is exhaustive, we'll need to fall back to base defaults for unspecified units/dimensions (which also makes it tricky). In my head, I was thinking of something like a heterogeneous list (or tuple) of types specifying the preferred unit for a fixed set of dimensions, e.g.,

system cgs = [centimeter, gram, second]

Then the checker ensures that there's at most one unique representation of each dimension in that list. (Again, not sure about fallbacks, in particular with the use of custom dimensions; this can probably just fall back to whatever is happening now.)
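The check and the fallback could look roughly like the following Python sketch. It is a model of the idea only: `UNIT_DIMENSIONS`, `DEFAULTS`, `define_system`, and `preferred_unit` are hypothetical names, and the default units are assumed to be SI base units purely for the sake of the example.

```python
# Hypothetical table of known units: unit name -> dimension name.
UNIT_DIMENSIONS = {
    "centimeter": "Length",
    "meter": "Length",
    "gram": "Mass",
    "kilogram": "Mass",
    "second": "Time",
    "ampere": "Current",
}

# Assumed fallback units for dimensions that a system does not mention.
DEFAULTS = {"Length": "meter", "Mass": "kilogram", "Time": "second", "Current": "ampere"}


def define_system(units):
    """Model of `system cgs = [centimeter, gram, second]` with the uniqueness check."""
    chosen = {}
    for unit in units:
        dimension = UNIT_DIMENSIONS[unit]
        if dimension in chosen:
            raise ValueError(
                f"{unit} and {chosen[dimension]} both represent {dimension}"
            )
        chosen[dimension] = unit
    return chosen


def preferred_unit(system, dimension):
    """Use the system's unit if it has one, otherwise fall back to the default."""
    return system.get(dimension, DEFAULTS[dimension])


cgs = define_system(["centimeter", "gram", "second"])
print(preferred_unit(cgs, "Length"))   # centimeter
print(preferred_unit(cgs, "Current"))  # ampere (fallback, not specified by cgs)
```

Keying the check on dimension means a list such as `[centimeter, meter]` is rejected at definition time rather than producing an ambiguous simplification later.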