User Defined Literal Suffixes

ozra commented 8 years ago

While coding on a one-off physics simulation, in Onyx, for a device I'm constructing, I realized something that would be helpful for such advanced-excel-work'ish scripts / one-off simulations / scientific analysis / engineering / what-the-fuck-ever.

There's already an experimental feature to redefine, scope by scope, the type a certain literal (int or real) generates. I don't think I've publicly revealed it though; it was lacking somehow. It would be replaced and subsumed by this suggestion. With catch-all suffixes, the standard type generated by an int, a real, or a number generically could be defined via the same mechanism.

Practically it uses templates, only difference being that it's "triggered" via a suffix-syntax rather than call syntax. It's way cleaner than C++14 user suffixes that way imo.

The pro is that very liberal and terse naming can be used, which wouldn't be appropriate for methods: instead of a = 5.m + 3.cm you write a = 5m + 3cm. Literal suffixes naturally only work on literals...

It would be best accompanied by a using clause, so appropriate suffix defs can be brought in for the scopes where they are defined / used, instead of being permanently included (most of the time). This way no restriction needs to be imposed on users' naming choices (as opposed to C++14).

Some ideas for definition syntax and use-cases:

suffix (string)slug = slugify-string(_)
suffix (int) = Int _
suffix (real) = Real _
suffix (number)mm = Real _
suffix (number)cm = Real _
do-stuff-with 2cm - (3mm + 2mm)

when looking for matching suffix-"overload":
- first look for match on closest literal base-type, for instance "int"
- if none, look at suffix "number"
- else suffix "int *" (suffix catch all)
- else suffix "number *" (suffix catch all)
- if not found in module, continue up the chain to global level (where sane defaults are defined, for more general suffixes)
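
A rough Python sketch of the lookup order above (not Onyx, and not how the compiler does it). `Scope`, `resolve` and the example tables are made-up names for illustration; the only thing taken from the list is the order of the four lookups and the walk up to the global level.

```python
from typing import Callable, Dict, Optional, Tuple

# (literal kind, suffix name) -> expansion function; "*" is the catch-all name.
SuffixTable = Dict[Tuple[str, str], Callable[[str], object]]


class Scope:
    """One level of suffix definitions: a module, or the global level."""

    def __init__(self, table: SuffixTable, parent: Optional["Scope"] = None):
        self.table = table
        self.parent = parent


def resolve(scope: Optional[Scope], kind: str, suffix: str) -> Callable[[str], object]:
    """Find the expansion for a literal of `kind` ("int" or "real") with `suffix`."""
    lookup_order = [
        (kind, suffix),      # 1. closest literal base type
        ("number", suffix),  # 2. generic "number" definition
        (kind, "*"),         # 3. catch-all for the base type
        ("number", "*"),     # 4. catch-all for any number
    ]
    while scope is not None:
        for key in lookup_order:
            if key in scope.table:
                return scope.table[key]
        scope = scope.parent  # 5. not found here: continue up the chain
    raise LookupError(f"no suffix definition for ({kind}){suffix}")


def cm_number(text: str) -> float:
    return float(text) * 0.01


def default_int(text: str) -> int:
    return int(text)


def default_real(text: str) -> float:
    return float(text)


# Hypothetical global defaults plus one module-level table.
global_scope = Scope({("int", "*"): default_int, ("real", "*"): default_real})
metrics = Scope({("number", "cm"): cm_number}, parent=global_scope)
```

With these tables, `resolve(metrics, "int", "cm")` finds the module's `(number)cm`, while an unsuffixed real literal falls through to the global catch-all.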

Here's a full pseudo-code example (obviously not implemented, and therefore not tested; just a quick dump with some cut-and-pasted code)

module Metrics.Suffixes[U = 1.0]
   BaseUnitRelToMeter = U

   suffix (number)km    =>  Real _ * 1000 * BaseUnitRelToMeter
   suffix (number)m     =>  Real _ * 1 * BaseUnitRelToMeter
   suffix (number)dm    =>  Real _ * 0.1 * BaseUnitRelToMeter
   suffix (number)cm    =>  Real _ * 0.01 * BaseUnitRelToMeter
   suffix (number)mm    =>  Real _ * 0.001 * BaseUnitRelToMeter
   suffix (number)in    =>  Real _ * 0.0254 * BaseUnitRelToMeter
   suffix (number)ft    =>  Real _ * 0.3048 * BaseUnitRelToMeter
   suffix (number)cubit =>  raise "Really? Sumerian, Hebrew, Egyptian, or..."

module Time.Suffixes[U = 1000]
   ResolutionOfSecond = U

   suffix (number)s    =  Real _ * ResolutionOfSecond
   -- notice the clash with "m" for "meter" above, should both be used
   suffix (number)m    =  Real _ * 60 * ResolutionOfSecond
   suffix (number)min  =  Real _ * 60 * ResolutionOfSecond
   suffix (number)h    =  Real _ * 60 * 60 * ResolutionOfSecond
   suffix (number)day  =  Real _ * 24 * 60 * 60 * ResolutionOfSecond
   suffix (number)days =  Real _ * 24 * 60 * 60 * ResolutionOfSecond

module Engineering
   module Energy
      module Suffixes
         suffix (number)W = Real _
         suffix (number)J = Real _

   module Electrics
      module Suffixes
         include Energy.Suffixes
         suffix (number)A = Real _
         suffix (number)V = Real _
         suffix (number)Ω = Real _
         suffix (number)VA = Real _

   module Materials
      module Suffixes
         include Energy.Suffixes
         suffix (number)C = Real _
         suffix (number)K = Real _

      module All
         include Materials, Suffixes

      using Suffixes
         type Material < Value
            @name Tag         'get
            @k         = 1K   'get
            @density   = 0Kg  'get

            init(@name, ...) ->
         end Material

      template material(name, k) =
         {=name.id=} = Material #{=name.id=}, k: {=k=}
         MaterialsList << {=name.id=}
      end

      MaterialsList = [0 x Material]

      using Suffixes begins

      material Copper,        400K
      material Aluminium,     210K
      material Silver,        429K
      material Lead,          34K
      material Water,         0.58K
      material Glass,         1.01K
      material GlassWool,     0.04K
      material FiberGlass,    0.04K
      material UrethaneFoam,  0.04K
      material EPS,           0.03K
      material Air,           0.024K

   using Metrics.Suffixes<1.0>, Materials.All begins

   type Barrier
      @w        = 10mm  'get
      @h        = 10mm  'get
      @d        = 10mm  'get
      @material = Water 'get
      @kmm      = 0K    'get
      @foo      = "foo"

      init(@w, @h, @d, @material) ->
         @kmm = @material.k * (1mm * 1mm)  -- stored in k/mm2 instead of k/m2

   end Barrier

   -- A function not really doing anything useful (calculating u would be)
   bridge-k-values(barriers List<Barrier>, t-in, t-out) ->
      for b in barriers
         say "{b.class}, {b.material.name}, {b.material.k}, "
             "{b.kmm}, TΔ = {t-out - t-in}"

      barriers.map ~.kmm

end Engineering

Usage pseudo code example:

include Engineering

-- `Metrics` generic param is kept as default
using Materials.All, Metrics.Suffixes

   -- below is where the "commas omitable with literals" could be nice:
   my-barriers = [
      Barrier 10mm, 13mm, 0.1mm, Aluminium
      Barrier 10mm, 13mm, 0.1mm, Zink
      Barrier 10mm, 13mm, 1mm,   Aluminium
      Barrier 10mm, 13mm, 13mm,  Water
   ]

   say bridge-k-values
      my-barriers
      30C
      25C

   good-insulators = MaterialsList.filter ~.k < 0.05K
   good-thermal-conductors = MaterialsList.filter ~.k > 200K

- An obvious use case is to ensure literals get an appropriate type, whether or not decimals are typed (and, as seen, most often Real)
- It makes code clearer by showing the value's unit
- Since the right side of the definition is just a template body, any code wanted can be generated
- Obviously, that includes other suffixes, which will be expanded in turn
- f32, f64, i32, etc. are rendered to direct asm store ops at the codegen stage - they have "lowlevel" mappings. Everything that can be reduced to these will be, of course.
- 1f will map to 1f32, and 1d to 1f64, by default.
- Since underscores can be used anywhere in numbers, they can of course be used before the suffix as usual: 47_f
- To be really useful, the using clause would be needed too

Woo, now I'm so tired my brain will soon syntax error deluxed. Better drop this now.

stugol commented 8 years ago

I don't pretend to fully understand that code - I am not a chemist, neither have I heard of "Zink" ;)

However, my experience of unit suffixes has been in the following contexts:

say 10mm + 1cm        -- prints "11mm"?

say 10mm + 1ml        -- surely this shouldn't compile?

ozra commented 8 years ago

Ah, a bit of swenglish, haha. Zinc.

say 10mm + 1cm -- prints "11mm"?

Depends on how ()mm and ()cm are defined. If they're defined like in the example, it simply puts out "11.0". If instead a Distance type or something like that had been used as the result of the suffixes, then it could implement to-s any way it wants, of course.

say 10mm + 1ml -- surely this shouldn't compile?

Same here. If ()mm and ()ml are simply used for typing, or for typing and rescaling to a base unit value (say liters), it compiles and just adds the numbers. If they are instead typed to, say, Distance and Volume respectively, then it would not compile - and that's when you start seeing the real benefits of it. So you found the real use case immediately even though it wasn't even showcased, which says good things about the concept :-)

(number)ml = Volume _ * 0.001 * LiterUnit
(number)mm = Distance _ * 0.001 * MeterUnit

Done. (except the implementation of those types of course)

v = 5ml expands to v = Volume.new(5 * 0.001 * LiterUnit)

LLVM of course optimizes away the constant expression, so in the final machine code it's just one assembly op (pseudo asm): store-f64-val var-addr, 0.005. Or even more likely, if used in one place, it's just 0.005 as an argument to the op-code for whatever is done. Faaaaaast! Cleaaaan! :-)

If you wanted to compile such craziness, you would then of course have to do something like: say "Crazy calc: {Real(10mm) + Real(1ml)}" or say "Crazy calc: {10mm.to-r + 1ml.to-r}". Provided to-r is implemented, or Real.new(v Distance) ->, etc. Or if Ints were wanted, whatever, you catch the drift.
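
To make the typed variant concrete, here is a small Python sketch of the same idea. It is only an analogy: Python rejects the mixed addition at run time, whereas the point in the thread is a compile-time error. `Quantity`, the `mm` / `cm` / `ml` helper functions, and the choice of meters and liters as base units (MeterUnit = LiterUnit = 1.0) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Base for unit-carrying values; `value` is stored in the base unit."""

    value: float

    def __add__(self, other):
        # Only quantities of the exact same type may be added.
        if type(other) is not type(self):
            return NotImplemented
        return type(self)(self.value + other.value)

    def __float__(self) -> float:
        # Explicit escape hatch, playing the role of `to-r` in the thread.
        return float(self.value)


class Distance(Quantity):
    """Stored in meters (assumption: MeterUnit = 1.0)."""


class Volume(Quantity):
    """Stored in liters (assumption: LiterUnit = 1.0)."""


def mm(x: float) -> Distance:
    return Distance(x * 0.001)


def cm(x: float) -> Distance:
    return Distance(x * 0.01)


def ml(x: float) -> Volume:
    return Volume(x * 0.001)


total = mm(10) + cm(1)                # fine: Distance + Distance
crazy = float(mm(10)) + float(ml(1))  # allowed only because it is explicit

try:
    mm(10) + ml(1)                    # Distance + Volume
except TypeError as err:
    mixed_error = err                 # rejected: the types do not match
else:
    mixed_error = None
```

The shared `__add__` on the base class is also one way to avoid writing arithmetic per unit type.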

stugol commented 8 years ago

There ought to be a clean, simple syntax for declaring a unit. For example, mm and cm are distance units, while ml and floz are volume:

units Distance
  m
  cm = m * 100
  mm = cm * 10

units Volume
   ml = floz * 29.574
   floz

say 10ml + 5floz       -- outputs whatever the fuck that equals ;)

say 1m + 10cm + 10mm    -- outputs 1110mm (or possibly 111cm?)

say 1mm + 1ml       -- type conversion error

Mathematical operations between different units of the same type (e.g. Distance) should automatically yield an answer; in either the smallest unit that took part in the calculation, or - better! - the largest unit that can represent the value as an integer. Assuming the value can be represented as an integer at all, of course.
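
The "largest unit that can represent the value as an integer" rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated; `DISTANCE_UNITS` and `show` are hypothetical names, and exact fractions are used so that the integer test is not disturbed by float rounding.

```python
from fractions import Fraction

# Unit sizes relative to the base unit (meters), largest first.
DISTANCE_UNITS = [
    ("m", Fraction(1)),
    ("cm", Fraction(1, 100)),
    ("mm", Fraction(1, 1000)),
]


def show(value: Fraction, units=DISTANCE_UNITS) -> str:
    """Format `value` (in meters) in the largest unit that gives an integer."""
    for name, size in units:
        count = value / size
        if count.denominator == 1:
            return f"{count.numerator}{name}"
    # No unit gives an integer: fall back to the smallest unit as a float.
    name, size = units[-1]
    return f"{float(value / size)}{name}"


# 1m + 10cm + 10mm, expressed in meters
total = 1 * Fraction(1) + 10 * Fraction(1, 100) + 10 * Fraction(1, 1000)
```

Here `show(total)` gives `111cm`, the second of the two candidate outputs mentioned above, because 1.11 m is not a whole number of meters but is a whole number of centimeters.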

Perhaps it would make more sense to write them the other way around:

units Distance
  m
  cm = m / 100     -- note the division - we're defining the size of a cm in relation to a m now
  mm = cm / 10

Yes, this way makes far more sense.

ozra commented 8 years ago

I prefer the more generic approach of just calling it literal suffix, and declaring it like "a suffix-template" though. Maximum usability, only a variation of already common notation. All (other) calculative aspects are simply implemented via the used types.

Suffixes are things you define in a lib and think a bit about in order to reduce clashes, and then just re-use and re-use. That won't happen too often "ad-hoc", so I see no reason for a dedicated syntax structure for these; it only renders them less generic imo.

(type-of-literal)suffix = template-body

There's slight sugar to the template: _ implicitly represents the literal-node being expanded, since it's always one.

I've omitted the implementation of arithmetic for the types below. That is also something that can be simplified for common cases with macros, or with a generic def on Any that uses 'Self' as the other operand's type and de facto uses @value (for cases more specific than that, one has to implement the arithmetic methods, of course).

Geometry:
   -- Base unit of choice should of course be a generic param too (U)

   Distance[T = Real] ::
      @value = T 0
      to-s() ->
         if @value < 1
            "{@value * 100} centimeters"
         else
            "{@value} meters"

   end Distance

   Volume[T = Real] ::
      @value = T 0
      to-s() ->
         if @value < 0.1
            "{(@value * 100).round 2} centiliters"
         else
            "{@value} liters"

   end Volume

   (number)m  = Distance _
   (number)cm = Distance _ * 100
   (number)mm = _cm * 10
   -- `47mm` expands to `47cm * 10`, which expands to `Distance 47 * 100 * 10`

   -- can of course specialize (depending on if `47` or `47.0` is given:)
   (integer)cm = Distance<Int> _ * 100
   (real)cm    = Distance<Real> _ * 100

   (integer)ml    = Volume<Int> _ * 1000
   (real)ml       = Volume<Real> _ * 1000
   (number)ukfloz = Volume _ * 0.028413
   (number)usfloz = Volume _ * 0.029574

include Geometry

say 10ml + 5ukfloz      -- as per above defs: "15.21 centiliters"

say 1m + 10cm + 10mm    -- as per above defs: "1.11 meters"

say 1mm + 1ml           -- type conversion error
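
The "expands in turn" behaviour of nested suffix definitions can be imitated with plain text substitution. The Python sketch below copies the `(number)cm` and `(number)mm` definitions exactly as written above and only mimics the expansion step; `TEMPLATES`, `SUFFIXED` and `expand` are names invented for the example, and a real implementation would work on AST nodes rather than strings.

```python
import re

# The two nested definitions from the thread, with `_` standing for the literal.
TEMPLATES = {
    "cm": "Distance _ * 100",
    "mm": "_cm * 10",
}

# A number immediately followed by a letters-only suffix, e.g. `47mm`.
SUFFIXED = re.compile(r"\b(\d+(?:\.\d+)?)([A-Za-z]+)\b")


def expand(code: str, templates=TEMPLATES, depth: int = 0) -> str:
    """Replace every suffixed literal by its template body, recursively."""
    if depth > 16:
        raise RecursionError("suffix definitions refer to each other in a cycle")

    def substitute(match):
        number, suffix = match.group(1), match.group(2)
        if suffix not in templates:
            return match.group(0)  # unknown suffix: leave the text alone
        body = templates[suffix].replace("_", number)
        # The body may itself contain suffixed literals, so expand again.
        return expand(body, templates, depth + 1)

    return SUFFIXED.sub(substitute, code)
```

`expand("47mm")` goes through `47cm * 10` and ends at `Distance 47 * 100 * 10`, the same chain as in the comment next to the `mm` definition.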

stugol commented 8 years ago

(Actually, floz is an American measurement as well, not just a UK one.)

I disagree. 1.5ml and 1ml shouldn't be disjoint concepts.

say 1.5ml.class      -- Capacity.ML[Float]
say 1ml.class        -- Capacity.ML[Int]
say 1ml + 1.5ml       -- 2.5ml

In fact, it probably makes sense to use floats always.

I also disagree that it won't happen ad-hoc. I think programs (and the stdlib) should apply this feature liberally:

info(v) -> say "%s : %s" % [v, v.class]

info File(...).size    -- "27KB : ByteSize.KB"
qty = 270mol           -- moles, a unit of chemistry
info 16m / 1s          -- "16m/s : Speed.MPS"
fuel-efficiency = 86mpg

ozra commented 8 years ago

(Actually, floz is an American measurement as well, not just a UK one.)

That's why I added both uk and us floz ;-) Those damn imperial units - never compatible! Well, they agreed on the foot some time back at least B-)

In fact, it probably makes sense to use floats always.

Yes, I think for the majority of cases where suffixes are used, a real'ish type is generally what's used underneath. However, the possibility of making the suffix more specific according to literal form should be available - it can be of use in certain cases where optimizations can be made while it still acts transparently to the user. Currencies, for instance, are always stored fixed-point to maintain the integrity of the values; floats are simply too inexact to be allowed into financial calcs. It is also the way to define the standard resultant type of literals without a suffix. For a scientific app, it may make sense to define that all (integer)* = BigInt("_"), so a = 32 + 9809434878749384793433434344 would here be expanded to a = BigInt("32") + BigInt("9809434878749384793433434344"), a calculation that most statically compiled languages wouldn't accept with plain literals. This is a great use case.

I also disagree that it won't happen ad-hoc. I think programs (and the stdlib) should apply this feature liberally:

Yeah, but that's not ad hoc, it's very reasonable standard suffixes - so all good :-)

stugol commented 8 years ago

US floz is different to UK floz? Well damn. Those yanks can't get anything right, can they?

Still, I reckon a shorthand syntax for units would be nice. We can have the longhand syntax as well, but do you really want to write out arithmetic methods for each and every damn measurement? Just make it infer from a simple notation, and get the compiler to do the work.

ozra commented 8 years ago

:-D

Yes, I've got a one-liner user solution in the back of my head for arithmetics - it won't become a problem. I'll expand on it later. Haven't slept for 51 hours, so I think I might need a break from the 'puter now ;-)

stugol commented 8 years ago

You haven't slept for 51 hours!? What is wrong with you!? :O

ozra commented 8 years ago

Of that, the doctors aren't really sure, haha. It's been like that since I was a kid, every week or second week, I get this manic burst which completely denies me any sleep. But that's when I have the most ideas so... shrugs.

ozra commented 8 years ago

So... the details.

Since 0x is the base prefix for hexadecimal, the set of possible suffixes beginning with x would be limited. I think the best option might be to simply forbid suffixes beginning with x. Consider 0xe: the hexadecimal number E. The '0o' and '0b' base prefixes are only followed by subsets of the Arabic digits, so no problem there.

Motivation:

- There are no SI units or SI prefixes beginning with x (that I know of)
- I don't think there are any imperial units beginning with x
- I can't really think of an ad hoc unit beginning with x
- x alone could be allowed (but what does that mean!!?)

stugol commented 8 years ago

Agreed. Forbid x entirely. But have a compiler error message that points this out.

ozra commented 8 years ago

Yes. More details I forgot:

For base-16 literals, I was thinking suffixes can only be used when prefixed with double underscores, since otherwise problems will arise for all suffixes beginning with [a-f] on hex literals specifically (the "x" case applies to all number literals).

The alternatives are:

- Allow any suffix, require double underscores
  - Deliberately ugly formatting: a = 0xab_cd_e__A - but then, who would define Ampere hexadecimally?
- Forbid suffixes on hexadecimals entirely (not good)
- Allow only "intrinsic suffixes": 'i32', 'u8', etc. - double underscores not required.
  - a = 0xab_cd_e_u64

Intrinsic suffixes are predefined and steer machine code generation; those are the "terminal suffixes". The i32 suffix, for instance, will not be possible to redefine (also an error with an informative message).
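
To see how these rules interact, here is a hedged Python sketch of a literal splitter. It combines two of the alternatives above (user suffixes on hex literals need `__`, integer intrinsic suffixes do not) with the "no suffix may begin with x" rule. It is one possible reading, not the Onyx lexer; `lex_number`, the regexes and the list of intrinsic names are assumptions, and `0o` / `0b` literals are left out.

```python
import re

# Assumed set of integer intrinsics; float intrinsics are left out on purpose,
# since `f` is itself a hex digit.
INTRINSIC_INT = ("i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64")

HEX = re.compile(
    r"0x(?P<digits>[0-9a-fA-F_]+?)"
    r"(?:__(?P<user>[^\W\d_]\w*)"
    r"|_?(?P<intrinsic>" + "|".join(INTRINSIC_INT) + r"))?"
)
DECIMAL = re.compile(
    r"(?P<digits>\d[\d_]*?(?:\.\d[\d_]*?)?)"
    r"_*(?P<suffix>[^\W\d_]\w*)?"
)


def lex_number(text: str):
    """Split a numeric literal into (value, suffix); suffix is None if absent."""
    if text.startswith("0x"):
        m = HEX.fullmatch(text)
        if m is None:
            raise ValueError(f"bad hex literal or suffix: {text!r}")
        value = int(m["digits"].replace("_", ""), 16)
        suffix = m["user"] or m["intrinsic"]
    else:
        m = DECIMAL.fullmatch(text)
        if m is None:
            raise ValueError(f"bad number literal: {text!r}")
        digits = m["digits"].replace("_", "")
        value = float(digits) if "." in digits else int(digits)
        suffix = m["suffix"]
    if suffix is not None and suffix.startswith("x"):
        raise ValueError(f"suffixes may not begin with 'x': {text!r}")
    return value, suffix
```

With this reading, `0xe` stays the plain number 14, `0xab_cd_e__A` carries the user suffix `A`, `0xab_cd_e_u64` carries the intrinsic `u64`, and both `3xs` and `0xffmm` are rejected with an error.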

stugol commented 8 years ago

Why does the hex-literal contain underscores in the first place? I say forbid underscores in hex-literals, and require an underscore for a suffix on a hex-literal.

ozra commented 8 years ago

Underscores are great for grouping digits: 0x_FFFF_FFF0_0000_0000 is easier to grasp than 0xFFFFFFF000000000.

stugol commented 8 years ago

Ah. Hm. I guess you could prefix the suffix with '. Or just use the massively ugly __.

(Btw, _ isn't a hyphen.)

ozra commented 8 years ago

Yes ' as an alternative for suffix-prefixing is a good idea. Apostrophes are currently used:

- to explicitly denote a type annotation (in certain contexts, but can always be used)
- as prefix to pragmas

They would work fine both conceptually (pragmaish/typeish) and syntactically (given they're not spaced from the literal, which of course is the role it should play, so all fine)

I meant underscore in the text, edited.

ozra commented 8 years ago

The heavy part of it (expansion) is now implemented, including "crox"-syntax so it works in crystal macros too. Very alpha. The definitions are currently written via the regular template syntax, simply name-mangled as such: template suffix_number_MYSUFFIX(v) =...

ozra commented 8 years ago

I will be going for the initial proposed syntax for defining suffixes now, but with full template-body syntax for greater versatility. If a specific literal kind (int-looking or real-looking) wants to be matched, it's typed as in a function param with either IntLiteral or RealLiteral, these are not existing types though, so just part of parsing syntax:

suffix (val)kg =
   Gram {= val =} * 1000r

suffix (val 'IntLiteral)foo =
   Foo<Int> {= val =}

suffix (val 'RealLiteral)foo =
   Foo<Real> {= val =}

Since most suffixes are likely to render a type which is real-ish, most will want to match both integer- and real-literal. This is the default, so most will be simply defined as suffix (val)foo = ...

As for hex numbers, user suffixes will not be handled. I can't think of a reasonable use case, and as mentioned there are too many lexical gotchas: it would need either __ (ugly) or ' as a separator. After pondering, ' should not be used for literals; it is better kept as type-annotation and pragma prefix only. Then it can actually be used as shorthand for as, if wanted, or for some other type-related purpose: 5 'Number => 5 as Number, @foo = SomeThing() 'GeneralThing => @foo = SomeThing.new as GeneralThing.

Regarding "default suffixes" (raw literals without suffix): number literals aren't common enough in code that it should pose a performance problem. I'll therefore add the ability to define default suffixes and code it up so it can be turned on and off for benchmarking. Iff there should be a perf hit, they can be skipped / re-thought.

The default definitions for these simply add an explicit intrinsic unspec_int or unspec_real suffix, respectively. These suffixes will have the special ability to be reverse-inferred depending on the target type: foo(x I8) -> say x; foo 47 should work without user effort. 47 => 47_unspec_int => resolve call: no matching signature, so check the available signatures: if int-subtype, order sigs in priority order (say StdInt preferred, then I32, I64...) and infer 47 to 47_i8 (in this case). All good and dandy, works as the user intended. unspec_* is normally never used in user code, so it can be this ugly without problem.
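
A minimal Python sketch of that reverse inference, under loudly stated assumptions: the priority list beyond "StdInt, I32, I64" is guessed, StdInt is treated as 32-bit, and the range check is an addition for illustration that the description above does not spell out. `infer_int_literal` is a hypothetical name.

```python
# Value ranges of the candidate integer types.
INT_RANGES = {
    "I8": (-2**7, 2**7 - 1),
    "I16": (-2**15, 2**15 - 1),
    "I32": (-2**31, 2**31 - 1),
    "I64": (-2**63, 2**63 - 1),
}
INT_RANGES["StdInt"] = INT_RANGES["I32"]  # assumption: StdInt is 32-bit here

# Assumed priority order; the thread only names StdInt, I32, I64 as the start.
PRIORITY = ["StdInt", "I32", "I64", "I16", "I8"]


def infer_int_literal(value: int, param_types):
    """Pick the type an `unspec_int` literal takes for a call.

    `param_types` holds the parameter type of each available signature.
    """
    for type_name in PRIORITY:
        if type_name not in param_types:
            continue  # no signature takes this type
        low, high = INT_RANGES[type_name]
        if low <= value <= high:
            return type_name
    raise TypeError(f"no signature accepts the integer literal {value}")
```

For the `foo(x I8)` case, `infer_int_literal(47, ["I8"])` picks `I8`; if an `I64` overload also existed, the assumed priority order would pick that one instead.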

Comments?

stugol commented 8 years ago

I don't really follow what you're saying about "default suffixes".

ozra commented 8 years ago

Number-literals without suffix are processed through the exact same expansion chain.

a = 5
b = 3.1

With user-defined default-suffixes and using-clause it could be used as:

BigCalculations:
  suffix (v IntLiteral) = BigInt("{= v =}")
  suffix (v RealLiteral) = BigReal("{= v =}")

using BigCalculations
   x = 432342654365434532 + 2342365536456435324324.43242343233
   y = x / 2309809258238748343

resulting "canonical" code:

using BigCalculations
   x = BigInt.new("432342654365434532") + BigReal.new("2342365536456435324324.43242343233")
   y = x / BigInt.new("2309809258238748343")

Preferably the whole resultant expression should be flagged internally with implicit_from_unspec_int or ...real - so that reverse-inference for indexing etc. can be used even in this scenario without having to close/re-open using-clause.
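
The same source-to-source step can be tried out in Python with the standard `tokenize` module, which keeps the literal's original text. This is only an analogy for the expansion shown above: `expand_defaults` is a made-up helper, and `BigInt` / `BigReal` are stand-ins mapped to Python's `int` and `decimal.Decimal`.

```python
import io
import tokenize
from decimal import Decimal, getcontext

getcontext().prec = 50  # enough digits for the example numbers below

BigInt = int        # stand-in: Python ints are already arbitrary precision
BigReal = Decimal   # stand-in for an arbitrary-precision real


def expand_defaults(source: str) -> str:
    """Wrap every plain decimal literal in BigInt("...") or BigReal("...")."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        # Leave hex/octal/binary and complex literals untouched.
        plain = tok.type == tokenize.NUMBER and not (
            text.lower().startswith(("0x", "0o", "0b"))
            or text.lower().endswith("j")
        )
        if plain:
            is_real = any(ch in text for ch in ".eE")
            wrapper = "BigReal" if is_real else "BigInt"
            result.extend([
                (tokenize.NAME, wrapper),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(text)),  # keep the literal text verbatim
                (tokenize.OP, ")"),
            ])
        else:
            result.append((tok.type, text))
    return tokenize.untokenize(result)


code = (
    "x = 432342654365434532 + 2342365536456435324324.43242343233\n"
    "y = x / 2309809258238748343\n"
)
namespace = {"BigInt": BigInt, "BigReal": BigReal}
exec(expand_defaults(code), namespace)  # runs the expanded, "canonical" code
```

Because the literal reaches `BigReal` as a string, no precision is lost on the way, which is the point of passing `"{= v =}"` rather than the parsed value.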

ozra / onyx-lang

User Defined Literal Suffixes #76