yunruse / Noether

Just another units package
MIT License

Catalogue file format #46

Open yunruse opened 1 year ago

yunruse commented 1 year ago

Take the following (slightly contrived) unit definition:

pascal = Pa = SI(I(Unit(pressure, 'pascal', "Pa", SI_large, info="Unit of pressure")))

This is… bleh. Building large catalogues this way is tedious, and I'm beginning to feel the pain, especially when I'm in the weeds defining units and annoyed at having to satisfy Python grammar or (especially) repeat the name.

My concept is this: we have a custom file format.

Interpreting this will not be trivial. However, I believe it wise to prioritise investigating this concept before any additional major cataloguing efforts.

File format concept

Prefix definition

TODO.

PrefixSet definition

Dimension definition

TODO. Ideally we could invent the meter directly from nothing.

Units definition

The following information needs to be encapsulated:

One example syntax, which at least works with Python syntax highlighters, is:

@standard_gravity = 'g' = 9.80665(1) * meter / second**2
    [SI_large] #SI # defined by convention

where, here, a newline followed by indentation is treated as a line continuation.
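As a rough sketch of that continuation rule (not the actual parser), indented lines can be folded into the preceding logical line before any further lexing. The function name `join_continuations` and the sample entry are made up for illustration, and skipping blank lines is an assumption about how the format would behave.

```python
def join_continuations(text: str) -> list[str]:
    """Fold indented lines into the preceding logical line.

    Hypothetical helper: a line that starts with a space or tab is treated
    as a continuation of the line before it. Blank lines are skipped.
    """
    logical: list[str] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines carry no meaning
        if raw[0] in " \t" and logical:
            # Continuation: append to the previous logical line.
            logical[-1] += " " + raw.strip()
        else:
            logical.append(raw.rstrip())
    return logical


# Illustrative input only; the entries are not taken from a real catalogue.
source = """@stere = 'st' = meter**3
    # used in firewood measurement

@other = 'o' = second
"""
lines = join_continuations(source)
```

Here `lines` holds two logical lines, the first with its indented comment folded in.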

Implementation notes

I think it would be useful to keep a file, e.g. data.py, which, while not used, serves as an example of how to create Units and Dimensions from Python rather than from file.

Implementing this will mean all units are loaded from file, which will annoy type checkers. In order to fix this, we generate one huge ol' .pyi typed stub file.
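A minimal sketch of what generating that stub might look like, assuming we already have the list of names the catalogue defines. `render_stub` is a hypothetical helper, and the import line `from noether import Unit` is an assumed path, not necessarily where `Unit` actually lives.

```python
import keyword


def render_stub(names: list[str], unit_import: str = "from noether import Unit") -> str:
    """Render the text of a .pyi stub declaring each catalogue name as a Unit.

    Hypothetical helper; `unit_import` is an assumed import path.
    """
    lines = [unit_import, ""]
    for name in sorted(set(names)):
        # Symbols that are not valid Python names cannot appear in a stub.
        if not name.isidentifier() or keyword.iskeyword(name):
            continue
        lines.append(f"{name}: Unit")
    return "\n".join(lines) + "\n"


stub = render_stub(["pascal", "Pa", "stère", "in"])
```

Names that collide with keywords (like `in`) are silently dropped here; a real generator would probably want to warn about them instead.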

Related issues

If implemented, this will greatly enable the following:

It may also enable:

yunruse commented 1 year ago

Thanks to some help from friends I've realised that in addition to parsing from scratch, I could also consider:

yunruse commented 1 year ago

Major progress update! This is kinda nearly-done on the catalogue branch. I've decided a .yml - essentially just a JSON with a few niceties - is the way to go to avoid lexing headaches.

I'm a huge fan of this new syntax - it's a little more terse, but it's far easier to validate.

The code to render to a single catalogue.py is pretty much complete, though units aren't yet ordered in a way that avoids NameErrors. I'd be lying if I said I wasn't concerned with how hacky the code is, though.
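One possible way to handle the ordering problem is a topological sort over the names each definition references. This is only a sketch, not what the catalogue branch does: `definition_order` and the sample `definitions` dict are illustrative, and it relies on the standard-library `graphlib` module (Python 3.9+).

```python
import re
from graphlib import TopologicalSorter

# Matches identifier-like names; \b keeps the "e" in "1e-26" from matching.
NAME = re.compile(r"\b[^\W\d]\w*")


def definition_order(definitions: dict[str, str]) -> list[str]:
    """Return unit names ordered so each one follows the units it references.

    Hypothetical helper: `definitions` maps a unit name to its expression text.
    Names not present in `definitions` (e.g. imports) are ignored.
    """
    sorter = TopologicalSorter()
    for name, expression in definitions.items():
        deps = {d for d in NAME.findall(expression) if d in definitions and d != name}
        sorter.add(name, *deps)
    # static_order() raises graphlib.CycleError on circular definitions.
    return list(sorter.static_order())


# Illustrative input: pieze is listed first but depends on sthene.
definitions = {
    "pieze": "sthene / m**2",
    "sthene": "t*m/s**2",
    "stere": "m**3",
}
order = definition_order(definitions)
```

Emitting definitions in `order` guarantees `sthene` is written before `pieze`, which is what avoids the NameError.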

yunruse commented 1 week ago

The complete YAML file format is overkill and pedantic. With some very carefully defined regexes there is little problem. Consider a file that is simply:

from ..fundamental import meter as m, second as s
from ..conventional import tonne as t

#% mts "Meter-tonne-second system"
# Similar in nature to SI and CGS,
# used historically in France and the Soviet Union.
# wikipedia: https://en.wikipedia.org/wiki/MTS_units

m
t
s

stere = stère = 'st' = m**3
# used in firewood measurement
# etymology: Greek stereós, "solid"

sthene = sthène = sthéne = 'sn' = t*m/s**2
# etymology: Greek sthénos, "force"

pieze = pièze = 'pz' = sthene / m**2

The above code-highlights and autocompletes, and the fresh change is the tagging system, which is entirely just a dict of strings. It's not the most strictly defined, but it's also of very limited scope, so in the grand scheme of things there's no need for pedantry.
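To give a feel for how little regex this needs, here is a rough sketch that parses one definition and its trailing comments into names, symbols, an expression, and a tag dict. `parse_block` and the `TAG` pattern are made up for illustration; the real regexes would need to be more careful, and the `#%` header line is not handled here.

```python
import re

# A tagged comment looks like "# key: value"; anything else is a free-form note.
TAG = re.compile(r"#\s*(\w+):\s*(.+)")


def parse_block(lines: list[str]) -> dict:
    """Parse a definition line followed by its comment lines.

    Hypothetical helper, assuming the expression itself contains no '='.
    """
    head, *comments = lines
    *targets, expression = [part.strip() for part in head.split("=")]
    # Quoted targets are symbols; bare targets are names.
    symbols = [t.strip("'\"") for t in targets if t.startswith(("'", '"'))]
    names = [t for t in targets if not t.startswith(("'", '"'))]

    tags: dict[str, str] = {}
    notes: list[str] = []
    for comment in comments:
        match = TAG.fullmatch(comment.strip())
        if match:
            tags[match.group(1)] = match.group(2).strip()
        else:
            notes.append(comment.strip().lstrip("#").strip())
    return {
        "names": names,
        "symbols": symbols,
        "expression": expression,
        "tags": tags,
        "notes": notes,
    }


unit = parse_block([
    "stere = stère = 'st' = m**3",
    "# used in firewood measurement",
    '# etymology: Greek stereós, "solid"',
])
```

The tags really are just strings here: nothing validates the keys, which matches the "limited scope" approach described above.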

Should be

jansky = 'Jy' = 1e-26 * watt / meter**2 / hertz
# used in radio astronomy
# dimension: spectral_flux_density, spectral_irradiance