scikit-hep / awkward

Manipulate JSON-like data with NumPy-like idioms.
https://awkward-array.org
BSD 3-Clause "New" or "Revised" License

Support for units #2468

Open HealthyPear opened 1 year ago

HealthyPear commented 1 year ago

Description of new feature

(Apologies if this is already supported in some way; I didn't find it in the docs or in the suggested links when opening the issue.)

I am working with a data format in which each event is a row in a table and it's a ragged array.

For context, this is how each event is written to file:

As you can guess:

I am currently opening this file as a dictionary of NumPy arrays and building 3 astropy QTables: "event_table", "A_table", and "B_table". Of course, from there one can play around with masking, filtering, etc., so my (immediate) use case is kind of fixed.

That said, I cannot stop thinking that one of these files is basically an "awkward table"!

At the same time:

Soooo... what about adding units, maybe compatible with astropy quantities or with a similar object? 😄

jpivarski commented 1 year ago

This sounds like a good use-case for #1391. That would propagate units, though it would not convert them.

Or maybe it could be done with custom behaviors? https://awkward-array.org/doc/main/reference/ak.behavior.html That would even make it possible to convert appropriately. (The NumpyArray nodes would have two parameters, __array__: "units" and __units__: "light year", and there would be a custom ak.Array subclass with some mathematical operations overloaded, such as np.add, or maybe only addition.)
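To make the parameter-based idea concrete, here is a minimal plain-Python sketch. It does not use Awkward at all: TaggedArray and with_units are hypothetical stand-ins for a node carrying parameters, and only the two parameter keys (__array__ and __units__) come from the comment above. It shows propagation without conversion, so adding arrays whose units differ is simply rejected.

```python
# Hypothetical stand-in for an Awkward node that carries unit parameters.
# None of these names are Awkward API; only the two parameter keys
# ("__array__" and "__units__") are taken from the discussion.
from dataclasses import dataclass, field


@dataclass
class TaggedArray:
    values: list
    parameters: dict = field(default_factory=dict)

    def __add__(self, other):
        # Propagate units only when both operands agree; no conversion here.
        if self.parameters.get("__units__") != other.parameters.get("__units__"):
            raise TypeError("cannot add arrays with different units")
        summed = [a + b for a, b in zip(self.values, other.values)]
        return TaggedArray(summed, dict(self.parameters))


def with_units(values, units):
    # Attach the two parameters described in the comment.
    return TaggedArray(values, {"__array__": "units", "__units__": units})


distances = with_units([1.0, 2.5], "light year") + with_units([0.5, 0.5], "light year")
print(distances.values, distances.parameters["__units__"])
```

In a real implementation the overloaded operation would presumably be registered through ak.behavior rather than written as a method on a standalone class.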

I can try to write a prototype. This may be complex enough that we should build it in rather than making users implement it, so the new feature would be a new set of behaviors in src/awkward/behaviors. These would be the only __array__ behaviors that are not string-like, and @agoose77 and I were trying to think if there would be anything like that.

agoose77 commented 10 months ago

@gpiert thanks for opening #2788 regarding this!

We plan to add support for units through Pint. We're working our way through features to get there :)

jpivarski commented 10 months ago

Cross-posted from https://github.com/scikit-hep/awkward/issues/2788#issuecomment-1787324956:

That was probably in private conversations, then: we're thinking of using a Pint UnitRegistry as a source of truth about units and their relationships, but some of the handling would have to be manual. (For example, we have to implement reducers ourselves. If an array has units, ak.sum would preserve those units but ak.prod shouldn't even be possible. ak.any and ak.all would drop the units when converting numbers into booleans...)
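The reducer rules in the parenthesis above can be written down as a small lookup. This is only a sketch in plain Python: reduced_units is a hypothetical helper, not part of Awkward or Pint, and the stated reason for rejecting prod (the unit exponent would depend on the length of each list) is my reading of why it "shouldn't even be possible".

```python
# Hypothetical rule table. The reducer names mirror ak.sum, ak.prod, ak.any
# and ak.all, but this function is not part of Awkward or Pint.
def reduced_units(reducer, units):
    """Return the units of a reducer's output, or None if it has no units."""
    if units is None:
        return None  # plain numbers: nothing to track
    if reducer == "sum":
        return units  # a sum of lengths is still a length
    if reducer in ("any", "all"):
        return None  # the result is boolean, so the units are dropped
    if reducer == "prod":
        # Assumed reason: the unit exponent would vary with each list's length.
        raise TypeError("prod is undefined for arrays with units")
    raise NotImplementedError(f"no unit rule sketched for {reducer!r}")


print(reduced_units("sum", "light year"))
print(reduced_units("any", "light year"))
```

Other reducers (ak.min, ak.max, ak.count, and so on) are not mentioned in the thread, so the sketch deliberately leaves them unimplemented.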

Thus, we're recognizing Pint as the standard way to express units, to the exclusion of any other libraries that might do the same thing, and we'll try to reuse code in Pint as much as possible (e.g. in unit conflicts, which of the two should be converted to the other, and what do we multiply by to get that conversion?), but there will be limits and some things will need to be computed by hand in Awkward.
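As a rough illustration of the two questions in that parenthesis (which operand gets converted, and by what factor), here is a plain-Python sketch. The TO_METERS table is a hypothetical stand-in for the registry; a real implementation would ask Pint rather than hard-code factors. Converting the right operand to the left operand's units is an assumption made for this sketch, since the thread leaves that choice to Pint.

```python
# Hypothetical stand-in for a unit registry: factors to a common base (meters).
# A real implementation would query Pint instead of using this table.
TO_METERS = {
    "meter": 1.0,
    "kilometer": 1000.0,
    "light_year": 9460730472580800.0,
}


def conversion_factor(src, dst):
    """Number to multiply values in `src` units by to express them in `dst`."""
    return TO_METERS[src] / TO_METERS[dst]


def add_with_units(left, left_units, right, right_units):
    # Assumption for this sketch: convert the right operand to the left
    # operand's units, then add element by element.
    factor = conversion_factor(right_units, left_units)
    return [a + b * factor for a, b in zip(left, right)], left_units


values, units = add_with_units([1.0, 2.0], "kilometer", [500.0, 250.0], "meter")
print(values, units)
```

The point of the design described above is that only the factor lookup would be delegated to Pint; the element-wise arithmetic on ragged data would still be done by Awkward.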