Unit conventions

markbandstra commented 7 years ago

Following up on our discussion about using pint, I looked a bit more into its usability in our ecosystem. It seems very tightly coupled with numpy and uncertainties to the point where using it is nearly transparent, but its integration with pandas is poor. Quantities with both uncertainties and units can be stored in pandas data structures, but they need to be handled one-by-one instead of as an entire DataFrame or Series. (This is a known issue with pandas not supporting different methods of incorporating units.)

This could be a deal-breaker if we decide to rely on pandas heavily in this project.

I, for one, am not a big pandas user, so for me the cost-benefit of using pint is weighted heavily toward benefit. For example, is this spectrum in units of counts, or counts per second, or counts per second per keV? Is this branching ratio a percentage or a dimensionless number? I have an activity in becquerels; how do I do the conversion to mCi again?
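For reference, the becquerel-to-millicurie conversion asked about above is a single fixed factor, since 1 Ci is defined as exactly 3.7e10 Bq. The following is a minimal plain-Python sketch without pint; the constant and function names are hypothetical and are not taken from the becquerel code.

```python
# Exact by definition: 1 Ci = 3.7e10 Bq, so 1 mCi = 3.7e7 Bq.
BQ_PER_MCI = 3.7e7


def bq_to_mci(activity_bq):
    """Convert an activity from becquerels to millicuries."""
    return activity_bq / BQ_PER_MCI


def mci_to_bq(activity_mci):
    """Convert an activity from millicuries to becquerels."""
    return activity_mci * BQ_PER_MCI


# Example: a 37 MBq source expressed in mCi.
activity_mci = bq_to_mci(3.7e7)
```

The point of a units library is that nobody has to remember or retype this factor; the sketch only shows what the manual alternative looks like.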

markbandstra commented 7 years ago

Here is a script that uses pint for various types of quantities; please give it a try.

bplimley commented 7 years ago

I'm leaning toward not wanting to use pint, instead relying on clear conventions, clear variable names, and good testing.

A similar question, though, is whether we want to use the uncertainties package for uncertainties. It could save a lot of manual error propagation, which has potential for bugs. The API doesn't have to rely on it, I think, but we could use uncertainties under the hood.

markbandstra commented 7 years ago

I am a huge fan of uncertainties. If you're just using Gaussian error propagation, it takes care of all that for you. If you're doing something fancier (e.g., asymmetric error bars), you would want to write your own code anyway.
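To make concrete the kind of bookkeeping that would otherwise be written by hand, here is a rough sketch of first-order Gaussian propagation for a background-subtracted count rate. It uses only the standard library; the function name and the numbers are hypothetical, and it assumes Poisson counts, uncorrelated inputs, and an exactly known live time.

```python
import math


def net_rate(gross_counts, bkg_counts, live_time_s):
    """Return (rate, sigma) for a background-subtracted count rate.

    Assumes Poisson counts (variance equals the counts), uncorrelated
    inputs, and a live time with no uncertainty.
    """
    net = gross_counts - bkg_counts
    # Variances of uncorrelated terms add under subtraction.
    sigma_net = math.sqrt(gross_counts + bkg_counts)
    # Dividing by an exact constant scales the value and its sigma alike.
    return net / live_time_s, sigma_net / live_time_s


rate, sigma = net_rate(gross_counts=1000, bkg_counts=100, live_time_s=10.0)
```

Every additional operation (ratios, products, correlated terms) needs its own formula like this, which is where hand-written propagation tends to pick up bugs.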

bplimley commented 7 years ago

To try pint for myself, I implemented it in an electron range module that had some conversions between energies, lengths, densities, and mass thicknesses. It is elegant in some ways, but there are other things I don't like (see below). So I still vote that we avoid using pint.

(Specifically:

- If you divide a keV quantity by an MeV quantity, the units don't automatically cancel; you need to apply a method like to_base_units().
- You're not supposed to use quantities from different unit registries (1, bottom of page), which makes it hard for the user to give an input arg with units, because the user will be working from a different unit registry.)

markbandstra commented 7 years ago

I have become convinced that pint is too much of a hassle for our purposes. Perhaps in the future it might have some use, but I agree that we should just use conventions and testing. I have removed the pint dependency from the xcom branch.

My proposal is:

- All energies will be in keV
- All other units will be in CGS
- Any special unit cases will be handled on a case-by-case basis
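A minimal sketch of what this proposal could look like in code: functions assume keV and CGS throughout, and callers holding other units convert once at the boundary. The constants, the function, and the numeric values are illustrative assumptions, not part of the becquerel API.

```python
# Conversion factors into the proposed conventions
# (keV for energies, CGS for everything else).
KEV_PER_MEV = 1.0e3
CM_PER_MM = 0.1


def mass_thickness(density, length):
    """Mass thickness in g/cm^2, given density in g/cm^3 and length in cm."""
    return density * length


# Convert once on the way in; everything downstream is keV and CGS.
energy = 1.332 * KEV_PER_MEV                        # 1.332 MeV expressed in keV
thickness = mass_thickness(2.33, 5.0 * CM_PER_MM)   # 5 mm of a 2.33 g/cm^3 material
```

With this approach the docstrings and tests carry the unit information that pint would otherwise enforce at runtime.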

bplimley commented 7 years ago

(CGS? You're such an astrophysicist...) I'm fine with CGS. Most of the lengths we work with are small (on the order of the size of detectors), and if densities are involved then g/cm³ makes sense.

Until you suggest we use ergs for something. There I'll draw the line. ;-)

bplimley commented 7 years ago

What about variable naming convention, do we still suffix energy variables with _kev, lengths with _cm, etc.?

markbandstra commented 7 years ago

Good question. I am worried that appending units to variable names could end up making things too verbose, but I do see the utility of doing that. What do others think?

bplimley commented 7 years ago

One possible compromise would be to append unit names onto publicly accessible arguments and properties, and leave them off (or optional) on internal variables and attributes.
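One way this compromise might look, as a sketch only: the public constructor arguments and properties carry unit suffixes, while the private attributes rely on the project-wide keV convention. The class and attribute names here are hypothetical.

```python
class Peak:
    """Hypothetical example: unit suffixes on the public interface only."""

    def __init__(self, energy_kev, fwhm_kev):
        # Internal attributes drop the suffix; keV is implied by convention.
        self._energy = float(energy_kev)
        self._fwhm = float(fwhm_kev)

    @property
    def energy_kev(self):
        """Peak centroid energy in keV."""
        return self._energy

    @property
    def fwhm_kev(self):
        """Peak full width at half maximum in keV."""
        return self._fwhm

    def resolution(self):
        """Dimensionless ratio FWHM / energy, so no suffix is needed."""
        return self._fwhm / self._energy


peak = Peak(energy_kev=662.0, fwhm_kev=46.34)
```

Callers see the units at every call site (`Peak(energy_kev=...)`), while the internal arithmetic stays short.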

markbandstra commented 7 years ago

Can we re-close this issue? I think we have decided to prioritize simplicity over using pint through a combination of clear conventions, good documentation, and subscript hints.

bplimley commented 7 years ago

I only hesitated because I wasn't very clear on when to name a variable with e.g. _kev and when not to. But I'm okay with closing it and revisiting the variable naming once we have enough code to get a feel for it.

markbandstra commented 7 years ago

Sounds good.

lbl-anp / becquerel

Unit conventions #12