lmfit / uncertainties

Transparent calculations with uncertainties on the quantities involved (aka "error propagation"); calculation of derivatives.
http://uncertainties.readthedocs.io/

Trailing decimal symbol test failing #259

Open jagerber48 opened 1 month ago

jagerber48 commented 1 month ago

#167 refactors the formatting tests to resolve #162, a bug where shadowed dictionary keys caused many previously written tests not to run. When those tests did run, a few failures were discovered. The only unresolved failure was on the following test (abbreviated here):

```python
assert format(ufloat(1234.56789, 0.1), ".0f") == "1235+/-0."
```

This fails because the formatting gives

```python
print(format(ufloat(1234.56789, 0.1), ".0f"))
# 1235+/-0
```

The test expects a trailing decimal symbol on the zero, but the actual output has no trailing decimal symbol.

The RTD documentation claims that "An uncertainty which is exactly zero is always formatted as an integer". To me this implied that an uncertainty which rounds to zero, but which was originally non-zero, would always have a trailing decimal symbol. This is the case in another test which now passes:

```python
assert format(ufloat(9.9, 0.1), ".0fS") == "10(0.)"
```

There is some discussion of this issue in comments in the source code. The comments suggest that it is challenging, using built-in Python formatting, to always get a trailing decimal.
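For reference, the difficulty seems tied to Python's built-in format specification: at zero precision the decimal point is dropped unless the `#` alternate form is requested. A quick illustration in plain Python (not the uncertainties code path):

```python
# Default fixed-point formatting drops the decimal point at zero precision...
print(format(0.0, ".0f"))      # 0

# ...while the "#" alternate form forces a trailing decimal symbol.
print(format(0.0, "#.0f"))     # 0.
print(format(1234.9, "#.0f"))  # 1235.
```

Threading the `#` flag through the library's own value/uncertainty formatting is where the complexity would come in.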


I'm not sure what the resolution should be. Here are the options I see:

Probably, in the short term, the third option will be the path.


I'd like to comment on this as the author/maintainer of sciform, a package dedicated to scientific formatting of numbers. sciform tries to follow official SI/NIST/BIPM guidelines rather than existing programming conventions for formatting, e.g., integers vs. floats. The first major decision this has led to is that sciform only supports significant-figure rounding, and not digits-past-the-decimal formatting. Since the number of sig figs on the uncertainty must be >= 1, it is impossible for a non-zero uncertainty to get rounded down to zero, so this issue is sidestepped. This is consistent with the guideline examples, which all have non-zero uncertainty and always display the value and uncertainty out to the same least significant decimal place.

However, in practice users will pass in uncertainties that are exactly zero, and, in my opinion, it is better to still format that as a value/uncertainty pair rather than as just a value. In this case the uncertainty is never displayed with a hanging decimal symbol, because there is no official example showing a trailing decimal symbol. I haven't found official guidance against it, but there is section 10.5.2, which advises against leading decimal points, i.e. "0.25" over ".25".

If a zero uncertainty is passed in, then it is formatted as "0", "000", "0.000", "00.00", etc., depending on how the value is displayed and on other selected options. It is always rendered to the same least significant digit as the value, which controls the number of zeros to the right of the decimal symbol; if a certain option is selected, it is also rendered to the same most significant digit as the value, which controls the number of zeros to the left of the decimal symbol.
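To make that padding behavior concrete, here is an illustrative sketch (my own, not sciform's actual API) of rendering an exactly-zero uncertainty against an already-formatted value string:

```python
def render_zero_unc(value_str: str, match_msd: bool = False) -> str:
    # Illustrative only (not sciform's API): render an exactly-zero
    # uncertainty to the same least significant decimal place as the
    # displayed value, and optionally to the same most significant
    # digit as well.
    if match_msd:
        # Same shape as the value: every digit becomes a zero.
        return "".join("0" if c.isdigit() else c for c in value_str)
    if "." in value_str:
        # Only match the least significant decimal place.
        decimals = len(value_str.split(".")[1])
        return "0." + "0" * decimals
    return "0"

print(render_zero_unc("12.34"))        # 0.00
print(render_zero_unc("12.34", True))  # 00.00
print(render_zero_unc("1235"))         # 0
```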

Due to popular demand, I will likely add a digits-past-the-decimal formatting style for value/uncertainty formatting in sciform, though, for the reasons above, I will encourage the user not to use this option. In this mode, I will render a "rounded-to-zero" uncertainty the same as an "exactly-zero" uncertainty. I disagree with the convention that there should be an indication to the user of whether the uncertainty was rounded to zero or is exactly zero. If the user chooses digits-past-the-decimal formatting, then they expose themselves to selecting a least significant displayed digit that is more significant than the most significant digit of the uncertainty, and they opt in to losing information about the uncertainty.

It seems to me that the uncertainties convention arose out of the programming convention that a trailing decimal symbol indicates a float, whereas no decimal symbol indicates an int. However, scientific formatting is entirely unconcerned with floats and ints. Scientific formatting has a value and an uncertainty, both of which are real numbers. The main formatting decision is to round both the value and the uncertainty to a common decimal place, typically the one matching the most- or second-most-significant digit of the uncertainty.
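A minimal sketch of that rounding decision (illustrative only; not the uncertainties or sciform implementation, and ignoring the edge case where rounding carries the uncertainty into a new decade):

```python
import math

def round_val_unc(value, uncertainty, sig_figs=2):
    # Round the uncertainty to `sig_figs` significant figures, then
    # round the value to the same least significant decimal place.
    if uncertainty == 0:
        return value, uncertainty
    msd = math.floor(math.log10(abs(uncertainty)))  # most significant digit position
    decimals = sig_figs - 1 - msd                   # decimal place to round both to
    return round(value, decimals), round(uncertainty, decimals)

print(round_val_unc(1234.56789, 0.1, sig_figs=1))  # (1234.6, 0.1)
print(round_val_unc(9.8761, 0.0432, sig_figs=2))   # (9.876, 0.043)
```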

If uncertainties moves to using sciform as a formatting back-end then the trailing decimal symbol convention will be abandoned, essentially corresponding to the first option above of changing the test expected output on this particular test.

newville commented 1 month ago

@jagerber48 A few comments.

I will not try to keep up with the length of your messages. When you write messages this long, or multiple replies to your own comments (as at #251), you should not assume that anyone is reading everything you write.

I agree that "integer 0 meaning no uncertainty, with float 0. meaning small uncertainty" is kind of weird. I think "precisely no uncertainty" would be better spelled None. That would need special handling, but also add clarity. I am OK with leaving it as it is ("0" or "0."). I am -1 on expecting to maintain the distinction between "0" and "0.".

I don't disagree with anything in sciform or the NIST/BIPM recommendations. But: those are about reporting in a publication. What this library prints out are intermediate results that can be turned into publishable results. These can certainly report more digits than are recommended, with the expectation that the user prepares them for publication.

I would suggest that uncertainties focus more on calculations that propagate uncertainties (hard enough!), and less on formatting of values with uncertainties.

I am OK with making sciform an optional dependency and using that if available, or otherwise just print out with a "%.Ng" formatting (and not worry about leading or trailing zeros).
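A minimal sketch of that "%.Ng" fallback (illustrative; the `fallback_format` helper here is hypothetical, not part of uncertainties):

```python
def fallback_format(value, uncertainty, n=5):
    # Plain "%.Ng" formatting of both numbers, with no special
    # handling of leading or trailing zeros.
    return "%.*g+/-%.*g" % (n, value, n, uncertainty)

print(fallback_format(1234.56789, 0.1))  # 1234.6+/-0.1
print(fallback_format(9.9, 0.0))         # 9.9+/-0
```

Note that with "%g" an exactly-zero uncertainty naturally prints as "0" with no trailing decimal symbol, which is consistent with not worrying about the "0" vs. "0." distinction.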

jagerber48 commented 1 month ago

@newville thanks for the response.

Sorry my messages are long. I have lots of thoughts and ideas, and they're pretty "specific" in my mind, so I try to be clear about them. But I can appreciate how long, verbose posts are not conducive to easy collaboration. I'll work to whittle my messages down to be more concise and digestible so we can move forward more easily.

Regarding your comments on the trailing decimal symbol:

It's helpful to know you're -1 on maintaining the distinction between "0" and "0.". I'm also -1 on that. Dropping the distinction will solve this issue and could generally make the formatting algorithm more clear in code and documentation.

Your point is taken that perhaps uncertainties need not provide publication-ready formatting. That can be the job of something like sciform; uncertainties need only provide quick readability for the programmer. I'll take this into consideration.

I would suggest that uncertainties focus more on calculations that propagate uncertainties (hard enough!), and less on formatting of values with uncertainties.

agreed.

I am OK with making sciform an optional dependency and using that if available, or otherwise just print out with a "%.Ng" formatting (and not worry about leading or trailing zeros).

This statement is a bit of a can of worms for me. There's a lot I'd like to discuss on it, but I'll pick that up at #192 as I have time. Maybe for now I'll express regret about bringing up the sciform topic in this issue and ask that the thread be re-focused on the specific question:

What should be done about "0" vs. "0.", both in the context of this specific test and for uncertainties formatting in general?

So far the vote is to stop worrying about the difference between "0" and "0.", so in this case that would mean rewriting the test so it passes and making sure the documentation is still sufficiently clear.

newville commented 1 month ago

@jagerber48 For formatting, I guess I would say

a) uncertainties should not have to worry about following the NIST/BIPM recommendations, but should try to do a decent job of providing at least enough digits.
b) using formats like "%.df", "%.de", or "%.dg" is not perfect but ought to be okay for most things, and is kind of "normal Python". So I think those are reasonable defaults for what uncertainties should do.
c) I think the choice of val(std), val+/-std, val±std, ${val}\pm{std}$, (val±std)×exp, and so forth is sort of a distraction in this code that would be better handled somewhere else (say, sciform).
d) I am +1 on adding sciform as an optional dependency, having uncertainties use it if available, and also recommending it for real reporting even when not installed.

Yes, I would be +1 on dropping any distinction between "0" and "0." in the tests, docs, and code. I think that is too hard to maintain reliably, and does not really add much meaning anyway. If one wanted to report "the uncertainty is not only numerically vanishingly small, but precisely 0 because there is no uncertainty", then maybe std should be None. At least, in reporting ;).
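The None idea could be sketched like this (hypothetical; `format_unc` is not the uncertainties API):

```python
def format_unc(value, std, fmt=".2f"):
    # Hypothetical: std=None means "there is no uncertainty at all",
    # distinct from a numerically zero uncertainty.
    if std is None:
        return format(value, fmt)
    return format(value, fmt) + "+/-" + format(std, fmt)

print(format_unc(1234.5, None))  # 1234.50
print(format_unc(1234.5, 0.0))   # 1234.50+/-0.00
```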