open-reaction-database / ord-schema

Schema for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0
95 stars 27 forks source link

Round floating point values for display #392

Closed skearnes closed 4 years ago

skearnes commented 4 years ago

Fixes #229

codecov[bot] commented 4 years ago

Codecov Report

Merging #392 into main will not change coverage. The diff coverage is 50.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #392   +/-   ##
=======================================
  Coverage   77.76%   77.76%           
=======================================
  Files          21       21           
  Lines        2249     2249           
  Branches      469      469           
=======================================
  Hits         1749     1749           
  Misses        364      364           
  Partials      136      136           
Impacted Files Coverage Δ
ord_schema/units.py 90.47% <50.00%> (ø)
connorcoley commented 4 years ago

This isn't quite right IMO. It could be a fine patch for this issue, but we should really have a more sophisticated way of doing this based on the number of digits in the significand, regardless of the exponent.

For instance, you can put in "0.00000000000001" and it will be stored as "1e-14". It will then be read into the form as "9.9999998245167e-15" if you roundtrip it to a pbtxt. The pbtxt doesn't always store things to 7 decimal places.

This also raises a new issue for my hacky float/integer check (#377), since we can now have exponents in the field denoted by the letter e.

skearnes commented 4 years ago

This isn't quite right IMO. It could be a fine patch for this issue, but we should really have a more sophisticated way of doing this based on the number of digits in the significand, regardless of the exponent.

For instance, you can put in "0.00000000000001" and it will be stored as "1e-14". It will then be read into the form as "9.9999998245167e-15" if you roundtrip it to a pbtxt. The pbtxt doesn't always store things to 7 decimal places.

This also raises a new issue for my hacky float/integer check (#377), since we can now have exponents in the field denoted by the letter e.

Hmm...the proto definition uses float, which only supports 32-bit floats, so "0.00000000000001" is technically invalid. However, as you mention, text_format does the right thing on serialization (value: 1e-14) but loads the borked version when you deserialize. This could be fixed by treating these as doubles when rounding (more precision) but that will break the fix for things with fewer decimal places.

So I see a few options here:

  1. Disallow values with more precision than the underlying protocol buffer definition (7 decimal digits).
  2. Treat all floating point values as doubles and go out to 15 digits for rounding.
  3. Something more sophisticated like trying both roundings and seeing which one gives the cleanest value...

FWIW I think option 1 is probably best.

skearnes commented 4 years ago

PS <input type="number"... allows 'e' but not other non-numeric characters.

skearnes commented 4 years ago

I can't think of a way to do this dynamically based on the number of significant digits, since we'll lose that information when we try to round trip (it comes into the loaded protocol buffer incorrect). Am I missing something?

skearnes commented 4 years ago

On second thought, I think I agree with you about significant digits. Let me dig deeper.

skearnes commented 4 years ago

It looks like javascript's toPrecision method should be part of the answer. Possibly coupled with rounding...

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_objects/Number/toPrecision

skearnes commented 4 years ago

PTAL at the latest commits that use up to 15 significant digits.

skearnes commented 4 years ago

So I got things working for 1e-14, but broke 0.04 since it doesn't round to 0.04 with 15 digits of precision.

skearnes commented 4 years ago

Oh I think I figured it out; just use parseFloat instead of Math.round, in combination with toPrecision.

connorcoley commented 4 years ago

Am I missing something?

I don't think so. I've had a surprisingly hard time trying to reason my way through possible solutions to this problem. It doesn't seem like it should be that hard.

skearnes commented 4 years ago

image

and after reload:

image

connorcoley commented 4 years ago

That behavior looks good to me. There's no red outline in your example, but if you change the field and trigger the checkFloat function, it will complain and then not let you successfully validate the reaction. How should we change the test? Just look for round-trip consistency w/ prepareFloat(parseFloat(node.text())) ?

skearnes commented 4 years ago

That behavior looks good to me. There's no red outline in your example, but if you change the field and trigger the checkFloat function, it will complain and then not let you successfully validate the reaction. How should we change the test? Just look for round-trip consistency w/ prepareFloat(parseFloat(node.text())) ?

The test still passes as-is, since we never call checkFloat, but I agree that is brittle. I suggest we fix checkFloat to allow for scientific notation (my thought is to make it recursive and split on 'e') or move to <input type="number"... for these fields.

If it's ok with you I'd prefer to check this in separately from fixing this issue since it's currently passing.

connorcoley commented 4 years ago

I was going to suggest the opposite and say we should introduce a test that triggers a change/blur event for all fields, re-validates a known-to-be-valid reaction, and ensures there are no validation errors.

I don't think we should merge this one in w/o fixing the checkFloat issue, since we would have a situation where a user would be unable to record a reaction exactly as the form is trying to display it. For the purposes of this PR, I might suggest you just disable checkFloat and checkInteger temporarily. We can re-enable them when we fix the exponent issue.