planetarypy / pvl

Python implementation of PVL (Parameter Value Language)
BSD 3-Clause "New" or "Revised" License
19 stars 19 forks source link

Retain precision on PVL Real Numbers #81

Closed rbeyer closed 3 years ago

rbeyer commented 3 years ago

Sometimes "PVL Real Numbers" like 1234.560 are provided in PVL-text. The current pvl library reads such a "PVL Real Number" during pvl.load() and converts it to a Python float. If that value is then written back out to PVL text by pvl.dump() then it will be written 1234.56. While the two values are numerically equivalent, the number has lost some "representational precision" that may be significant. The PVL/ODL/PDS3 specifications don't care about this, but people do.

Likewise, floating point arithmetic can result in Python float objects which have an unreasonable precision that would get written to output PVL-text. For example, 1.1 + 2.2 via a binary floating point addition could display as 3.3000000000000003. This is generally handled in Python during conversion of individual float objects to str objects by string formatting, but since many such numbers with different levels of needed precision could be written out when a dict-like is handed to pvl.dump() specifying a one-size-fits all precision for a whole pvl.dump() run is unreasonable.

Describe the solution you'd like If "PVL Real Numbers" were decoded to Python decimal.Decimal objects instead of float objects, they could retain their precision.

The pvl decoder classes currently have a quantity_cls parameter on instantiation that instructs the decoder what kind of quantity class should be used to decode a PVL numeric value with associated units, and specifies the kind of objects returned in the dict-like that match that kind of PVL text. Seems like we could create a similar real_cls parameter for decoders that would default to Python float, but for which Python decimal.Decimal could be used (or anything else that could take a text representation of a PVL Real Number and convert it to some Python object), which would return Decimal objects in the dict-like, and then when a dict-like with such objects is written out via pvl.dump() it could retain the relevant precision. Such a mechanism would also allow programmers to use various operations on the Decimal objects to specify individual amounts of precision for different values that would then be used when the dict-like was written out via pvl.dump().

Describe alternatives you've considered An alternative would be to have some mechanism for just keeping all parsed values as Python str so that they "stay the same" during processing, but then extra work would need to be done on output during pvl.dump() to properly convert all of the Python str which aren't all really PVL Text Strings to the appropriate representation.

Ironically, it occurs to me that you could provide Python str to the hypothetical real_cls attribute described above to do exactly that during the pvl.load() step, but then would have to perhaps write a custom encoder that would make sure to take these strings-that-are-really-floats and serialize them to PVL-text that didn't enclose these strings-that-are-really-floats in quotation marks.

AndrewAnnex commented 3 years ago

I think that using the decimal objects is the correct way to handle this situation, rather than passing strings around, and I think the implementation makes senses although maybe defaulting to decimal is actually a safer choice than float in all situations, just to be defensive?

rbeyer commented 3 years ago

I thought about changing the default, too, since I think their behavior is more aligned with what users expect. I haven't worked with Decimal objects enough to know what the implications of doing that would be if users suddenly started getting Decimal objects instead of float objects in their returned dict-like. My impression is if a user was expecting a float and had code that assumed that, and got a Decimal instead, it would function identically in all cases, although an isinstance would fail:

>>> import decimal
>>> blah = decimal.Decimal("1.1")
>>> print(isinstance(blah, float))
False

Not sure if that's a big deal or not, so my thought is to make it an option for now, and maybe change it with pvl 2.0 if we ever accumulate enough significant changes.

michaelaye commented 3 years ago

I agree the user shouldn't be surprised by some eclectic object, even if it's helpful for us "librarians". That a powerful object description exist doesn't mean we should impose it on the user, specifically if we only want to solve clean serialization and no other user cases. However, it could be transparently converted with properties. But then again we might as well decide to keep the string form for simplicity and convert that transparently with a property as well. I do that in one of my libraries for precisely that reason. I want the float for math, but I need to keep track of the precise string float I read in. I'd store that in self._value and the property getter converts it to float while your back-serializer can simply read off self._value directly. I don't know though if that'd work well with your by now rather complex system in pvl.

rbeyer commented 3 years ago

Sure. That kind of approach would work, we'd need to build a mechanism that populates the returned dict-like with these other objects that have these extra properties, and then also takes a dict-like of them and can properly serialize them back out to PVL text.

This is exactly what we already do with quantities, and what I'm proposing we also now do with "PVL Real values." We allow the user to request a different object than Python float to return in the dict-like for "PVL Real values." One could take this to its logical conclusion and have a similar mechanism for each type of "PVL value" (rather than just quantities and real numbers). And that type of object could keep the original string representation that was read in, sure.

So I guess just making this change for "PVL Real" values is another step on that road to allow custom classes to be returned in the dict-like from the loaders.