Closed rbeyer closed 3 years ago
I think that using the decimal objects is the correct way to handle this situation, rather than passing strings around, and I think the implementation makes senses although maybe defaulting to decimal is actually a safer choice than float in all situations, just to be defensive?
I thought about changing the default, too, since I think their behavior is more aligned with what users expect. I haven't worked with Decimal
objects enough to know what the implications of doing that would be if users suddenly started getting Decimal
objects instead of float
objects in their returned dict-like. My impression is if a user was expecting a float
and had code that assumed that, and got a Decimal
instead, it would function identically in all cases, although an isinstance would fail:
>>> import decimal
>>> blah = decimal.Decimal("1.1")
>>> print(isinstance(blah, float))
False
Not sure if that's a big deal or not, so my thought is to make it an option for now, and maybe change it with pvl
2.0 if we ever accumulate enough significant changes.
I agree the user shouldn't be surprised by some eclectic object, even if it's helpful for us "librarians". That a powerful object description exist doesn't mean we should impose it on the user, specifically if we only want to solve clean serialization and no other user cases. However, it could be transparently converted with properties. But then again we might as well decide to keep the string form for simplicity and convert that transparently with a property as well. I do that in one of my libraries for precisely that reason. I want the float for math, but I need to keep track of the precise string float I read in. I'd store that in self._value and the property getter converts it to float while your back-serializer can simply read off self._value directly. I don't know though if that'd work well with your by now rather complex system in pvl.
Sure. That kind of approach would work, we'd need to build a mechanism that populates the returned dict-like with these other objects that have these extra properties, and then also takes a dict-like of them and can properly serialize them back out to PVL text.
This is exactly what we already do with quantities, and what I'm proposing we also now do with "PVL Real values." We allow the user to request a different object than Python float
to return in the dict-like for "PVL Real values." One could take this to its logical conclusion and have a similar mechanism for each type of "PVL value" (rather than just quantities and real numbers). And that type of object could keep the original string representation that was read in, sure.
So I guess just making this change for "PVL Real" values is another step on that road to allow custom classes to be returned in the dict-like from the loaders.
Sometimes "PVL Real Numbers" like 1234.560 are provided in PVL-text. The current
pvl
library reads such a "PVL Real Number" duringpvl.load()
and converts it to a Pythonfloat
. If that value is then written back out to PVL text bypvl.dump()
then it will be written 1234.56. While the two values are numerically equivalent, the number has lost some "representational precision" that may be significant. The PVL/ODL/PDS3 specifications don't care about this, but people do.Likewise, floating point arithmetic can result in Python
float
objects which have an unreasonable precision that would get written to output PVL-text. For example, 1.1 + 2.2 via a binary floating point addition could display as 3.3000000000000003. This is generally handled in Python during conversion of individualfloat
objects tostr
objects by string formatting, but since many such numbers with different levels of needed precision could be written out when a dict-like is handed topvl.dump()
specifying a one-size-fits all precision for a wholepvl.dump()
run is unreasonable.Describe the solution you'd like If "PVL Real Numbers" were decoded to Python
decimal.Decimal
objects instead offloat
objects, they could retain their precision.The
pvl
decoder classes currently have aquantity_cls
parameter on instantiation that instructs the decoder what kind of quantity class should be used to decode a PVL numeric value with associated units, and specifies the kind of objects returned in the dict-like that match that kind of PVL text. Seems like we could create a similarreal_cls
parameter for decoders that would default to Pythonfloat
, but for which Pythondecimal.Decimal
could be used (or anything else that could take a text representation of a PVL Real Number and convert it to some Python object), which would returnDecimal
objects in the dict-like, and then when a dict-like with such objects is written out viapvl.dump()
it could retain the relevant precision. Such a mechanism would also allow programmers to use various operations on theDecimal
objects to specify individual amounts of precision for different values that would then be used when the dict-like was written out viapvl.dump()
.Describe alternatives you've considered An alternative would be to have some mechanism for just keeping all parsed values as Python
str
so that they "stay the same" during processing, but then extra work would need to be done on output duringpvl.dump()
to properly convert all of the Pythonstr
which aren't all really PVL Text Strings to the appropriate representation.Ironically, it occurs to me that you could provide Python
str
to the hypotheticalreal_cls
attribute described above to do exactly that during thepvl.load()
step, but then would have to perhaps write a custom encoder that would make sure to take these strings-that-are-really-floats and serialize them to PVL-text that didn't enclose these strings-that-are-really-floats in quotation marks.