seandstewart / typical

Typical: Fast, simple, & correct data-validation using Python 3 typing.
https://python-typical.org
MIT License

Decimal serialization and deserialization #188

Closed: qhelix7 closed this issue 2 years ago

qhelix7 commented 2 years ago

Description

I am looking for a Python library that simplifies serialization between CSV / JSON data and dataclasses. I really like the approach you have taken with Typical; however, I am having trouble with Decimal data because it gets coerced to float.

In my experience, when I use a Decimal type it is because float is unacceptable -- not less-than-ideal, but actually unusable:

  1. Cryptography -- numbers have dozens or even hundreds of significant digits, but double-precision floating-point only supports about 16
  2. Currency -- amounts and arithmetic require decimal fixed-point, and binary floating-point introduces too many rounding errors
  3. Cryptocurrency -- both of the above

Decimal (BigInteger, BigNumber, Numeric, etc.) values are usually serialized to strings for these reasons -- at least for human-readable formats such as JSON and CSV. Some sort of binary representation could be used, but casting to traditional primitive types like float, double, int64, etc. defeats the purpose of using Decimal.
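As a minimal sketch of the string-based approach described above, using only the standard library (no Typical involved): `dumps_decimal` is a hypothetical helper name, and the payload shape is made up for illustration.

```python
import json
from decimal import Decimal


def dumps_decimal(obj):
    """Hypothetical helper: dump to JSON, emitting Decimal values as strings."""

    def default(o):
        # json.dumps calls this only for objects it cannot serialize itself.
        if isinstance(o, Decimal):
            return str(o)
        raise TypeError(f"Cannot serialize {type(o).__name__}")

    return json.dumps(obj, default=default)


original = {"value": Decimal("1000000000000000.3")}
encoded = dumps_decimal(original)

# Reading back: the value arrives as a str and goes straight to Decimal,
# so no float is ever created along the way.
decoded = {"value": Decimal(json.loads(encoded)["value"])}
```

Because the number travels as a string in both directions, every digit survives the round trip.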

What I Did

from dataclasses import dataclass
from decimal import Decimal
import typic

@dataclass
class DecimalData:
    value: Decimal

proto = typic.protocol(DecimalData)

When deserializing floating-point numbers to Decimal, rounding errors are to be expected, but this also happens when the input data is a string:

proto.deserialize({"value": 1.2})
# returns DecimalData(value=Decimal('1.1999999999999999555910790149937383830547332763671875'))

proto.deserialize({"value": "1.2"})
# also returns DecimalData(value=Decimal('1.1999999999999999555910790149937383830547332763671875'))

proto({"value": "1000000000000000.3"})
# returns DecimalData(value=Decimal("1000000000000000.25"))
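The results above are what one would get if the string were converted to a float before reaching Decimal. The following stdlib-only sketch reproduces the symptom without Typical; it illustrates the effect and makes no claim about the library's internals.

```python
from decimal import Decimal

text = "1.2"

# Going through float first bakes in the binary rounding error,
# which Decimal then reproduces exactly.
via_float = Decimal(float(text))

# Passing the string directly keeps the value exact.
direct = Decimal(text)

print(via_float)  # same long expansion as in the report above
print(direct)     # 1.2
```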

Deserialization works for big numbers without a decimal point, but serialization breaks due to internal float coercion:

data = proto({"value": "10000000000000005"})
# data is DecimalData(value=Decimal('10000000000000005'))

proto.primitive(data)
# returns {'value': 1.0000000000000004e+16}
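The same loss can be reproduced with the standard library alone: converting the Decimal to float rounds it to the nearest representable double, while `str()` keeps every digit. In this sketch `to_primitive` is a hypothetical helper, not part of Typical's API.

```python
from decimal import Decimal


def to_primitive(value: Decimal) -> str:
    """Hypothetical helper: emit a Decimal as a string primitive."""
    return str(value)


big = Decimal("10000000000000005")

lossy = float(big)         # rounded to the nearest double
exact = to_primitive(big)  # all 17 digits preserved

print(repr(lossy))  # 1.0000000000000004e+16
print(exact)        # 10000000000000005
```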

The main issue is that the generated code runs an eval on string values, which causes Python to interpret them as int or float literals before the code has a chance to convert them to Decimal.

print(typic.protocol(Decimal).transmute.__raw__)

def deserializer_2831300019841999574(val):
    _, val = __eval(val) if isinstance(val, (str, bytes)) else (False, val)
    vtype = val.__class__
    if vtype is Decimal_140183350269856:
        return val
    # Happy path - deserialize a mapping into the object.
    if issubclass(vtype, Mapping):
        val = Decimal_140183350269856(**{x: desers[x](val[x]) for x in fields_in.keys() & val.keys()})
    # Unknown path, just try casting it directly.
    elif isbuiltinsubtype(vtype):
        val = Decimal_140183350269856(val)
    # Two user-defined types, try to translate the input into the desired output.
    else:
        val = translate(val, Decimal_140183350269856)
    return val

The eval also makes it problematic to try to do something like define a custom type that subclasses Decimal.
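One possible shape for an eval-free path is sketched below. It is only an illustration under stated assumptions and is not the code from the pull request: `deserialize_decimal` and the `Money` subclass are hypothetical names, and the function handles just the str / bytes / Decimal cases discussed in this issue.

```python
from decimal import Decimal


class Money(Decimal):
    """Hypothetical Decimal subclass, used only for illustration."""


def deserialize_decimal(val, target=Decimal):
    """Hypothetical sketch: build `target` without evaluating strings first."""
    if type(val) is target:
        return val
    if isinstance(val, bytes):
        # Decimal does not accept bytes, so decode to text first.
        val = val.decode()
    # Strings are handed straight to the constructor, so "1.2" is never
    # interpreted as a float literal on the way in.
    return target(val)


print(deserialize_decimal("1.2"))                  # 1.2
print(deserialize_decimal(b"1000000000000000.3"))  # 1000000000000000.3
print(type(deserialize_decimal("1.2", Money)).__name__)  # Money
```

Float inputs would still carry their binary rounding error here, which matches the expectation stated earlier in this issue; only the string path changes.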

I have a pull request to address these issues. Let me know what you think.

seandstewart commented 2 years ago

@qhelix7 -

The eval portion of the deserializer is quite an old bit of code - some of the first ever written for this library. I'm more than happy to try another approach, as it has its drawbacks, which you so clearly illustrate here!

If you have a PR with a proposed solution for the cases you mention here, then I would love for you to submit it. I'll keep an eye out for your submission 👍

qhelix7 commented 2 years ago

Wow, I was not expecting such a quick reply! Here's the PR.