pydantic / pydantic

Data validation using Python type hints
https://docs.pydantic.dev
MIT License
20.45k stars 1.83k forks

Decoding Decimal from JSON number is lossy #9180

Open FeldrinH opened 5 months ago

FeldrinH commented 5 months ago

Description

When decoding a JSON number into a Python Decimal, the precision seems to be limited: after a certain number of digits the value is cut off. This is something I would expect for a fixed-precision float, but not for an arbitrary-precision Decimal. It only happens with numbers that contain a decimal point. I assume the number is internally parsed as a float and then converted to a Decimal.

As a user, this lossy internal conversion is an unexpected and unwelcome surprise. Both the initial JSON string and the final Decimal can contain the full precision of the value without loss, so I would have expected the conversion to be lossless as well.

PS: I'm not sure if this is a bug per se because I could not find any documentation that explicitly states what the expected behavior is. However, based on what was written in the documentation it certainly was unexpected to me.

Example Code

from decimal import Decimal
from pydantic import BaseModel

class Test(BaseModel):
    value: Decimal

print(Test.model_validate_json('{"value": 1.234567890123456789012345678901234567890}'))
# Expected output: value=Decimal('1.234567890123456789012345678901234567890')
# Actual output: value=Decimal('1.2345678901234567')

print(Test.model_validate_json('{"value": 12345678901234567890123456789012345678.9}'))
# Expected output: value=Decimal('12345678901234567890123456789012345678.9')
# Actual output: value=Decimal('12345678901234568000000000000000000000')
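
As a point of comparison (stdlib only, no pydantic involved): the same cutoff appears when the plain `json` module parses the number to a float, while its `parse_float=Decimal` hook shows that a lossless path exists, which supports the float-roundtrip theory above:

```python
import json
from decimal import Decimal

raw = '{"value": 1.234567890123456789012345678901234567890}'

# Default parsing routes every JSON number with a decimal point through float,
# so digits beyond float precision are gone before Decimal is ever involved:
lossy = json.loads(raw)["value"]
print(type(lossy), lossy)  # <class 'float'> 1.2345678901234567

# The stdlib parser can hand the raw digit string straight to Decimal instead:
exact = json.loads(raw, parse_float=Decimal)["value"]
print(exact)  # 1.234567890123456789012345678901234567890
```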

Python, Pydantic & OS Version

             pydantic version: 2.6.4
        pydantic-core version: 2.16.3
          pydantic-core build: profile=release pgo=true
                 install path: <redacted>
               python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
                     platform: Windows-10-10.0.19045-SP0
             related packages: fastapi-0.109.2 mypy-1.3.0 typing_extensions-4.8.0
                       commit: unknown

FeldrinH commented 5 months ago

PS: There are a number of similar issues with Decimal decoding that are marked as resolved (https://github.com/pydantic/pydantic/issues/6807, https://github.com/pydantic/pydantic/issues/6295). As far as I can tell they are distinct from this issue (most importantly, those issues are resolved whereas this issue is present in the latest Pydantic version).

sydney-runkle commented 5 months ago

@FeldrinH,

Thanks for reporting. Definitely looks like a bug, and I'm guessing will have to be fixed in pydantic-core. Adding this to our 2.8 milestone, and marking as a good first issue for anyone interested!

ybressler commented 4 months ago

I can take this one.

ybressler commented 4 months ago

As an aside, the following test cases pass:

import json
from decimal import Decimal

import pytest
from pydantic import BaseModel


@pytest.mark.parametrize(
    'value',
    [
        Decimal(1.234567890123456789012345678901234567890),
        Decimal(12345678901234567890123456789012345678.9),
        Decimal(1) / Decimal(7),
    ],
)
def test_long_decimal_decoding(value: Decimal) -> None:
    """
    Really large decimal values should not be lost when encoding or decoding
    from JSON (or other input formats).
    """

    class Obj(BaseModel):
        value: Decimal

    m = Obj.model_validate_json(json.dumps({"value": value.real}, default=str))
    assert m.value.real == value

But if the values are provided as raw floats, not Decimal, then they fail. Something to note.
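
Worth noting about the parametrized values themselves (a stdlib-only observation): a literal like `Decimal(1.2345...)` is parsed as a float before `Decimal()` runs, so those test cases already hold the rounded float value and round-trip trivially:

```python
from decimal import Decimal

# The literal is parsed as a float first, so Decimal() receives the
# already-rounded binary value rather than the written digits:
from_float = Decimal(1.234567890123456789012345678901234567890)
from_str = Decimal("1.234567890123456789012345678901234567890")

assert from_float != from_str
# Decimal(float) captures the float's exact binary value, noise digits and all:
print(from_float)  # 1.2345678901234566904321354741114191710948944091796875
```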

ybressler commented 4 months ago

Found where the behavior is caused:

        primitive_schema = core_schema.union_schema(
            [
                # if it's an int keep it like that and pass it straight to Decimal
                # but if it's not make it a string
                # we don't use JSON -> float because parsing to any float will cause
                # loss of precision
                core_schema.int_schema(strict=True),   # <--------------------- HERE
                core_schema.str_schema(strict=True, strip_whitespace=True),
                core_schema.no_info_plain_validator_function(str),
            ],
        )
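
A sketch (plain stdlib, standing in for the pydantic-core internals) of why the `str` fallback in that union can't help for JSON input: by the time a Python-level validator runs, the JSON parser has already produced a 64-bit float, and stringifying it can only reproduce the float's digits:

```python
from decimal import Decimal

# Simulate JSON -> float -> str -> Decimal, the path a JSON number takes
# once it has been parsed to a float and falls through to the str validator:
parsed = float("12345678901234567890123456789012345678.9")  # what a JSON parser yields
recovered = Decimal(str(parsed))

print(recovered)  # 1.2345678901234568E+37
assert recovered == Decimal("12345678901234568000000000000000000000")
```
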

ybressler commented 4 months ago

Also, this test works in reverse: there is lossiness on JSON encoding too:

# Using the Test model from the issue description:
m = Test(value=Decimal(1.234567890123456789012345678901234567890))
print(m.model_dump_json())
# expected output:  {"value":"1.234567890123456789012345678901234567890"}
# actual output:    {"value":"1.2345678901234566904321354741114191710948944091796875"}
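
A stdlib-only cross-check suggests the digits in this direction are lost when `Decimal()` is built from the float literal, before any serialization runs: the "actual output" above is exactly the binary expansion of that float:

```python
from decimal import Decimal

# Decimal(float) is exact with respect to the float's binary value, which is
# why the serialized string matches the float's full expansion digit for digit:
d = Decimal(1.234567890123456789012345678901234567890)
print(d)  # 1.2345678901234566904321354741114191710948944091796875
```
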

ybressler commented 4 months ago

Alright, got a solution going. Need some help with the deserialization component. https://github.com/pydantic/pydantic/pull/9291

ybressler commented 4 months ago

> Alright, got a solution going. Need some help with the deserialization component. #9291

Welp! That was on a previous release of pydantic. Back to square one, working through it.

New PR: https://github.com/pydantic/pydantic/pull/9292

ChrisPappalardo commented 3 months ago

At PyCon 2024, looking at this now. Looks like PR #9292 stalled with comments, so I'll see if I can get it finished.