jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License

Incorrect decoding of 65-bit integer #701

Open · jdmsu4 opened this issue 3 weeks ago

jdmsu4 commented 3 weeks ago

Description

There seems to be a case where 65-bit integers are decoded incorrectly. I ran into it at seemingly random times while using msgspec on a large, dynamically generated JSON file, and eventually tracked it down to an isolated example (below). Larger and smaller integers do not seem to be affected. I expected 19933688932870350000 to round-trip unchanged through encoding and decoding, but the decoded output was 1482881526185800828 instead. While investigating, I decoded the msgspec-encoded output with the standard library's json.loads() and got the expected value. Comparing the binary representations, the incorrect value matches the correct one except that the leading 4 bits are missing:

| | Decimal | Binary value | Bits |
|---|---|---|---|
| Input | 19933688932870350000 | 10001010010010100010000001110010000110100010010110011010001111100 | 65 |
| Output (incorrect) | 1482881526185800828 | 1010010010100010000001110010000110100010010110011010001111100 | 61 |

Here is a minimal reproducible example:

import json

import msgspec

value = 19933688932870350000  # 65-bit integer that triggers the problem

# incorrect value: msgspec decodes a different integer
encoded_json = msgspec.json.encode({"test_value": value})
decoded_json = msgspec.json.decode(encoded_json)
print(decoded_json)

# correct value: the standard library decodes the same bytes as expected
comp_decoded_json = json.loads(encoded_json)
print(comp_decoded_json)