hemidactylus opened 1 month ago
I do not follow: since DECIMAL is specifically a base-10 value, it should have an exact textual representation.
But even with base-2 (binary) floating point I am not sure I get the point: yes, the representation of a number as base-10 text may need to be rounded -- but would use of a JSON String make any difference? How would that help?
Ok, based on discussions this is a concern wrt JSON handling differing between JSON Numbers and JSON Strings -- so if the server side were to use base-2 FPs (`float`, `double`) for all numbers, then use of JSON Strings would avoid the issue.
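On the binary-FP point, for what it's worth: a `double` already round-trips exactly through its shortest decimal text, which is what modern serializers emit, so it is not obvious the String wrapping would buy anything there. A quick Python sketch:

```python
import json

f = 0.1 + 0.2            # a binary double carrying the classic rounding residue
text = json.dumps(f)     # serialized as a JSON Number, i.e. decimal text
assert float(text) == f  # parsing the text restores the exact same double
```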
As things are, the Data API uses Java `BigDecimal` for all JSON numbers with fractions, so we should be good -- but if testing shows problems, we definitely need to address it.
My 2 cents: I think returning decimals as numbers is the right way to go. There shouldn't be any loss of precision since JSON serialization isn't doing any arithmetic. And it seems semantically cleaner for numbers to end up as numbers in JSON.
However, serializing decimals as strings wouldn't be unprecedented. For example, MongoDB's JavaScript BSON serializer only supports `Decimal128.fromString()`; there's no way to convert a JavaScript number (binary floating point) to a Decimal128 without going through a string first.
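The constraint is easy to reproduce with Python's `decimal` module (used here only as an analogy for Decimal128): constructing from the binary double bakes its rounding error into the decimal, while constructing from a string keeps exactly the written digits.

```python
from decimal import Decimal

# Via the binary double: the double's rounding error becomes part of the value.
from_float = Decimal(1.1)   # 1.100000000000000088817841970012523233890533447265625
# Via a string: exactly the digits that were written.
from_text = Decimal("1.1")  # 1.1
assert from_text != from_float
```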
Reading through...
> In pure CQL
Not sure how exactly this relates to the issue of encoding decimal; yes, there is a reason for having both floats and decimals.
I will note that the Decimal and Float sums above are the same up to a number of decimal places, and this is why we have the `decimal` type:
```python
In [5]: s = "-9.9362e-10"

In [6]: f = float(s)

In [8]: print(f"10 decimal places: {f:.10f}")
10 decimal places: -0.0000000010

In [12]: print(f"1 decimal places: {f:.1f}")
1 decimal places: -0.0

# using new rounding https://peps.python.org/pep-0682/
In [20]: print(f"1 decimal places: {f:z.1f}")
1 decimal places: 0.0

In [11]: float("-0.0") == float("0")
Out[11]: True

In [14]: float("-0.0") == int("0")
Out[14]: True

# False if we use the full exponent
In [15]: float(s) == int("0")
Out[15]: False
```
> Data API
I am not following the example when it says "As a result, there is lossy math when the original intent of the column type was precisely to avoid that": the values returned by the Data API are exactly the same as those returned by CQL.
> For example, MongoDB's JavaScript BSON serializer only supports Decimal128.fromString(), there's no way to convert a JavaScript number (binary floating point) to a Decimal128 without going through a string first.
These are the Mongo docs for this: https://github.com/mongodb/specifications/blob/master/source/bson-decimal128/decimal128.md
and https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/#mongodb-extended-json--v2-
Not decisions, just things to be considered...
As the JSON - Data Types page says:

> Numbers in JSON are agnostic with regard to their representation within programming languages. While this allows for numbers of arbitrary precision to be serialized, it may lead to portability issues... There is no inherent precision loss in serializing a machine-level binary representation of a floating-point number (like binary64) into a human-readable decimal representation (like numbers in JSON) and back, since there exist published algorithms to do this exactly and optimally.
My view on this: the JSON protocol says it has a way to encode any number that is not infinite or NaN, and then says it is up to the parsing code to alert the user if there is an issue with the decoding, such as when the code cannot represent the number.
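Not every parser does alert, though. CPython's `json`, for instance, silently turns an out-of-range JSON Number into infinity (with the default `parse_float`):

```python
import json

# A JSON Number too large for binary64: no error is raised,
# the value quietly overflows to infinity.
value = json.loads("1e999")
assert value == float("inf")
```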
If a decimal is declared, they always use an EJSON value `{ "$numberDecimal": "<number>" }`; this is the same in both their Canonical and Relaxed JSON formats. The BSON encoding is "IEEE 754-2008 128-bit decimal floating point"; this is what they send over the wire, and remember BSON has somewhat richer data types.
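As an illustration of consuming that wrapper, here is a minimal Python sketch; the helper name `decode_ejson_decimal` is mine, not part of any driver:

```python
import json
from decimal import Decimal

def decode_ejson_decimal(obj):
    # Hypothetical helper: unwrap the EJSON decimal wrapper into a
    # Python Decimal, leaving every other object untouched.
    if isinstance(obj, dict) and "$numberDecimal" in obj:
        return Decimal(obj["$numberDecimal"])
    return obj

doc = json.loads('{"price": {"$numberDecimal": "3.3"}}',
                 object_hook=decode_ejson_decimal)
assert doc["price"] == Decimal("3.3")
```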
And if I read the spec correctly, they are telling clients not to round-trip into the native decimal types.
What happens to the classic telephone-number problem? The string "012345" is a string of digits, not a number.
We are building the relaxed representation first, and the idea is to be as basic a JSON as we can. Any read will return the schema; if a client wants to go beyond handling a number as a JSON number, it can use the returned schema to know which fields are decimals. The parsers in Python, Node.js, and other libraries allow special handling of fields and floats to cover these cases.
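For example, in Python a schema-aware client can keep full precision without any String wrapping, since `json.loads` lets the caller intercept every fractional number (a sketch):

```python
import json
from decimal import Decimal

# Every JSON Number with a fraction or exponent is handed to Decimal,
# so the written digits survive with no binary rounding step.
doc = json.loads('{"price": 3.3, "qty": 2}', parse_float=Decimal)
assert doc["price"] == Decimal("3.3")
assert doc["qty"] == 2   # integers still come back as int
```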
Related: filed #1654 wrt the "other direction" for the `decimal` type.
The rounding issues that essentially led to the creation of the DECIMAL CQL type would dictate that such columns be returned as strings, letting callers/users/clients do the right thing. Otherwise, unexpected results may occur.
Principle
In binary floating point, `3.3 - 2.2 - 1.1 != 0` because of rounding during the calculation. Decimal holds a special base-10 representation for numbers with fractions that avoids exactly that.

In pure CQL
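The same principle, sketched with Python's `decimal` module rather than CQL:

```python
from decimal import Decimal

# Binary floating point accumulates rounding error:
assert 3.3 - 2.2 - 1.1 != 0
# A base-10 representation carries the written digits exactly:
assert Decimal("3.3") - Decimal("2.2") - Decimal("1.1") == 0
```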
This table has two columns (decimal, float) populated with the same values, which demonstrates the difference:
The `DECIMAL` column:

The `FLOAT` column:

Data API
When reading this table with the Data API (dockerized 1.0.18 to be precise), this is what I see:
As a result, there is lossy math when the original intent of the column type was precisely to avoid that.