tc39 / proposal-decimal

Built-in exact decimal numbers for JavaScript
http://tc39.es/proposal-decimal/

Thoughts on Precision in Decimal #175

Open sffc opened 2 months ago

sffc commented 2 months ago

A topic that continues to be raised for discussion in the context of the Decimal proposal is the concept of precision, its interaction with arithmetic, and whether it should be part of the data model at all. I've been meaning to make a write-up for some time, and since we had a discussion on this subject in TG3 today, I thought this would be a good opportunity.

First, I want to start with the fundamental axiom of my position and all arguments that follow. The axiom is that the existence of trailing zeros makes certain values distinct from others. For example, "1" is distinct from "1.0", and "2.5" is distinct from "2.50". I will present evidence for this axiom:

  1. The two values are written and spoken differently.
  2. In language, the values influence the words around them. My favorite example: in English, one would say "I see one star in the sky" but "the average rating for this restaurant is one point zero stars".
  3. There is a large amount of precedent in software libraries that these values are distinct. Citation: https://github.com/tc39/proposal-decimal/issues/89#issue-2011833029

These are all evidence that "1", a natural number, and "1.0", a decimal approximation of something, are distinct values.

I consider this axiom to be a matter of fact, not a matter of opinion. To make an argument to the contrary would be to say that "1" and "1.0" are always fully interchangeable, CLDR is wrong to pluralize "star" in the string "1.0 stars", and software libraries are wrong to represent this in their data models.

Okay, now that I've established the fundamental axiom, I will make a case that core ECMAScript should have a way of representing these values as distinct.

  1. The distinct values influence behavior across multiple different localization operations. The formatting of numbers and the selection of plural rules, for example, both have different behavior depending on whether the number has trailing zeros. The upcoming Intl.MessageFormat may also feed the same value into multiple operations.
  2. That rounding, fractional digits, and trailing zeros impact plural rule selection is among the top preventable bugs I see in my experience in the i18n space, and the Decimal proposal presents an opportunity to make these bugs harder to write. I present examples and evidence in my 2020 post here: https://github.com/tc39/ecma402/issues/397#issuecomment-602863604. Also see my example down below.
  3. There is extensive precedent of decimal libraries in other languages representing precision of the decimals (citation above, https://github.com/tc39/proposal-decimal/issues/89#issue-2011833029). Failing to represent precision in the Decimal data model would break from this long precedent.
  4. Given the extensive precedent, failure to represent precision in the data model makes us unable to round-trip values coming from other decimal implementations, harming our ability to interoperate with them.
  5. IEEE Decimal128 itself contains the concept of precision. By omitting the precision from the data model, ECMAScript is not following its stated reference specification, and it is in effect inventing its own representation of numbers. In the same way that Temporal should not invent concepts related to date representation (such as the IXDTF string syntax), Decimal should not invent concepts related to numeric data models.
  6. Representing the precision of a number has applications in scientific computing and accounting. Given that these use cases are amongst those that motivate Decimal in the first place, we should consider variable precision to be fundamental to the value proposition of the proposal.

Points 1 and 2 (that the problem exists and that the problem causes real bugs in the wild) are the ones of primary concern to me as an i18n advocate in TC39. Points 3-6 are additional ones I offer.

I will now address three counter-arguments that were raised in the TG3 call today.

@ljharb pointed out that a person often first comes across the concept of precision in numbers in a physics course and that it is often a hard concept to grasp. This is a true statement, and it could perhaps be used as evidence that it is confusing for decimal arithmetic to propagate precision. However, it is not an argument that the concept doesn't exist, nor that it lacks applications relevant to the Decimal proposal.

@erights pointed out that the numbers π and ⅓ are distinct numerical concepts also not representable by a Decimal. This is again a true statement. However, I do not see how it leads logically to an argument that 1.0 and 2.50 should be excluded from Decimal. That 1.0 and 2.50 have applications to the Decimal proposal is not changed by the existence of π and ⅓.

Someone else, I think @nicolo-ribaudo, pointed out that representing the precision of decimals is a concern about how to display the mathematical value, i.e., a formatting option, not a concern for the data model. This is a valid position to hold and one I've often seen. My counter-argument is that internationalization's job is to take distinct values and display them for human consumption. Intl's job is to decide what numbering system to use, what symbols to use for decimal and grouping separators, whether to display grouping separators, and where to render plus and minus signs, for example. It is not Intl's job to decide whether to display trailing zeros, since making that decision changes the input from one distinct value to a different distinct value. Relatedly, it is also not generally Intl's job to decide the magnitude to which it should round numbers.

I will close with one more thought. Decimal128 representing trailing zeros does not by itself prevent the i18n bugs noted above. However, it sets us on a path where the cleanest, simplest code is the code that produces the correct i18n behavior. For example, in 2024, code that correctly calculates and renders a restaurant rating would look like this:

function formatRestaurantRatingInEnglish(ratings) {
  let avg = ratings.reduce((sum, x) => sum + x, 0) / ratings.length;
  let formatOptions = { minimumFractionDigits: 1, maximumFractionDigits: 1 };
  let pluralRules = new Intl.PluralRules("en", formatOptions);
  let formattedNumber = avg.toLocaleString("en", formatOptions);
  if (pluralRules.select(avg) === "one") {
    return `${formattedNumber} star`;
  } else {
    return `${formattedNumber} stars`;
  }
}

Note that an identical formatOptions must be passed as an argument to both new Intl.PluralRules and Number.prototype.toLocaleString (or equivalently new Intl.NumberFormat); if it is not, you have a bug. However, with a Decimal that represents trailing zeros, the code can be written like this:

function formatRestaurantRatingInEnglish(ratings) {
  let avg = ratings.reduce((sum, x) => sum.add(x), Decimal128.zero())
    .divide(ratings.length)
    .roundToMagnitude(-1);
  let pluralRules = new Intl.PluralRules("en");
  let formattedNumber = avg.toLocaleString("en");
  if (pluralRules.select(avg) === "one") {
    return `${formattedNumber} star`;
  } else {
    return `${formattedNumber} stars`;
  }
}

My unwavering principle in API design is that the easiest code to write should be the code that produces the correct results. We cannot prevent people from doing the wrong thing in a Turing-complete language, but it is our core responsibility as library designers to nudge developers in the right direction.

Also CC: @jessealama @ctcpip @littledan

ljharb commented 2 months ago

My position is that your axiom is false in a general sense, even though it is objectively true in specific contexts.

In other words, it's not that 1 and 1.0 are always interchangeable, it's that 1 and 1.0 aren't always distinct - and I'd personally say they're most often not distinct. While the decision to say "1 point zero stars" definitely implies a distinction, that's not a number, it's additional context around a number.

The concept of precision definitely exists! However, it's not inextricably attached to a number system, especially not a universally human-used number system, and since it can't be included in Decimal without downsides (in particular, preventing it from ever becoming a primitive), if it needs to be represented, it should be done as a separate object and not as a Decimal.

sffc commented 2 months ago

> My position is that your axiom is false in a general sense, even though it is objectively true in specific contexts.
>
> In other words, it's not that 1 and 1.0 are always interchangeable, it's that 1 and 1.0 aren't always distinct - and I'd personally say they're most often not distinct. While the decision to say "1 point zero stars" definitely implies a distinction, that's not a number, it's additional context around a number.

So, the fact that we are talking about the differences between the two entities means that they are distinct entities. In some contexts, they are interchangeable; you claim they are interchangeable in many/most contexts. But, they are objectively still distinct entities, even if interchangeable. The axiom is true in the general sense.

Note that I switched from the term "value" to "entity". I had been using the word "value" to mean any distinct entity, with no connotations about being associated with a spot on the number line. It is not my intent to cause confusion with terminology.

sffc commented 2 months ago

I'll make one additional observation that I could have included in the OP. I will use @ljharb's language around "interchangeability", which I think is an appropriate term that can help us have productive conversations.

The README lists three use cases for this proposal:

  1. Primary use case: Representing human-readable decimal values such as money
  2. Secondary use case: Data exchange
  3. Tertiary use case: Numerical calculations on more precise floats

Use case 1 specifically calls out "human-readable decimal values". In that context, 1 and 1.0 are definitely not interchangeable, due to their impact on spoken and written language.

Use case 2 calls out "data exchange". In that context, 1 and 1.0 are definitely not interchangeable for data exchange: we won't be able to round-trip values coming from other decimal libraries that make the distinction, as most do (see OP for citation).

Use case 3 calls out "numerical calculations". That is exactly the scientific computing context where precision matters and 1 and 1.0 are not interchangeable.

Therefore, all 3 motivating use cases for this proposal are use cases where the two entities are not interchangeable.

nicolo-ribaudo commented 2 months ago

> 1. The distinct values influence behavior across multiple different localization operations. The formatting of numbers and the selection of plural rules, for example, both have different behavior depending on whether the number has trailing zeros. The upcoming Intl.MessageFormat may also feed the same value into multiple operations.

> 2. That rounding, fractional digits, and trailing zeros impact plural rule selection is among the top preventable bugs I see in my experience in the i18n space, and the Decimal proposal presents an opportunity to make these bugs harder to write. I present examples and evidence in my 2020 post here:

I very strongly sympathize with this. I myself have had problems caused by NumberFormat and PluralRules getting out of sync, because whether I wanted multiple fraction digits or not had to be specified twice, as options to the two constructors, rather than together with the argument I give them to format.
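
To make that failure mode concrete, here is a minimal sketch using today's real Intl APIs (the rating value is made up); the bug is that only one of the two objects receives the fraction-digit options:

```js
let avg = 1.0; // e.g. the average of [1, 1, 1]
let formatOptions = { minimumFractionDigits: 1, maximumFractionDigits: 1 };
let pluralRules = new Intl.PluralRules("en");                  // oops: formatOptions not passed here
let formattedNumber = avg.toLocaleString("en", formatOptions); // "1.0"
let category = pluralRules.select(avg);                        // "one", because select only sees the value 1
`${formattedNumber} ${category === "one" ? "star" : "stars"}`; // "1.0 star" (should be "1.0 stars")
```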

> 6. Representing the precision of a number has applications in scientific computing and accounting. Given that these use cases are amongst those that motivate Decimal in the first place, we should consider variable precision to be fundamental to the value proposition of the proposal.

When it comes to accounting, you always want the maximum possible precision: you want to know exactly how much money is flowing, and saying "roughly $1" is not enough. In that context, 1 and 1.0 have the exact same meaning: in both cases it's 1 dollar and zero cents, and not "between 0.5 and 1.5" versus "between 0.95 and 1.05". When you multiply $1.00 by 50% (0.5), Decimal128 gives you 0.500, but that's still not representing something different from $0.50. What matters is that the magnitude of the number is exact, not how the error propagates, since there is no error.
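
For concreteness, a sketch of that multiplication, written with the string-based constructor and method names that appear later in this thread (the exact API shape is an assumption here):

```js
// Under IEEE 754 decimal128 rules, multiplication adds the operands' exponents,
// so a trailing zero appears even though no new information was gained.
const price = new Decimal("1.00"); // coefficient 100, exponent -2
const half  = new Decimal("0.5");  // coefficient 5,   exponent -1
price.multiply(half).toString();   // "0.500" (the same amount of money as "0.50")
```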

In sciences the error is relevant, but regardless of whether this proposal preserves trailing zeroes or not you'll have to manually propagate the errors by yourself. If you have two sticks whose length is 0.500cm (0.5 ± 0.0005) and 0.200cm (0.2 ± 0.0005), the length of putting the two sticks together is not what you would represent as the decimal 0.7000cm (0.7 ± 0.0005), but it's 0.7 ± 0.001 (assuming that the two measurements are independent, otherwise we would also have to factor in the correlation between them that is not even represented in either of the two operands).

> Someone else, I think @nicolo-ribaudo, pointed out that representing the precision of decimals is a concern about how to display the mathematical value, i.e., a formatting option, not a concern for the data model. This is a valid position to hold and one I've often seen. My counter-argument is that internationalization's job is to take distinct values and display them for human consumption. Intl's job is to decide what numbering system to use, what symbols to use for decimal and grouping separators, whether to display grouping separators, and where to render plus and minus signs, for example. It is not Intl's job to decide whether to display trailing zeros, since making that decision changes the input from one distinct value to a different distinct value. Relatedly, it is also not generally Intl's job to decide the magnitude to which it should round numbers.

This is probably referring to the NumberWithPrecision I suggested; however, let me characterize my position better.

Precision can be relevant in multiple places and not just at the end when displaying a value on screen, even if that will probably be 95% of the use cases on the web. However, the way that this proposal handles precision is:

In both of those cases, you are going to use decimal values as if they were infinite-precision numbers, and then once you are done computing define what precision the result has before passing the value to the next "system". This is exactly what you are doing in the stars example, where you are manually defining the precision at the boundary between these two parts.

My NumberWithPrecision suggestion was to allow manually decorating numbers with a precision at those communication points, while keeping Decimal as a higher-precision-than-float behaves-like-humans-think pure number type. I've prepared an example: https://gist.github.com/nicolo-ribaudo/1ae2f261f2513c45f4bd3d7ede06c42f

If it was only for the scientific computation use case this NumberWithPrecision could be implemented entirely in userland, but the integration with Intl gives it a reason to be in the language.
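
As a rough userland illustration of the idea (not the gist's exact design; the property names mirror the examples later in this comment, and it assumes the Decimal toPrecision method mentioned further down in the thread):

```js
// Sketch: a plain wrapper that carries a "pure" Decimal plus a digit count,
// and only consults the precision at display/exchange boundaries.
class NumberWithPrecision {
  constructor(numericObject, significantDigits) {
    this.numericObject = numericObject;         // the underlying Decimal value
    this.significantDigits = significantDigits; // how many digits to show or emit
  }
  toString() {
    // Delegate to the wrapped value, applying the stored precision only here.
    return this.numericObject.toPrecision(this.significantDigits);
  }
}
```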


(additional thoughts after re-reading my comment)

I believe my comment addresses points (1) and (3) from the readme, but I also want to address (2).

While it's true that a number type that loses trailing zeroes cannot be used as an intermediate step for round-tripping to a system that cares about trailing zeroes, when does this matter?

jessealama commented 2 months ago

The fact that we can get the right result out of a proper use of Intl.NumberFormat and Intl.PluralRules makes me wonder if we're looking at something that involves an inherent challenge in i18n, something that experienced i18n developers need to know (among many other complexities in i18n). Or, put differently, I wonder whether decimal numbers have any interesting advantage in this kind of use case.

Looking more closely at the restaurant rating example above, in which non-normalized decimals are used, I wonder whether we're asking PluralRules' select to make assumptions that may not be generally true. In the code example above, we need to pass in formatOptions twice to get things right with the restaurant rating. In my view, that's not that big of a deal. But is it avoidable? Would decimals make all such interactions of NumberFormat and PluralRules more concise and easier to get right? Imagine a case where one needs to pass different options to NumberFormat and PluralRules. For instance, if one were tasked with generating text about, say, financial quantities, we may well need to handle 1 and 1.0 identically: even if you had 1.0, we might not want to consider it plural in such cases. We would then need to normalize our decimal first before passing it to PluralRules' select, and yet we may still want to render that 1.0 with an additional trailing digit, serializing it as "1.00".
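
A sketch of that financial case, assuming the Decimal toFixed method mentioned later in this thread; collapsing to a plain Number stands in for "normalizing" the decimal here:

```js
// Hypothetical sketch: treat 1.0 as grammatically singular, but still render "1.00".
const amount = new Decimal("1.0");
const mathValue = Number(amount.toString());                   // 1, trailing zero discarded
const category = new Intl.PluralRules("en").select(mathValue); // "one"
const text = `${amount.toFixed(2)} dollar${category === "one" ? "" : "s"}`; // "1.00 dollar"
```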

(This kind of example opens the door to a topic. We have generally been talking about taking numbers with lots of fractional digits, possibly including trailing zeroes, and rounding. This might be OK for various understandings of "precision", but what about the practice of adding trailing zeroes? In some contexts, that's not OK. If I have 2.5, I don't necessarily have 2.50, too.)

This kind of example illustrates that every use case is different. NumberFormat and PluralRules can interact in subtle ways, and even well-intentioned programmers might write buggy code, but arguably, that's just how the (i18n) world is. It's not a design flaw in Intl. We may need to trim trailing zeroes in some cases, preserve them in others, or even pad the digit string with extra zeroes (imputing data), or, as in the case above, do a mixture of the above.

I like @nicolo-ribaudo 's idea of a number-plus-precision value. To add to that, I might propose an extension of PluralRules' select that would allow digit strings (that is, strings with a certain syntax) as an argument, not just Numbers.

jessealama commented 2 months ago

Reflecting on the data exchange use case, one thing that troubles me is that the idea of preserving trailing zeroes loses a bit of its appeal when we consider that a decimal number coming over the wire, possibly with trailing zeroes, might be an input to a calculation in which the number of fractional digits can be very different from those of the original number. Especially with multiplication and division, the number of fractional digits grows rapidly.
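
For instance, a small sketch of how quickly fractional digits grow under exact decimal arithmetic (constructor and method names as used later in this thread):

```js
// Multiplying two exact decimals adds their fractional digit counts.
const price = new Decimal("19.99");  // 2 fractional digits
const rate  = new Decimal("1.0825"); // 4 fractional digits
price.multiply(rate).toString();     // "21.639175" (6 fractional digits)
```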

Putting this another way, the intuitively appealing conservativeness in the data exchange use case ("Don't delete any information given to you") works when a JS engine sitting in the middle receives a decimal and passes it along unchanged to the next system. But that's a fairly trivial use case. I wonder if a different, related conservation principle might be better, along the lines of "Preserve the number as accurately as possible". This may or may not involve preserving trailing zeroes.

jessealama commented 2 months ago

Taking a look at the spec text for Intl.PluralRules.select, I see that my suggestion of making .select accept a digit string doesn't work. I mistakenly drew the conclusion, based on the MDN docs, that .select just takes a Number.

The idea, proposed by @nicolo-ribaudo and others in the TG3 call last week, of extending .select & friends to accept something like a number-with-precision as an argument, still stands.

sffc commented 2 months ago

About number-with-precision: I'd need to see a specific concrete design, but my initial reaction is that I don't really see a big difference between the use cases for number-with-precision and for Decimal. Going back to the 3 motivational use cases outlined in the README:

  1. Representing human-readable decimal values such as money: number-with-precision does this.
  2. Data exchange: it seems number-with-precision could be designed to do this, too.
  3. Numerical calculations on more precise floats: a real Decimal128 is probably better for this. However, this is the "tertiary" use case, and the README acknowledges it as not being the most strongly motivated for the standard library.

ljharb commented 2 months ago

The values of money don't retain extra zeroes - only the formatting/display of them do. $1 and $1.0 and $1.00 are the exact same amount of money.

sffc commented 2 months ago

> $1 and $1.0 and $1.00 are the exact same amount of money.

Yes? They are numerically equal entities. As I said in the OP, I don't see how these truisms lead logically to one proposal or another. Being able to represent these entities uniquely in the data model in no way contradicts the fact that they represent the same point on the number line.

ljharb commented 2 months ago

I agree with that - but "the number line" is the only thing that a number system represents. Precision is something extra - not something that belongs directly in the primitive (whether JS primitive, or primitive number concept)

nicolo-ribaudo commented 2 months ago

Hey, I don't think that you two repeatedly telling each other "they are the same number!" "no, they represent something different!" is very productive. We all agree that on the number line 1 and 1.0 are the same number, but that in some contexts humans give 1 and 1.0 different meanings.

A new primitive/object can represent either some points on the number line or it can contain more info, and there is nothing theoretically preventing either solution other than deciding which tradeoffs we are willing to make.

The main drawback of keeping precision in the number is that it closes the door to introducing decimal primitives in the future. On the other hand, the motivation for keeping the precision is that:

The main precision-related feature that the current proposal has, and that we don't need/want, is how precision propagates through arithmetic operations, because in practice developers who care about precision still need to manually define it correctly after computing something.


I wrote down an updated version of the "numeric value with precision idea": https://gist.github.com/nicolo-ribaudo/27c6156cefe27cf488f028e0236dc667

I'd love to hear how y'all feel about it. Some examples for how to use it:

formatRestaurantRatingInEnglish (assuming no decimal primitives)

```js
function formatRestaurantRatingInEnglish(ratings) {
  let avg = ratings.reduce((sum, x) => sum.add(x), new Decimal("0"))
    .divide(ratings.length)
    .withPrecision(2);
  let pluralRules = new Intl.PluralRules("en");
  let formattedNumber = avg.toLocaleString("en");
  if (pluralRules.select(avg) === "one") {
    return `${formattedNumber} star`;
  } else {
    return `${formattedNumber} stars`;
  }
}
```
Parsing with precision

Note: This could be a built-in function, such as the `Decimal.parseWithPrecision` mentioned at the bottom of that gist.

```js
function parseWithPrecision(str) {
  return new Decimal(str).withPrecision(str.split("e", 1)[0].replace(".", "").length);
}
```
Reading a decimal from a file, multiplying it by 100, and re-stringifying it with the same number of significant digits

```js
const decStr = fs.readFileSync("./my-decimal.txt", "utf8");
const dec = parseWithPrecision(decStr);
const scaled = dec.numericObject.multiply(new Decimal("100")).withPrecision(dec.significantDigits);
fs.writeFileSync("./my-decimal.txt", scaled.toString());
```

Note: If instead `.withPrecision()` works on the number of _fractional digits_ rather than _significant digits_, the example becomes the following.

```js
const decStr = fs.readFileSync("./my-decimal.txt", "utf8");
const dec = parseWithPrecision(decStr);
const scaled = dec.numericObject.multiply(new Decimal("100")).withPrecision(dec.fractionalDigits - 2);
fs.writeFileSync("./my-decimal.txt", scaled.toString());
```
Adding up a list of numbers, and then giving the result the highest precision that still respects error propagation as per Wikipedia: Propagation of uncertainty

Note: This example assumes that `.withPrecision()` works on the number of _fractional digits_ rather than _significant digits_. If it goes with significant digits then it would need to convert back and forth between them.

```js
function sumWithPrecision(nums) {
  let sum = new Decimal("0");
  let error = new Decimal("0");
  for (const { numericObject, fractionalDigits } of nums) {
    sum = sum.add(numericObject);
    error = error.add(new Decimal("0.1").scale(-fractionalDigits));
  }
  return sum.withPrecision(
    /* fractionalDigits */ Math.ceil(Math.log10(Number(error)))
  );
}
```
Adding up a list of numbers, and then giving the result the highest precision that still respects error propagation as per IEEE 754 error precision semantics

Note: This example assumes that `.withPrecision()` works on the number of _fractional digits_ rather than _significant digits_. If it goes with significant digits then it would need to convert back and forth between them.

```js
function sumWithPrecision(nums) {
  let sum = new Decimal("0");
  let precision = Infinity;
  for (const { numericObject, fractionalDigits } of nums) {
    sum = sum.add(numericObject);
    precision = Math.min(precision, fractionalDigits);
  }
  return sum.withPrecision(/* fractionalDigits */ precision);
}
```

Re the points in https://github.com/tc39/proposal-decimal/issues/175#issuecomment-2345113378: I don't see how this number-with-precision makes it any easier to represent human-readable quantities or to exchange data than what Number.prototype.toPrecision() and Number.prototype.toFixed() already do.

jessealama commented 2 months ago

I think we can agree that decimals, with or without trailing zeroes, address the three (classes of) use cases we have in mind.

As for the tertiary use case, I think the main way to address it is (1) to make decimals fast and (2) to offer mathematical operations that routinely come up in scientific computation, such as the trigonometric functions, logarithm and exponential, and sqrt, and perhaps more. For these operations, decimals with or without trailing zeroes are valid inputs. For most arguments to these functions, we'd need all 34 significant decimal digits to express the result. It's unlikely (though possible) that the last stretch of decimal digits would be 0s. Similar to many of the operations currently found in Math, it would have to be understood by the users of these operations that the results are (almost) never exactly correct; any result would have to be understood as an approximation, a representative in a (very small) interval which contains the exact result.
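
To illustrate, a sketch with a hypothetical `sqrt` method (the proposal does not currently specify one); 34 significant digits is the decimal128 limit mentioned above:

```js
// Hypothetical: sqrt(2) has no exact decimal representation, so the result
// fills all 34 significant digits and is only the nearest representable value.
new Decimal("2").sqrt().toString();
// "1.414213562373095048801688724209698"
```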

jessealama commented 2 months ago

> Re the points in #175 (comment): I don't see how this number-with-precision makes it any easier to represent human-readable quantities or to exchange data than what Number.prototype.toPrecision() and Number.prototype.toFixed() already do.

Just for the record: the decimal proposal does have toPrecision and toFixed methods for getting a string version of a decimal that can "update" the coefficient (sequence of significant digits) (in the case of toPrecision) and number of fractional digits (in the case of toFixed). These methods don't literally update a decimal because we envision decimals as non-updatable values. So if you have, say 1, you can sort of impute greater precision by calling e.g. toFixed(2) on that decimal value.
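
A small sketch of that, assuming the string-based constructor used elsewhere in this thread; the output comments follow the usual toFixed/toPrecision conventions:

```js
// toFixed/toPrecision produce strings; the Decimal value itself never changes.
const d = new Decimal("1");
d.toFixed(2);     // "1.00" (two fractional digits imputed)
d.toPrecision(3); // "1.00" (three significant digits)
d.toString();     // "1"    (the original value is untouched)
```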

TehShrike commented 2 months ago

> When it comes to accounting, you always want the maximum possible precision: you want to know exactly how much money is flowing, and saying "roughly $1" is not enough. In that context, 1 and 1.0 have the exact same meaning: in both cases it's 1 dollar and zero cents

> The values of money don't retain extra zeroes - only the formatting/display of them do. $1 and $1.0 and $1.00 are the exact same amount of money.

Having worked on several invoicing/point-of-sale systems, this is very incorrect.

When your tax for an item comes out to $1.1100, those digits are significant. The decision of when it is okay to round/trim your tax numbers to add them to your subtotal is a business decision that your accountant will probably have opinions about.

Maybe your system calculates the tax per line item and rounds the taxable amount per line item. Maybe you sum up all of the line items' tax amounts with the full 4 digits of precision, and then round them.

Whatever choice you and your accountants make, you want to be very explicit about what precision you are dealing with during each step, and when you are choosing to round the numbers and decrease the precision.

If BigDecimal does not support precision, people will need to continue to use userland libraries like financial-number to stay on top of precision.

nicolo-ribaudo commented 2 months ago

> Whatever choice you and your accountants make, you want to be very explicit about what precision you are dealing with during each step, and when you are choosing to round the numbers and decrease the precision.

The proposal as it is right now implicitly propagates precision, according to the IEEE 754 rules.
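
For reference, a sketch of what that implicit propagation looks like under IEEE 754's preferred-exponent rules, written in the style of the examples below:

```js
// Addition keeps the smaller (more negative) exponent of the two operands;
// multiplication adds the exponents. Precision tags along automatically.
new Decimal("1.20").add(new Decimal("1.3")).toString();      // "2.50"
new Decimal("0.10").multiply(new Decimal("0.2")).toString(); // "0.020"
```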

If the proposal doesn't propagate that implicitly anymore, it's on the developer to round the number at the steps where they want that to happen: for example, after adding tax to each row of the invoice, or at the end after adding tax on the total.

For example,

let total = rows
  .map(x => x.multiply(tax).round(2))
  .reduce((a, b) => a.add(b));

vs

let total = rows
  .map(x => x.multiply(tax))
  .reduce((a, b) => a.add(b))
  .round(2);

nicolo-ribaudo commented 2 months ago

The correct example from your readme would become

const subtotal = new Decimal('1.5').multiply(new Decimal('24.99'))
const rounded_subtotal = subtotal.round(2)
rounded_subtotal.toString() // => '37.48'

const tax = rounded_subtotal.multiply(new Decimal('0.14'))
const rounded_tax = tax.round(2)
rounded_tax.toString() // => '5.24'

const total = rounded_subtotal.add(rounded_tax)
total.toString() // => '42.72'

For comparison, with the library it's very similar (just different method names):

const subtotal = number('1.5').times('24.99')
const rounded_subtotal = subtotal.changePrecision(2)
rounded_subtotal.toString() // => '37.48'

const tax = rounded_subtotal.times('0.14')
const rounded_tax = tax.changePrecision(2)
rounded_tax.toString() // => '5.24'

const total = rounded_subtotal.plus(rounded_tax)
total.toString() // => '42.72'