w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[cssom] Serialize numbers using scientific notation? #8538

Open andruud opened 1 year ago

andruud commented 1 year ago

Created from https://github.com/w3c/csswg-drafts/commit/1796eb44e47efde910e94b7704c6d85e9cda0781#r102754719.

The spec currently says that scientific notation is not used, but relevant people seem to think that we should in fact use that notation now (see link).

If so, we need to specify when sci-not is and isn't used, and how.

@zcorpan @tabatkins @emilio

tabatkins commented 1 year ago

I think we should just do what JS does.

Edit: If I'm understanding the algo correctly, the n is the number of integer digits the number contains. (If the number's magnitude is less than 1, it's negative, giving the number of zeros between the decimal point and the first non-zero digit.)

So JS prints without scinot if the number has 21 or fewer digits in its integer part, or fewer than 6 leading zeros in its decimal part. If it's larger or smaller than that, it uses scinot. Testing in the console confirms this.

JS produces large string representations by default, while we intentionally capped the decimal precision of our string representations, but the thresholds match.

So I suggest we use scinot when either:

In either of these conditions, we format the number with a single non-zero digit in the integer part and up to 6 digits in the decimal part (less than 6 if they are trailing zeros), followed by the exponent part. (Omitting the decimal part entirely if it's all zeros, obvs.)
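
The JS thresholds described in this comment can be sketched as a small predicate. This is only an illustration of the trigger condition (22 or more integer digits, or 6 or more leading zeros after the decimal point), not spec text; the function name `js_uses_exponent` is hypothetical, and only finite inputs are covered.

```python
import math


def js_uses_exponent(value: float) -> bool:
    """Return True if JS would print this finite number in exponent form."""
    if not math.isfinite(value):
        raise ValueError("only finite numbers are covered by this sketch")
    magnitude = abs(value)
    # 22+ integer digits, or 6+ zeros between the decimal point
    # and the first non-zero digit. Zero itself is never exponent form.
    return magnitude >= 1e21 or 0.0 < magnitude < 1e-6


for sample in (1e20, 1e21, 0.000001, 0.0000001, 0.0):
    print(sample, js_uses_exponent(sample))
```

Comparing against the float literals `1e21` and `1e-6` directly avoids the problem that `1e-6` is not exactly representable as a double.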

zcorpan commented 1 year ago

While at it, we should serialize Infinity and NaN: https://drafts.csswg.org/css-values/#calc-error-constants

The precision loss to 6 digits helps with hiding implementation details of the precision, but OTOH it might cause roundtrip degradation for e.g. pi and e: https://drafts.csswg.org/css-values/#calc-constants - should those values be serialized as keywords?

Since the keywords are only allowed in calc(), that would need to be supported in serialization also. From what I understand of css-values, <number> does not include calc(). I don't see "calc" mentioned in cssom. (Maybe this should be its own issue?)
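
The round-trip concern for pi and e can be illustrated with a short sketch. It assumes the 6-digit limit is applied by truncation (rounding would give a different last digit but loses precision in the same way); `truncate_to_six_decimals` is a hypothetical helper, not an API from any spec or browser, and it is only meant for ordinary-magnitude values.

```python
import math
from decimal import Decimal, ROUND_DOWN


def truncate_to_six_decimals(value: float) -> str:
    """Serialize with at most 6 decimals, truncating (hypothetical helper)."""
    exact = Decimal(repr(value))  # shortest round-trip digits of the float
    truncated = exact.quantize(Decimal("0.000001"), rounding=ROUND_DOWN)
    # Drop trailing zeros, then a dangling decimal point.
    return format(truncated, "f").rstrip("0").rstrip(".")


serialized_pi = truncate_to_six_decimals(math.pi)
# The parsed-back value no longer equals the original constant.
print(serialized_pi, float(serialized_pi) == math.pi)
```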

emilio commented 1 year ago

calc() serialization and simplification is specified in css-values, afaict.

tabatkins commented 1 year ago

The <number> production includes anything that's a number, including all math functions whose type is <number>.

tabatkins commented 1 year ago

For the Agenda+, here's a first draft of the text to replace the <number> part of the serialization rules:

The return value of the following algorithm:

<div algorithm="serialize a number">
    1. Let |s| initially be the empty [=string=].

    2. If the absolute value of the component is less than 10<sup>21</sup> and greater than or equal to 10<sup>-6</sup>, 
            or equal to zero:

        * Serialize the integer part of the component as a base-10 number
            (omitting leading zeros)
            and append the result to |s|.

        * If the decimal part of the component,
            when truncated to 6 digits,
            is non-zero,
            append "." (U+002E FULL STOP) to |s|,
            then serialize the decimal part of the component as a base-10 decimal,
            truncating to 6 digits
            and omitting trailing zeros,
            and append the result to |s|.

        * Return |s|.

    3. Otherwise, serialize the result in scientific notation:

        * Let |power| be the integer power of 10 that,
            when multiplied with the component,
            produces a number with a single non-zero integer digit.

            Let |shifted component| be the result of multiplying the component by |power|.
        * Serialize |shifted component| as a <number>,
            and append the result to |s|.
        * Append "e" (U+0065 LATIN SMALL LETTER E) to |s|,
            then serialize |power| as a base-10 integer,
            omitting leading zeros,
            and append the result to |s|.
        * Return |s|.
</div>

Note: This algorithm matches the behavior of JavaScript in serializing numbers,
except that we additionally truncate the decimal portion to a maximum of 6 digits.
This somewhat avoids exposing the exact representation precision of numeric values,
as that can change between properties and between implementations.
It also avoids exposing minor differences in ordering of internal arithmetic operations,
which might produce very slightly different floating point values
which would serialize differently
despite acting identically in practice.
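
The draft steps above can be sketched in Python as follows. This is one reading of the draft, not a reference implementation, and it makes several assumptions: the input is a finite float, its shortest round-trip decimal digits are used as the component, a leading "-" is emitted for negative values (the draft does not spell out sign handling), and the exponent is written so that mantissa × 10^exponent equals the value, with no "+" sign. The names `serialize_number` and `_fixed` are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, localcontext

SIX_PLACES = Decimal("0.000001")


def _fixed(magnitude: Decimal) -> str:
    """Step 2: integer part plus up to 6 truncated decimals, no trailing zeros."""
    truncated = magnitude.quantize(SIX_PLACES, rounding=ROUND_DOWN)
    return format(truncated, "f").rstrip("0").rstrip(".")


def serialize_number(value: float) -> str:
    """Sketch of the draft 'serialize a number' steps for a finite float."""
    with localcontext() as ctx:
        ctx.prec = 50  # room for 21 integer digits plus 6 decimals
        component = Decimal(repr(value))
        magnitude = abs(component)
        sign = "-" if component < 0 else ""  # sign handling is an assumption
        if magnitude == 0 or Decimal("1e-6") <= magnitude < Decimal("1e21"):
            return sign + _fixed(magnitude)
        # Step 3: move the decimal point so one non-zero integer digit remains.
        exponent = magnitude.adjusted()
        shifted = magnitude.scaleb(-exponent)
        return f"{sign}{_fixed(shifted)}e{exponent}"


for sample in (1.5, 1e20, 1e21, 1.987654321e-6, 1.987654321e-7):
    print(serialize_number(sample))
```

Using `Decimal` rather than float arithmetic keeps the truncation exact, so the sketch does not introduce rounding artifacts of its own.
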

cdoublev commented 1 year ago

I am not sure whether the plan is to check later if this change can also be applied to <integer>, but I am referencing #6471 just in case.

dbaron commented 1 year ago

For what it's worth, one principle of serialization that I think we documented somewhere (but I can't find it anywhere that's general, although it's documented in a more specific case for serializing CSS values) is that serializing should generally prefer serializing to the more backwards-compatible / older form when there are different serialization possibilities that have been part of CSS for different amounts of time. This preference is both (a) because older web content might expect that and (b) because serialized content might be sent to a different user agent with different capabilities. (Though (b) is probably less of an issue these days because of faster browser release cycles and faster uptake of those releases.)

Following this principle here would mean not using scientific notation, or at least limiting its use to cases where non-use is problematic (for example, because it breaks other serialization principles such as lack of data loss during a round-trip through parsing and serialization... though I'm not sure that's the case here).

That said, if we agree that (b) is less important these days, then I think this principle just degrades to the caution that we have whenever making non-backwards-compatible changes.

css-meeting-bot commented 1 year ago

The CSS Working Group just discussed [cssom] Serialize numbers using scientific notation?, and agreed to the following:

RESOLVED: Accept proposal to match JS scinot serialization triggers, other than 6-digit decimal truncation rule

The full IRC log of that discussion
<fantasai> TabAtkins: our rules for serializing numbers are reasonably well-defined
<fantasai> TabAtkins: But we have scientific notation now, which is widely implemented
<fantasai> TabAtkins: Browsers use it for serialization *sometimes*, inconsistent, depends on the property...
<fantasai> TabAtkins: So the proposal here is to formalize when we serialize as scinot
<fantasai> TabAtkins: My proposal is to match JS exactly, which means that you use scinot whenever either the number has 22 or more digits of integer value, or has 6 or more leading zeroes in its decimal portion and is zero integer
<fantasai> TabAtkins: only change from JS is that we continue to truncate to only 6 digits after decimal point, maximum
<fantasai> TabAtkins: this is required for compat
<fantasai> TabAtkins: and also it hides some differences between browsers/properties
<fantasai> TabAtkins: and it also hides some floating-point variances
<fantasai> TabAtkins: so there's spec text in the issue
<fantasai> TabAtkins: and that's it
<bkardell_> do we have numbers that big?
<fantasai> TabAtkins: Wrt interop, we're all over the place
<fantasai> TabAtkins: every property and every browser does an effectively random thing
<fantasai> TabAtkins: partially due to different levels of precision, e.g. width supports subpixel, but scale property supports ...
<fantasai> TabAtkins: e.g. Chrome start scinot at 0.0001
<fantasai> TabAtkins: So no interop, so match JS with wrinkle about 6 digits seems reasonable to do with minimal impact on authors
<fantasai> Rossen_: Any additional comments?
<fantasai> TabAtkins: dbaron's point was to bias towards older formats during serialization
<fantasai> ... that's part of why the bounds are so wide
<fantasai> ... Most numbers you will ever encounter in a stylesheet do not trigger scinot
<fantasai> TabAtkins: but outside those bounds, e.g. when serializing a transform matrix, need to serialize somehow
<fantasai> Rossen_: Pretty clear proposal, not seeing anyone rushing to the queue ...
<fantasai> ... any objections to the proposal?
<fantasai> RESOLVED: Accept proposal to match JS scinot serialization triggers, other than 6-digit decimal truncation rule

bernhardf-ro commented 4 months ago

While it is good to see that small numbers will be serialized with great precision (7 digits), there is a noticeable 'jump' in precision between "e-6" and "e-7". For example, 1.987654321e-7 or 0.0000001987654321 would be serialized as 1.987654e-7 (aka 0.0000001987654), but 1.987654321e-6 or 0.000001987654321 as 0.000001 (aka 1e-6), dropping the precision from 7 digits to 1.

Allowing "6 significant digits" instead of "truncating to 6 digits", for numbers between 0 and 1, would eliminate that 'jump'. Besides making the whole behavior more consistent, and understandable for authors, it would make working with very small (but not excessively small, i.e. e-1 to e-6) numbers a lot safer. This is the current behavior of Firefox and PDFreactor (except for scinot) and we cannot easily change that in PDFreactor, as we had use-cases where a strict 6 decimals limit would interfere with the results.

(A similar 'jump' happens when large numbers switch to scinot and limiting significant digits would also solve that. Again this is already the behavior of Firefox and PDFreactor. However this isn't really a significant issue.)
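
The 'jump' can be made concrete with a small comparison of the two limits. This is a sketch under assumptions: both limits are applied by truncation (an engine may round instead), values are passed as exact decimal strings, and the helper names `truncate_decimals` and `truncate_significant` are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN


def truncate_decimals(value: Decimal, places: int = 6) -> Decimal:
    """Keep at most `places` digits after the decimal point."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_DOWN)


def truncate_significant(value: Decimal, digits: int = 6) -> Decimal:
    """Keep at most `digits` significant digits."""
    if value == 0:
        return value
    # adjusted() is the exponent of the most significant digit.
    quantum_exponent = value.adjusted() - (digits - 1)
    return value.quantize(Decimal(1).scaleb(quantum_exponent), rounding=ROUND_DOWN)


sample = Decimal("0.000001987654321")
print(truncate_decimals(sample))     # one significant digit survives
print(truncate_significant(sample))  # six significant digits survive
```

With the six-decimals limit, a value just above 1e-6 keeps a single significant digit, while the six-significant-digits limit keeps the same amount of precision across the whole range between 0 and 1.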

tabatkins commented 4 months ago

Just to redocument the behavior between browsers currently, here's a testcase:

(Testcase markup was not preserved. Its results table has three columns: JS Value, Specified, Computed.)

The results are different between all three major engines:

  1. Blink: retains six significant figures (aka .000123457), switches to scinot at e-5.
  2. Gecko: retains six significant figures, switches to scinot at e-7.
  3. WebKit: retains six figures after the decimal point (aka .000123), never switches to scinot. (At e-7 it just prints as 0.)

tabatkins commented 4 months ago

Conclusion: yeah, we should indeed mandate six significant figures (matching Firefox, and mostly matching Blink), rather than six figures after the decimal point (kinda matching WebKit, but it never actually switches to scinot so it doesn't count).