tc39 / source-map

Source map specification, RFCs and new proposals.
https://tc39.es/source-map/
Other
121 stars 16 forks source link

What to do about `B` and `-2147483648` in mappings #123

Closed nicolo-ribaudo closed 21 hours ago

nicolo-ribaudo commented 2 weeks ago

Given that until https://github.com/tc39/source-map/pull/119 we were very hand-wavy about how out base64 VLQs work, there is an ambiguity when it comes to decoding B (whose bit pattern is 00001).

My reading of our spec was that:

and thus B decodes to "-0", which is simply 0 because we are working with integers.

@jridgewell's reading is that (@jridgewell is this correct? I tried applying this same "algorithm" to F and it does not work):

and this B decodes to 1000_0000_0000_0000_0000_0000_0000_0000, which when interpreted as an i32 is -2147483648


I did some testing in various implementations:

Test 1

0;throw new Error("This is an error");
// { "version": 3, "sources": ["original.js"], "mappings": "AA0gggggCA,CABA,CA0gggggCA" }
//# sourceMappingURL=data:application/json;base64,eyAidmVyc2lvbiI6IDMsICJzb3VyY2VzIjogWyJvcmlnaW5hbC5qcyJdLCAibWFwcGluZ3MiOiAiQUEwZ2dnZ2dDQSxDQUJBLENBMGdnZ2dnQ0EiIH0=

0gggggC is 2^30+10, so in implementations that decode B as -2^31 the error will be reported as being thrown at line 2^30+10 - 2^31 + 2^30+10 = 20 (or 21 if 1-based)

In implementations that decode B as 0 this overflows, so the result is unspecified and test does not apply.

Test 2

0;throw new Error("This is an error");
// { "version": 3, "sources": ["original.js"], "mappings": "AACA,EABA" }
//# sourceMappingURL=data:application/json;base64,eyAidmVyc2lvbiI6IDMsICJzb3VyY2VzIjogWyJvcmlnaW5hbC5qcyJdLCAibWFwcGluZ3MiOiAiQUFDQSxFQUJBIiB9

C is 1, so in implementations that decode B as 0 the error will be reported as being thrown at line 1 + 0 = 1 (or 2 1-based)

In implementations that decode B as -2^31 this yields a negative number (-2147483647), so the test does not apply.

These are the results:

Test1 Test2 Conclusion: B is decoded as
Node.js using --enable-source-maps 21 :heavy_check_mark: -2147483646 :x: -2^31
Node.js using npm:source-map-support -2147483627 :x: 2 :heavy_check_mark: 0
Deno 2147483669 :x: 2 :heavy_check_mark: 0
Chrome -2147483627 :x: 2 :heavy_check_mark: 0
Firefox -2147483627 :x: 2 :heavy_check_mark: 0
Safari -2147483627 :x: 2 :heavy_check_mark: 0
@jridgewell/sourcemap-codec 20 :heavy_check_mark: -2147483647 :x: -2^31
Bun I also tried Bun using [this trick](https://x.com/jarredsumner/status/1828369775472636289). You need to prefix the file with `@bun` line, prefix the `"mappings"` string with `;`, and re-encode the source map to Base 64. However, in both test case it errors with "Could not decode sourcemap".

My proposal

We should clarify that B is decoded to 0 and not to -2^31, ratifying what Deno/Chrome/Firefox/Safari do.

I propose that we go even one step further, and say that encoders must encode 0 as A and not as B. This carves out some space in the mappings field in case we'll ever need it in the future: we can one day give a special meaning to ,B,, and tools that do not recognize this meaning would simply fallback to considering it as a no-op :)

jridgewell commented 2 weeks ago

(@jridgewell is this correct? I tried applying this same "algorithm" to F and it does not work):

Decoding is if (negative) return 0x8000_0000|-v. The -v ensures the bit pattern is 2s-complement for the OR, and the 0x8000_0000 ensure the sign bit is always set (it only matters for -0, who's 2s-complement is the same as regular 0).

These are the results:

There are also bugs with round-tripping values >= to 1,073,741,824.

Engine 1073741823 1073741824 2147483648
Chrome ✅ 1073741824 ❌ -1073741823 (>> used) ✅ 1
Safari ✅ 1073741824 ❌ -1073741823 (>> used) ✅ 1
Firefox ✅ 1073741824 ✅ 1073741825 (i64 used internally) ❌ -2147483647 (and it takes like 10 minutes!)
Node native ✅ 1073741824 ✅ 1073741825 (based on my old Chrome code) ✅ 1
Node source-map-support ✅ 1073741824 ❌ -1073741823 (>> used) ✅ 1
Deno ✅ 1073741824 ✅ 1073741825 (I can't find their impl) ❌ 2147483649
Bun ✅ 1073741824 ✅ 1073741825 (u32 internally) 🟨 0 (supposed to be 1-based)
source-map-codec ✅ 1073741823 ✅ 1073741824 (>>> used) ✅ 0

Values this large have the 31st bit set, and when we encode we perform (v << 1), shifting that 31st bit to the 32nd bit (the sign bit of the result!). If they don't use >>> internally, then the sign bit will not shift downwards when decoding, and the result will remain negative.

Rather than standardize broken behavior, I think we should fix the impls, including the proper decoding of B.

jridgewell commented 2 weeks ago

To help with this discussion, I built https://justin.ridgewell.name/integer-toy/