ljharb / es-abstract

ECMAScript spec abstract operations.
MIT License

`UTF16Decode`, `CodePointAt`, and related operations return strings instead of numbers #150

Closed mhassan1 closed 1 year ago

mhassan1 commented 1 year ago

Currently, the implementations of the following operations return strings, but they should return numbers (for example, https://tc39.es/ecma262/#sec-utf16decodesurrogatepair):

ljharb commented 1 year ago

CodePointAt returns a Record, and strings basically are code points (the [[CodePoint]] field). What value would you expect there?

mhassan1 commented 1 year ago

The description of `UTF16SurrogatePairToCodePoint` (below) sets `cp` to a number and then returns `cp` (unless we are supposed to interpret "Return the code point cp." as "Return the code point whose numeric value is that of cp.").

  1. Let cp be (lead - 0xD800) × 0x400 + (trail - 0xDC00) + 0x10000.
  2. Return the code point cp.

Maybe this is just a matter of interpretation of what a code point could be (a number or a string), but I thought it was supposed to be a number.
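
To make the two readings concrete, here is a minimal sketch of the quoted steps, once returning the number and once returning the string made from it. The helper names `surrogatePairToNumber` and `surrogatePairToString` are hypothetical and are not the es-abstract API; both take the lead and trail code units as numbers.

```javascript
// Hypothetical helpers for illustration only; not part of es-abstract.

// Reading 1: "Return the code point cp" means return the number itself.
function surrogatePairToNumber(lead, trail) {
  // Step 1 of the quoted algorithm, with lead and trail as numeric code units.
  return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
}

// Reading 2: "Return the code point cp" means return the character
// whose numeric value is cp, represented here as a JS string.
function surrogatePairToString(lead, trail) {
  return String.fromCodePoint(surrogatePairToNumber(lead, trail));
}

// U+1F600 is encoded in UTF-16 as the pair 0xD83D, 0xDE00.
const asNumber = surrogatePairToNumber(0xD83D, 0xDE00); // 0x1F600
const asString = surrogatePairToString(0xD83D, 0xDE00); // "\uD83D\uDE00"
```

Either result can be converted to the other (`String.fromCodePoint` one way, `codePointAt(0)` the other), which is why both readings are workable.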

ljharb commented 1 year ago

I think either is certainly a reasonable interpretation - and yes, I interpreted that line as a transformation of the number, otherwise it would have said Return _cp_.

Do you have a use case where it makes a difference?

mhassan1 commented 1 year ago

I think part of the confusion is that String.prototype.codePointAt returns a number:

  1. Return 𝔽(cp.[[CodePoint]]).

For that reason, "code point" feels like a number, but I agree that the spec isn't explicit about that. FWIW, MDN also says it's a number, but I guess that's talking about the String.prototype.codePointAt result, not necessarily the abstract "code point."

I don't have a use case where it makes a difference. I just found it confusing while I was implementing a polyfill for String.prototype.toWellFormed. For example, UTF16EncodeCodePoint has this:

  1. If cp ≤ 0xFFFF, return the String value consisting of the code unit whose numeric value is cp.

Again, that could be interpreted as "the number representing the code point is less than or equal to 0xFFFF."
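
For reference, a rough sketch of that operation under the "cp is a number" reading. The function name `utf16EncodeNumber` is hypothetical, and the surrogate-pair branch is my understanding of the remaining spec steps, which are not quoted here.

```javascript
// Hypothetical helper for illustration only; not part of es-abstract.
// Treats cp as a number, per the "numeric" reading of the spec text.
function utf16EncodeNumber(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10FFFF) {
    throw new RangeError("cp must be an integer between 0 and 0x10FFFF");
  }
  // The quoted step: a single code unit whose numeric value is cp.
  if (cp <= 0xFFFF) {
    return String.fromCharCode(cp);
  }
  // Assumed remaining steps: split into a lead and trail surrogate.
  const cu1 = Math.floor((cp - 0x10000) / 0x400) + 0xD800;
  const cu2 = ((cp - 0x10000) % 0x400) + 0xDC00;
  return String.fromCharCode(cu1, cu2);
}

const bmpResult = utf16EncodeNumber(0x41);       // "A"
const astralResult = utf16EncodeNumber(0x1F600); // "\uD83D\uDE00"
```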

ljharb commented 1 year ago

do note that https://npmjs.com/string.prototype.towellformed already exists :-p

indeed, `<=` in JS works on both strings and numbers, so that was my interpretation here as well.
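
A small sketch of how `<=` behaves for each operand type, in case it helps anyone reading later. The variable names are only illustrative. Two strings are compared code unit by code unit, and a string compared with a number is converted to a number first, so the result can differ from the numeric comparison for code points above 0xFFFF.

```javascript
const bmp = "\uFFFF";          // a single code unit, U+FFFF
const astral = "\uD83D\uDE00"; // U+1F600 as a surrogate pair

// Number vs number: plain numeric comparison.
const numbersCompare = 0x1F600 <= 0xFFFF; // false

// String vs string: compared by code units, so the lead surrogate
// 0xD83D is what gets compared against 0xFFFF.
const stringsCompare = astral <= bmp; // true

// String vs number: the string is converted with ToNumber, giving NaN,
// and any comparison with NaN is false.
const mixedCompare = astral <= 0xFFFF; // false

// A check that does not depend on comparison semantics:
// a string holding one code point at or below 0xFFFF has length 1.
const fitsInOneCodeUnit = astral.length === 1; // false
```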

mhassan1 commented 1 year ago

Understood. Closing this. Thanks!

ljharb commented 1 year ago

Thanks for opening the discussion!