Open · lucacasonato opened 3 years ago
What about encodeInto()?
The exact name of the method should be something different though - the static method and the prototype method would clash in this example, which is confusing.
Why would that be confusing?
What about encodeInto()?
Added - forgot it existed.
Why would that be confusing?
Just checked the spec - it wouldn't just be confusing, it would also be against the Web IDL spec:
"The identifier of a static operation also must not be the same as the identifier of a regular operation defined on the same interface." - https://heycam.github.io/webidl/#idl-operations
As to why it would be confusing: "just call TextDecoder.decode" would have two meanings now. Also, what would the MDN link be, for example? https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/decode is already taken for the prototype method.
We can fix Web IDL.
As for MDN's problem, they've had a big issue with bad documentation for prototype methods for a long time. They need to fix it. TextDecoder.decode() has always meant a static method (which hasn't existed); TextDecoder.prototype.decode() or textDecoder.decode() are how you write the instance methods.
It's unclear what the benefit of the proposed change is. It doesn't save significant typing (just new); in cases where you want to avoid allocating a new instance, the encoder/decoder can be cached and used statelessly.
Given a time machine to redesign the API now that streams are a thing, this makes more sense for non-streaming encodes/decodes, but is it worth adding to the platform?
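For reference, the caching pattern mentioned above is just a shared, module-level instance reused for one-shot calls; a minimal sketch (helper names are made up):

```js
// One shared encoder/decoder reused for all one-shot (non-streaming) calls.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

function encodeText(text) {
  // encode() is stateless, so the shared instance can be reused freely.
  return encoder.encode(text);
}

function decodeText(bytes) {
  // Without { stream: true }, each decode() call is independent.
  return decoder.decode(bytes);
}
```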
If we add API surface that doesn't work in old browsers, I'd like to know if users of the potential API surface care about non-UTF-8 encodings. If not, I'd prefer adding new API surface to JavaScript String itself instead of TextDecoder, since the WebIDL layer is a perf bottleneck:

- String from Uint8Array assuming UTF-8 semantics.
- String from Uint16Array assuming WTF-16 semantics.
- String that writes UTF-8 into a Uint8Array with the encodeInto semantics but returns an array of numbers instead of a WebIDL type.
- String that writes WTF-16 into a Uint16Array.

cc @bakkot @littledan
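For illustration, these four operations could be approximated in userland roughly as follows (all names here are invented for this sketch; the actual proposals would be native String API surface):

```js
// Userland approximations of the four String-level operations listed above.

// String from Uint8Array assuming UTF-8 semantics.
function stringFromUTF8(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

// String from Uint16Array assuming WTF-16 semantics (code units copied as-is,
// including lone surrogates).
function stringFromWTF16(units) {
  let result = "";
  for (let i = 0; i < units.length; i += 0x1000) {
    result += String.fromCharCode(...units.subarray(i, i + 0x1000));
  }
  return result;
}

// Write UTF-8 into a Uint8Array with encodeInto semantics, but return plain
// numbers instead of a WebIDL dictionary.
function encodeIntoUTF8(source, destination) {
  const { read, written } = new TextEncoder().encodeInto(source, destination);
  return [read, written];
}

// Write WTF-16 code units into a Uint16Array, returning how many were written.
function encodeIntoWTF16(source, destination) {
  const length = Math.min(source.length, destination.length);
  for (let i = 0; i < length; i++) {
    destination[i] = source.charCodeAt(i);
  }
  return length;
}
```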
https://github.com/whatwg/webidl/pull/1098 should technically unblock this. Do any browser vendors object to this? If not, I'd like to start work on this.
since the WebIDL layer is a perf bottleneck
@hsivonen How so? In Deno we have an optimized path in our WebIDL bindings for this specific API that bypasses the slow USVString conversion completely. This should be possible in other implementations too.
During encode you can convert the first argument to a DOMString (which is fast), and then just perform a lenient UTF-8 encoding on that string, which will already perform lone surrogate conversion. https://github.com/denoland/deno/blob/main/ext/web/08_text_encoding.js#L166-L175
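To illustrate what a lenient UTF-8 encoding buys here, the following is a userland sketch of the semantics (not the actual Deno or spec code): encoding the DOMString directly turns lone surrogates into U+FFFD, which is exactly what a separate USVString conversion pass would have produced.

```js
// Sketch of a lenient UTF-8 encode over a plain DOMString: lone surrogates
// become U+FFFD during encoding, so no separate USVString pass is needed.
function utf8EncodeLenient(str) {
  const out = [];
  for (const ch of str) { // iterates by code point
    let cp = ch.codePointAt(0);
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd; // lone surrogate -> U+FFFD
    if (cp <= 0x7f) {
      out.push(cp);
    } else if (cp <= 0x7ff) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp <= 0xffff) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return new Uint8Array(out);
}
```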
since the WebIDL layer is a perf bottleneck
@hsivonen How so? In Deno we have an optimized path in our WebIDL bindings for this specific API that bypasses the slow USVString conversion completely. This should be possible in other implementations too.
Gecko doesn't actually use USVString in TextEncoder WebIDL, either. However, the WebIDL layer still has enough overhead that e.g. wasm-bindgen conditionally avoids going through the WebIDL layer.
Mh, that seems very weird to me. I very much doubt that WebIDL inherently needs to be a slowdown here. Currently the overhead likely stems from most engines slowing down when calling from JS to native code. This is a performance issue that can be fixed. In V8-based runtimes, TextEncoder.encodeInto can use V8's "fast API" for better JIT-inlined performance, for example.
The only WebIDL conversion taking place here is the DOMString conversion, which really is just a call to ToString, which is a zero-cost op if the argument is already a string. A native API in ES would also need to do some runtime type checking (likely the exact same checks), so I really don't see why performance should be a reason to have a native JS API.
TLDR: I don't think spec decisions should be driven by current performance issues in engines, as I don't think there are any technical reasons why TextEncoder#encodeInto must be slower than String#encodeIntoUTF8. It's just a matter of optimizing the code paths, as far as I can tell.
Please correct me if I missed something obvious here.
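To make the claim concrete, a rough userland approximation of the per-call checks in question might look like this (purely illustrative; a hypothetical String#encodeIntoUTF8 would need essentially the same validation):

```js
// Approximation of the per-call work on top of the actual encode. A native
// String method would have to do essentially the same validation.
function checkedEncodeInto(encoder, source, destination) {
  // DOMString conversion: effectively ToString, free when source is a string.
  const str = String(source);
  // The destination buffer still has to be type-checked either way.
  if (!(destination instanceof Uint8Array)) {
    throw new TypeError("destination must be a Uint8Array");
  }
  return encoder.encodeInto(str, destination);
}
```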
The large majority of users of TextEncoder and TextDecoder do not make use of streaming capabilities. Instead they just need to decode a single chunk, or encode a single chunk. We should consider adding a static utility method to TextEncoder/TextDecoder to cater to this use case.

I propose the following API be added:
The exact name of the method should be something different though - the static method and the prototype method would clash in this example, which is confusing.
Internally this would behave exactly like a single-use, non-streaming encoder.