tc39 / Function-prototype-toString-revision

:fishing_pole_and_fish: ECMA-262 proposal to update Function.prototype.toString
https://tc39.github.io/Function-prototype-toString-revision
26 stars 10 forks source link

Performance implications of line terminator normalization #19

Closed tschneidereit closed 7 years ago

tschneidereit commented 7 years ago

The line terminator normalization is quite unfortunate because it prevents engines from just handing out slices of existing sources, or at least simply memcpying the relevant part of the source. At least in the general case, and not without additional work during source scanning to detect \r characters.

@michaelficarra you introduced the normalization, so maybe you can explain the reasoning? I saw that it was discussed at a tc39 f2f, but unfortunately the notes are very terse, so I really don't understand why it's required.

(There seems to be an additional problem with template strings: it seems unlikely to me that we really want F.p.toString mess with \r characters inside them. Properly handling that would require parsing the source again - or keeping information about template strings around - IINM.)

michaelficarra commented 7 years ago

@tschneidereit That's a great question. As you suspect, it does have to do with templates. Templates normalise line terminators (even in the raw value!). See these lines from the spec

The TRV of LineTerminatorSequence::<LF> is the code unit value 0x000A.
The TRV of LineTerminatorSequence::<CR> is the code unit value 0x000A.
The TRV of LineTerminatorSequence::<LS> is the code unit value 0x2028.
The TRV of LineTerminatorSequence::<PS> is the code unit value 0x2029.
The TRV of LineTerminatorSequence::<CR><LF> is the sequence consisting of the code unit value 0x000A.

TRV determines the raw value and TV (which determines the computed value) defers to it for LineTerminatorSequences.

So the normalisation is for consistency with templates and so that a programmer can't observe a change in line terminator encoding (intentional or otherwise). Fortunately, you only need to store pointers into the raw source and can do line terminator normalisation when the toString result is requested. And if it's ever a perf concern, that result can be memoised.

tschneidereit commented 7 years ago

Templates normalise line terminators (even in the raw value!)

Oh, I wasn't aware of this. Thanks for the clarification.

So the normalisation is for consistency with templates and so that a programmer can't observe a change in line terminator encoding (intentional or otherwise)

I'm not sure I really buy that: there are various situations in which authors have to deal with this kind of potential inconsistency already, such as loading a script over XHR (or by whatever other means) as a string and the evaluating it. What's special about this situation that it requires the normalization of source text? And why isn't consistency with other ways of getting source text more important?

Fortunately, you only need to store pointers into the raw source and can do line terminator normalisation when the toString result is requested. And if it's ever a perf concern, that result can be memoised.

I'm mostly concerned about frameworks using F.p.toString for reflection purposes in which case it seems likely they'd only do the reflection once (one hopes), but need that to be fast. I also don't think memoization is a good idea here because of the increased memory usage.

michaelficarra commented 7 years ago

I'm open to hearing some pros/cons of line terminator normalisation. I also suspect many at TC39 would have strong opinions. For instance, I think it was @allenwb who originally pointed out that we must normalise line terminators. But I'll add an agenda item to the next meeting to give other implementors an opportunity to voice their concerns about it.

I'm mostly concerned about frameworks using F.p.toString for reflection purposes in which case it seems likely they'd only do the reflection once (one hopes), but need that to be fast.

I'm not really concerned about the reflection use cases. TC39 agreed that we should introduce proper APIs for reflecting on function parameter lists. Frameworks that reflect on parameter lists via F.p.toString will continue to work, but ones that want to be fast will move to the new reflection APIs.

edit: See https://github.com/tc39/agendas/pull/219

ljharb commented 7 years ago

One thing worth noting is that not normalizing line terminators might expose implementation-specific information (like, that it's running on windows, eg) - it seems like that might be a security concern.

tschneidereit commented 7 years ago

But I'll add an agenda item to the next meeting to give other implementors an opportunity to voice their concerns about it.

Thanks. If the current consensus remains, we'll cope with it of course. I just think it's unfortunate - and exchanges one kind of inconsistency for another, without much gain.

One thing worth noting is that not normalizing line terminators might expose implementation-specific information (like, that it's running on windows, eg) - it seems like that might be a security concern.

Can you elaborate on that? Without normalization we'd just return the (apart from WTF-16 encoding) unmodified source text. What's the information you think that'd leak? The JS engine running on Windows wouldn't change the returned string.

allenwb commented 7 years ago

(There seems to be an additional problem with template strings: it seems unlikely to me that we really want F.p.toString mess with \r characters inside them. Properly handling that would require parsing the source again - or keeping information about template strings around - IINM.)

Unescaped line terminators are already normalized by template literals. See note at the end of 11.8.6.1 (or the actual TV/TRV definitions)

jdalton commented 7 years ago

With issues like #16 and this I think this proposal may need some retooling. The current toString behavior has been dealt with for years so adding extra overhead to spec-ify and close-gaps-in the de facto behavior may not be so great.

tschneidereit commented 7 years ago

Unescaped line terminators are already normalized by template literals. See note at the end of 11.8.6.1 (or the actual TV/TRV definitions)

Right, @michaelficarra made me aware of that. See the additional discussion above, too.

michaelficarra commented 7 years ago

An update after this was discussed yesterday in the committee:

Action items:

I will bring this back up with the committee once I feel that we have enough additional input that we can advance the proposal; with or without normalisation.

ljharb commented 7 years ago

@michaelficarra given "Some committee members very strongly supported the normalisation, going as far to say they would block advancement of this proposal without it.", and #22, is the implication that you are confident you can persuade these committee members at the next meeting?

michaelficarra commented 7 years ago

@ljharb I plan to proceed with this proposal only in a state in which it will actually be implemented. If some committee members will block it in its current state, I will withdraw the proposal. There's no point in maintaining a specification that is completely disassociated from reality.