Open ljharb opened 4 years ago
This is a really difficult issue. Both sides have good reasons. So I slightly support leaving it to follow-on proposals.
> Given the decision with `.at` on String.prototype, there's an even stronger precedent that we shouldn't be trying to piecemeal "fix" code units, which to me suggests very strongly that if this syntax is to advance, it would need to work on strings, as code units, just like `.slice` does.
I don't agree. The decision on `.at()` for String.prototype was actually forced toward code units because you insisted that it must include strings with code unit semantics, or you would block the proposal (if I remember wrong, please correct me), and some delegates complained about that on IRC. I don't think it's fair to use that as the "stronger precedent" 😉
Anyway, it will be very hard for the champions to design a solution which can "fix" code units if delegates like you refuse any trade-off.
Precedent here means "what's in the language", regardless of the reasoning - and yes, you're remembering correctly.
This proposal isn't trying to fix code units, so it shouldn't be designing a solution to attempt to do so.
> regardless of the reasoning
So eventually the toughest guy in the room can decide the direction of the language. Thank you for telling me the truth. 😜
That's a very unfair and inaccurate interpretation, and it is not a good look for you. I'm going to ask that you please stop this kind of discourse.
So please tell me how I should interpret that. You insist on a specific design or you block the whole proposal, so everyone who doesn't want to block the whole thing can only accept that design, and it becomes "what's in the language". Then the next time we face a similar issue, you can use it as a "stronger" precedent.
Please don't take my argument personally. It's more about the process and how we can leave room for tradeoffs.
This issue, and this repo, isn't about the process, and it's not really appropriate to discuss that here.
I don't think slice notation is worth the syntax weight unless it's usable everywhere I currently use `.slice`. Remember that the easiest decision for all of us to make is "nothing" - change requires justification and consensus, the status quo does not.
So let's go back to the technical issue and weigh each side as best we can.
I think "it's usable everywhere i currently use `.slice`" is a good point, but please don't close the door on a possible "fix" for the code unit issue, because some people also think it's not worth adding a new string feature that keeps the code unit issue.
Of course such a "fix" will be very hard to design and must involve some tradeoffs, but at least please don't reject it without any discussion.
Many delegates objected to including `.at` on strings without it having code unit semantics. The only viable options (based on delegate responses) for any proposal on indexables are "don't include it on strings" or "include it with code unit semantics". I've stated a strong preference for the latter.
> include it with code point semantics
typo?
whoops, yes thanks
@ljharb I don't recall anyone other than you strongly requiring `at()`, or indeed any string operation, to be based on code units. Perhaps @bakkot remembers (since he seems to be the other party in the debate).
Overall, I struggle to understand the logic behind such a stringent requirement, especially since string operations are not 'consistently' based on code units:
- The `at` proposal was based on code points.
- The `codePointAt` method is based on code points, not code units (obviously).
- `...string`, etc., are based on code points.

Regarding string slicing: in practice, it's frustrating that it will inevitably split surrogate pairs with a small but unavoidable probability, turning a well-formed string into an ill-formed one. Developers can easily fall into this trap (which usually only affects international users, making this choice inevitably smack of 'racism'). Even when a developer is aware of the issue, the effort to resolve it is disproportionate to the benefit, so almost all programmers (including myself) choose to ignore it. If a programming language's existing APIs are unfriendly to internationalization for historical reasons, that might be excusable. However, perpetuating this historical mistake in every new feature of the language shifts the responsibility for it onto all programmers, which I believe is extremely bad and the wrong choice.
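To make the surrogate pair concern concrete, here is a minimal sketch using only existing string methods. The sample string and the `endsWithLoneHigh` helper are illustrative names, not part of any proposal.

```javascript
// U+1F600 (😀) is one code point but two UTF-16 code units (a surrogate pair).
const s = 'a\u{1F600}b'; // 'a😀b'

const codeUnitLength = s.length;        // .length counts code units
const codePointLength = [...s].length;  // string iteration yields code points

// Slicing at code unit index 2 cuts the pair in half.
const head = s.slice(0, 2);
const tail = s.slice(2);

// Illustrative helper: true when the last code unit is a high surrogate,
// i.e. the string ends in the first half of a pair.
const endsWithLoneHigh = (str) => {
  const last = str.charCodeAt(str.length - 1);
  return last >= 0xd800 && last <= 0xdbff;
};

console.log(codeUnitLength, codePointLength); // 4 3
console.log(endsWithLoneHigh(head));          // true
console.log(head + tail === s);               // true
```

Both halves are ill-formed on their own, yet concatenating them still restores the original string.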
Therefore, I hope we can find a balance: the index is inevitably based on code units, but that doesn't mean we must split surrogate pairs. We could consider always keeping surrogate pairs intact. Of course, this requires very careful design to avoid breaking the implicit constraints that real-world string code may rely on (such as `s.slice(0, i) + s.slice(i) == s` always being true). It's not easy, but it's not impossible either, and at least it's worth a try.
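One possible shape for such a design, purely as a sketch: keep code unit indices, but move any index that falls inside a surrogate pair back to the start of that pair. `snapToPairStart` and `sliceKeepingPairs` are hypothetical names invented here, and negative indices are deliberately left out.

```javascript
// Hypothetical helper: if index i sits between a high and a low surrogate,
// move it back by one so it points at the start of the pair.
function snapToPairStart(s, i) {
  if (i <= 0 || i >= s.length) return i;
  const prev = s.charCodeAt(i - 1);
  const curr = s.charCodeAt(i);
  const splitsPair =
    prev >= 0xd800 && prev <= 0xdbff && curr >= 0xdc00 && curr <= 0xdfff;
  return splitsPair ? i - 1 : i;
}

// Hypothetical pair-preserving slice; indices are still code unit indices.
// Negative indices are not adjusted in this sketch.
function sliceKeepingPairs(s, start, end = s.length) {
  return s.slice(snapToPairStart(s, start), snapToPairStart(s, end));
}

const s = 'a\u{1F600}b'; // 'a😀b'

// The concatenation invariant holds for every split point, because both
// calls snap the shared index i in the same direction.
for (let i = 0; i <= s.length; i++) {
  console.log(i, sliceKeepingPairs(s, 0, i) + sliceKeepingPairs(s, i) === s);
}
```

One visible tradeoff of this particular choice: `sliceKeepingPairs(s, 1, 2)` returns an empty string, so the result length no longer always equals `end - start`.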
In any case, under the current circumstances, I think the prospects for advancing slice notation are slim, but perhaps this gives us an opportunity to discuss more generously how to provide JavaScript programmers with better string slicing behavior. If in the end we don't have slice notation, at least we could have a better API?
If it's slice notation, then it should do what `slice` does, full stop.
I agree that it's very unfortunate that JS strings default to code units, but that's still the way they work. Code points aren't particularly helpful either, given that grapheme clusters tend to be what people actually want ime.
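A small illustration of why code points alone may not match what users perceive as a character. This sketch assumes a runtime that ships `Intl.Segmenter`; the sample string is arbitrary.

```javascript
// "e" followed by U+0301 (combining acute accent) renders as a single "é".
const text = 'e\u0301';

const codeUnits = text.length;        // UTF-16 code units
const codePoints = [...text].length;  // code points via string iteration

// Grapheme segmentation; assumes Intl.Segmenter is available in the runtime.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = [...segmenter.segment(text)].length;

console.log(codeUnits, codePoints, graphemes); // 2 2 1
```

Here a code point based slice would still be able to separate the accent from its base letter.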
Strings are indexed by code units, so slicing should work on code unit indices. `codePointAt` takes a code unit index. These are low-level string operations, on top of which text operations may be built.
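For reference, a short sketch of `codePointAt` being addressed by code unit index, and of a code point walk built on top of it. `codePointsWithIndices` is a hypothetical helper written for this example.

```javascript
const s = 'a\u{1F600}b'; // 'a😀b'

// The argument is a code unit index, not a code point index.
const atPairStart = s.codePointAt(1); // index of the high surrogate
const insidePair = s.codePointAt(2);  // index of the low surrogate

console.log(atPairStart.toString(16)); // 1f600
console.log(insidePair.toString(16));  // de00

// Hypothetical helper: list [codeUnitIndex, codePoint] entries by stepping
// one or two code units at a time.
function codePointsWithIndices(str) {
  const out = [];
  for (let i = 0; i < str.length; ) {
    const cp = str.codePointAt(i);
    out.push([i, cp]);
    i += cp > 0xffff ? 2 : 1;
  }
  return out;
}

console.log(codePointsWithIndices(s)); // [ [ 0, 97 ], [ 1, 128512 ], [ 3, 98 ] ]
```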
I read https://github.com/tc39/proposal-slice-notation#should-slice-notation-work-on-strings, but I’m not convinced that the syntax shouldn’t work on strings. The string slice method is used extensively in modern code bases, Unicode issues notwithstanding. Why wouldn’t these use cases deserve robust syntax sugar?
(one of the great things about slice is that it’s the same method on both strings and arrays; it’d be a shame to lose that)