Open CraigMacomber opened 1 year ago
If we change this, we should revamp all string handling to be character-based instead of index-based for consistency. e.g. this today is also wrong by the same token:
type A = "𞥃";
type First<T extends string> = T extends `${infer F}${string}` ? F : never;
type F = Uppercase<First<A>>;
I'd like to understand more how you encountered this in real code
I'd like to understand more how you encountered this in real code
I'm working on a system which generates APIs based on schema (similar to what TypeBox does, but more specialized), I wanted to add methods with named derived from the names in the schema (like convert "foo" into "boxedFoo") since I know TypeScript can do this now (which is pretty neat and works surprisingly well).
There are two parts to this: the Types part using Capitalize
( https://github.com/microsoft/FluidFramework/blob/ddee06b429b26b571b7383d4acd95a62d3b9758c/experimental/dds/tree2/src/feature-libraries/editable-tree-2/editableTreeTypes.ts#L227 ) and the runtime part which I had to implement my own capitalize function for ( https://github.com/microsoft/FluidFramework/blob/ddee06b429b26b571b7383d4acd95a62d3b9758c/experimental/dds/tree2/src/feature-libraries/editable-tree-2/lazyTree.ts#L436 )
Since the field names in the schema come from users of our library, and I'm personally very aware of JS utf-16 string issues, I hoped the standard library would have a capitalize operation so I wouldn't have to deal with this. Searching for that lead me to https://stackoverflow.com/questions/1026069/how-do-i-make-the-first-letter-of-a-string-uppercase-in-javascript which I immediately realized was incorrect. I then wrote my own implementation without the bug (including tests covering that case), and wondered if possibly TypeScript had the same bug in their version. I checked and found that it did.
Since my implementation is correct (as far as I can tell anyway), and does not match TypeScript's behavior, a user of our Schema system that started a field name with a lowercase character that is two utf-16 code points would get incorrect type information for the generated API. As our library is still in the alpha stage and all our schema are in English, we absolutely haven't hit this, but its possible that we could some day.
🔎 Search Terms
Capitalize, utf-16, string, type
🕗 Version & Regression Information
Capitalize
)⏯ Playground Link
https://www.typescriptlang.org/play?#code/PQKhCgAIUhhBDADgSwC7wDbIF4FNLyQDOqATsgHYDmAdFCMOAGYCuFAxqsgPYWTtI0mHLgA8AZUi4AHqlwUAJkWJlKVAHwAKIgC5I4gJR6EKdFjwT1kAN7gAkMGCQAKtwIA3bsgXFEWVFzU-AAW8KTwnLikygDuwcjswQSk+AC28Ar43EyQqSwYXH74AKrOAGIAtACMAGz83JmQbGhEADT2jk1E+GhR8Fy8kJQkuBmQ2UOKMmqQqG7diGH9+KjB+EzI0aghS5GkdHbsvCRDcuFyPgC8xADa4gCeqQBG3Bg0vefcpAC6mgY0FBkqD+AG57MgcpoPssFDQFLx8JckbNSCxcAYbPYHE4AKKpRCoe4qchBATdA52FKoFikPgAIjpBGUJiE5jE4nUYLsAF9wPYqTS+FCzjCaO5MGiaHNiohEFEEN0-pAANTEGhELDsXDCvoXMUS3A0DDyKirAwY+DMwRmESWMG88BHCgnBT9eB6G4kElUVrEtTfG7fSDXG5QSDhm4M30M77tcMRunwOnRgCCdNjYYT8CTqaTGfjkE6ADleBVLexkMhM5BIwBH6TJyB0gCLDfz8c6ABluDEomT8MaAlFZqFtshlAAmSClSq1eqNZqoXTVyOALg3AMK7jbpq8AhLvpuPxtfrnNN3d59rfMFML6QTROk43SiIFioX0yOWRBRBiau9AY2wFveryGhg3BUJoAAGT4vnodIACTWNBqDco2KREPkqCwQhAimMIeBQhQz6oAYKFvtIH4XFh1jvrgn4oRBBj2ny4CEnKLi4CQogAJKES+UiyPISh+tQvo4uRtEXPxciKFauFstxvGoOoVjXCyNoWDxRGcsxrH4AA+lUwbsZxUZNnS2m6ZAelTtczgcagoiJluaYWfcbF6QAzEZdkmdmzlJq57kACzefZjn1lurbmWC4CdM4azdPwlocZA8IUAA5NsMRfAA1nQll6QArKFJkbluu7RSxbn6XUtlhdux7lTuAUgkAA
💻 Code
🙁 Actual behavior
Capitalize appears to capitalize
theString[0]
which is not a correct implementation due to mishandling of characters which require two utf-16 code units and thus take up two indexes in the string.For example,
Capitalize<"𞥃">
gives"𞥃"
;🙂 Expected behavior
When the first character in the string passed to Capitalize is in the Unicode "Lowercase Letter" category, that character should be replaced with its corresponding uppercase version.
For example,
Capitalize<"𞥃">
should give"𞤡"
;Additional information about the issue
The included code provides a tested correct implementation for runtime use. This implementation is taken from my work here: https://github.com/microsoft/FluidFramework/blob/ddee06b429b26b571b7383d4acd95a62d3b9758c/experimental/dds/tree2/src/util/utils.ts#L394
That code is available under the MIT license and already copyright Microsoft, so feel free to use it in the TypeScript implementation if you find it useful to do so.
This will be a breaking change, but I highly doubt it will impact real use.