purescript / purescript-strings

String utility functions, Char type, regular expressions.

fromCharCode BMP #153

Closed: jamesdbrock closed this issue 2 years ago

jamesdbrock commented 3 years ago

fromCharCode should return Nothing if the code is out of the Basic Multilingual Plane Char range, right?

https://github.com/purescript/purescript-strings/blob/157e372a23e4becd594d7e7bff6f372a6f63dd82/src/Data/Char.purs#L16

>>> show $ fromCharCode 65900

(Just 'Ŭ')

The Bounded instance for Char says that “Characters fall within the Unicode range,” but the description of Char says it is “guaranteed to contain one code unit.”

MonoidMusician commented 3 years ago

Oh interesting, it appears this is actually the line at fault: https://github.com/purescript/purescript-enums/blob/170d959644eb99e0025f4ab2e38f5f132fd85fa4/src/Data/Enum.purs#L316-L318

It's using top and bottom for ints, not chars. I guess n >= toCharCode bottom && n <= toCharCode top might work?

String.fromCharCode just reduces its argument modulo 0x10000, so what you're seeing is 65900 % 0x10000 = 0x16C, which is 'Ŭ' (U+016C).
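For anyone who wants to check the wrap-around themselves, here is a rough sketch in TypeScript of the JavaScript side only. The helper fromCharCodeChecked is hypothetical (it is not part of purescript-strings or purescript-enums); it just illustrates the kind of range check being discussed, assuming a valid Char is a single UTF-16 code unit in 0..0xFFFF.

```typescript
// String.fromCharCode converts its argument to a 16-bit unsigned integer,
// which for non-negative integers is the same as taking it modulo 0x10000.
const code: number = 65900;
const wrapped: number = code % 0x10000; // 65900 - 65536 = 364 = 0x16C

const viaBuiltin: string = String.fromCharCode(code);
const viaWrapped: string = String.fromCharCode(wrapped);

// Hypothetical guarded variant (an assumption for illustration only):
// returns null instead of silently wrapping when n is not an integer
// in the single-code-unit range 0..0xFFFF.
function fromCharCodeChecked(n: number): string | null {
  if (Math.floor(n) !== n || n < 0 || n > 0xffff) {
    return null;
  }
  return String.fromCharCode(n);
}

console.log(wrapped.toString(16));         // 16c
console.log(viaBuiltin === viaWrapped);    // true
console.log(fromCharCodeChecked(65900));   // null
console.log(fromCharCodeChecked(0x16c));   // Ŭ
```

With a check like this, 65900 would be rejected up front rather than turning into U+016C.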

JordanMartinez commented 3 years ago

I've opened an issue in purescript-enums to track this. Should this issue be closed?

thomashoneyman commented 3 years ago

I think it’s reasonable that it stays open until the upstream issue is addressed.

JordanMartinez commented 2 years ago

Technically, we still need a release of that library and then a dependency update here.

JordanMartinez commented 2 years ago

PR ready for approval: #163.