Closed hdgarrood closed 5 years ago
I've taken a slightly different approach in the 0.12 branch as it is:
- Data.String and Data.String.NonEmpty only export functions that are codepoint/codeunit agnostic
- Data.String.CodeUnits now exists, with all the relevant functions moved there (likewise for NES)
- The .CodeUnits and .CodePoints modules re-export the agnostic stuff too

This means people will be forced to choose between CodeUnit and CodePoint functions at least, and it avoids the potential future problem of people not noticing the switch if Data.String changed which set of functions it re-exports.
So for most people the migration path now will just be replacing import Data.String with import Data.String.CodeXXX.

Sound good?
I'd prefer Data.String to re-export the functions from Data.String.CodePoints, so that import Data.String has a sensible default. Code points and code units are pretty technical terms that not everyone will know about, and we should strive to make it easy to do the right thing (which means using CodePoints in this case).
@garyb While we're breaking things for 0.12, let's also make Data.Char.toUpper and toLower return a string. Sorry for derailing.
Sure! Have you got an example of where that happens? (just curious)
'ß'.toUpperCase()
"SS"
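
A small sketch of why a Char -> Char signature cannot hold for case conversion, using the JavaScript string methods from the example above. The helper name upperOfChar is hypothetical and only illustrates the Char -> String shape being proposed; it is not an existing library function.

```javascript
// Hypothetical helper: takes a one-code-unit string (standing in for a Char)
// and returns its uppercase form as a string, which may be longer than one unit.
function upperOfChar(c) {
  if (typeof c !== "string" || c.length !== 1) {
    throw new Error("expected a string of exactly one UTF-16 code unit");
  }
  return c.toUpperCase();
}

const single = upperOfChar("a");      // stays one code unit
const expanded = upperOfChar("\u00DF"); // U+00DF is the sharp s from the example

console.log(single, single.length);
console.log(expanded, expanded.length);
```

The second call yields a two-unit result, so the return type has to be able to hold more than one character.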
Nice, thanks!
I think we can just drop the Char module actually: the case alteration can be done in String form, so having a Char -> String version is only a minuscule ergonomic win. There are no other functions in that module now, since fromCharCode / toCharCode became toEnum / fromEnum instead.
I disagree with fromCharCode -> toEnumWithDefault bottom top. There's just no good way to discover that.
Hmm maybe... although it can be documented somewhere. fromCharCode was a lie; that function should always have been returning Maybe Char, which is how toEnum works at least.
Now would be a good time to address that and have it return a Maybe Char then, surely? I don’t think redundancy is necessarily bad.
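
To illustrate the point about a partial fromCharCode, here is a rough JavaScript sketch. It assumes a Char is a single UTF-16 code unit, so only the range 0 to 0xFFFF is valid; the helper name fromCharCodeChecked is hypothetical, and null stands in for Nothing.

```javascript
// JavaScript's built-in String.fromCharCode never fails: it truncates its
// argument to 16 bits, so an out-of-range input silently becomes another unit.
const wrapped = String.fromCharCode(0x1f600);
console.log(wrapped === "\uf600"); // true: 0x1F600 was truncated to 0xF600

// Hypothetical checked variant: rejects anything that is not a valid
// single code unit instead of silently truncating it.
function fromCharCodeChecked(n) {
  if (!Number.isInteger(n) || n < 0 || n > 0xffff) {
    return null; // stands in for Nothing
  }
  return String.fromCharCode(n); // stands in for Just c
}

console.log(fromCharCodeChecked(0x41));    // "A"
console.log(fromCharCodeChecked(0x1f600)); // null
console.log(fromCharCodeChecked(-1));      // null
```

A total function here has to either truncate or pick a default, which is the discoverability concern raised above; returning an optional value makes the failure case visible in the type.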
I made the change to have Data.String reexport Data.String.CodePoints in: 1fbc4c0cf0fb816870a6841fa83c5cbdcddaaf22
My understanding of what we have now is:
- Data.String.Common contains functions which behave in the same way regardless of whether we are considering strings as sequences of code points or code units
- Data.String.Code{Points,Units} contain functions whose behaviour differs based on whether we are considering strings as sequences of code points or code units
- Data.String re-exports the entirety of Data.String.Common and Data.String.CodePoints

If this is correct, I'm happy and I think we can close this?
Yeah, that's right 👍
@michaelficarra originally suggested this and I agree; I think Data.String.CodePoints should really be the default. Unless you're certain you won't be working with anything outside the Basic Multilingual Plane, and you've identified string manipulations as a performance bottleneck, you should really be using the functions in Data.String.CodePoints.

For the functions whose type signatures are the same across both modules, like length :: String -> Int, this has the potential to be quite problematic, so I think we need to be quite careful about it. I'd suggest the following:
- [Add] Data.String.CodeUnits, with the exact same exports as the current Data.String,
- [Add a notice to] Data.String, detailing that the functions within currently operate on code units, not code points; that this will change in the next breaking release; and that you should very probably be using Data.String.CodePoints instead (unless you are sure you want to operate on code units, in which case you can use Data.String.CodeUnits)
- [Change] Data.String so that it re-exports everything from Data.String.CodePoints
- [Deprecate the] Data.String.CodePoints module, for removal in a subsequent breaking release?
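
As a rough illustration of why same-signature functions such as length are the risky case, the sketch below counts one string both ways in JavaScript, the representation PureScript strings use on the JS backend. The helper names lengthCodeUnits and lengthCodePoints are hypothetical and are not the library's implementation.

```javascript
// Length in UTF-16 code units: what code unit based functions observe.
function lengthCodeUnits(s) {
  return s.length;
}

// Length in Unicode code points: string iteration walks by code point,
// so a surrogate pair counts once.
function lengthCodePoints(s) {
  return Array.from(s).length;
}

const bmp = "abc";            // every character is inside the Basic Multilingual Plane
const astral = "a\u{1D400}b"; // U+1D400 lies outside the BMP (one surrogate pair)

console.log(lengthCodeUnits(bmp), lengthCodePoints(bmp));       // 3 3
console.log(lengthCodeUnits(astral), lengthCodePoints(astral)); // 4 3
```

Both helpers have the same shape, String to Int, so swapping which one a module re-exports would change results without any type error, and only for strings that contain characters outside the BMP.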