Open sol opened 2 years ago
I'm sympathetic to the idea, but technically this is a breaking change.
@sol could you please gather community feedback on this change?
@Bodigrim I would love to help, however, even if I try hard, I'll not be able to free up time to mediate this issue.
If somebody else wants to pick it up, this might be useful:
The reason why I think this change is a net positive is that relying on the semantics of pure exceptions is inherently risky:
https://gitlab.haskell.org/ghc/ghc/-/issues/1171 https://gitlab.haskell.org/ghc/ghc/-/issues/2273 https://gitlab.haskell.org/ghc/ghc/-/issues/5561 https://gitlab.haskell.org/ghc/ghc/-/issues/5129
Read: If your program relies on the type of pure exceptions, it may produce different results depending on GHC version and optimization level. For that reason some people will advice that you should only ever use a partial function if you make sure that your input will not trigger an exception.
Observations:
This is a very benign breaking change. People push back against changes that cause unnecessary churn, but nobody depends on replace "" _ _
being undefined. I will try to collect a few more such changes so we can ask/notify about them all in one post (there is a similar "nobody should care, but do you" conundrum in improving foldl
).
I've confirmed that every occurrence of Text.replace
on Hackage (~50 packages) is ostensibly applied or meant to be applied to a non-empty string. Almost all of them are a literal or some obvious construction like Text.singleton _
or "foo" <> _ <> "bar"
, so they're quick to go through. The few other uses are a little less obvious, but it is still clear that the empty string is not expected; listed below for exhaustiveness:
replaceBy = replace
, which I couldn't find any use of on Hackage https://github.com/andreymulik/sdp4text/blob/12990eaa7b39ff1cad7bd52be26986fa0bd0a30d/src/SDP/Text.hs#L172I've noticed that breakOn
and breakOnEnd
also throw exceptions when their pattern is an empty string. Surprisingly this is not mentioned in the documentation and can only be inferred from the presence of HasCallStack
constraint.
I think it might be reasonable to make breakOn
and breakOnEnd
total. After all, breaking on empty string once can be treated as not doing anything - there's an empty string at the start and end of any string. So something like breakOn "" str = ("", str)
and breakOnEnd "" str = (str, "")
would make those functions total and probably shouldn't be too surprising.
There's also breakOnAll
which is also partial, but at least this one mentions that the pattern to break on may not be empty in the docs. However I think it could be usefully defined on empty pattern as well - just explode original text into individual characters, breakOnAll "" str = map singleton (unpack str)
. At first glance this even seems to be consistent to what's written in the docs of breakOnAll
, which is Find all non-overlapping instances of needle in haystack
- all non-overlapping occurrences of an empty string in 'gaps' between characters got broken on.
For comparison, splitOn
(also partial) has this to say:
String to split on. If this string is empty, an error will occur.
An empty delimiter is invalid, and will cause an error to be raised.
So I would suggest either making all of these total (my preference) or at least copying splitOn
's documentation.
Data.Text.replace
errors if the needle is the empty string. Every other implementation I tried matches the empty string against every position in the string. I suggest that for consistency with other languages and to avoiding partiality we adopt that behavior.Expected behavior:
Actual behavior:
Rust:
Python:
JavaScript: