Open Kleidukos opened 3 years ago
💯 from me. Historically, these types of functions were not considered for the codebase due to the author disagreeing with them, resulting in everyone implementing them on an ad-hoc basis. But, I happen to know that now, the author has absolved himself of responsibility, and we can push ahead with doing it. I haven't put too much thought into the string conversion ergonomics beyond the most convenient function we all implement:
tshow :: Show a => a -> Text
tshow = pack . show
I'm all ears on making this a feature.
This is very true, we have seen an abundance of ad hoc implementations of ToText
.
While I agree that something of this flavour is useful, I'm not convinced that it should produce a Text
. Afterall, almost all uses of tshow
in my experience are as part of a concatenation of other textual values. Moreover, an efficiently-implemented tshow
would almost certainly be constructing the its result as a concatenation of individual parts.
Is this issue to discuss a potential design, witnessed by relude
, or to bikeshed a (mostly-)new one?
@bgamari, relude
provides show :: (Show a, IsString b) => a -> b
, which happily produces a Builder
if you wish one.
@bgamari interesting point, and that's true. Grepping through my codebases, the most-used utility is sshow :: (Show a, IsString b) => a -> b
, and always in the context of concatenation.
@Bodigrim Having used the relude
conversion functions in production systems, I am very satisfied of its ergonomics and features.
My ticket was to implement it as-such in the Text library.
@Bodigrim I'll admit that I'm not a fan of such doubly-polymorphic conversion functions as they suffer from very poor inference. Of course, this is just a matter of style; I am free not to use the function if text
provided it. However, I do worry about the ambiguous-type errors that beginners will encounter. In my experience, such errors have been a common sticking-point for beginners working in code written with OverloadedStrings
.
@bgamari this is no worse than fromIntegral
. I would argue that show :: (Show a, IsString b) => a -> b
is not that ambiguous, even in presence of OverloadedStrings
, unless its client (logging, console, etc.) is polymorphic, which usually is not the case.
@Kleidukos consider the case of Data.Aeson.object [ "key" := "value" ]
with OverloadedStrings
enabled.
@bgamari Okay thanks :)
I like the design of string conversions in relude
. I'm not quite sure about LazyStrict
, which looks a bit ad-hoc, but ConvertUtf8
, ToText
and ToLText
seem to be natural additions to text
.
I would like to hear from @chshersh and @vrom911 how do they feel about integrating their design into text
.
@Bodigrim LazyStrict looks actually quite cool, with the fundeps. I like this design a lot.
As someone who (unwillingly) has to use an alternative prelude with exactly such a magic conversion function, I am strongly not in favour of it. It often turns type resolution to custard, and hides disgusting amounts of allocation and copying. Driving this via some kind of lawless type class is also not my idea of a good design.
Once again, 100% agree that the ergonomics of conversion are poor. However, I don't think this is a good solution.
re LazyStrict
there is https://hackage.haskell.org/package/strict-0.4.0.1/docs/Data-Strict-Classes.html which e.g. lens
build upon since lens-5
. (I.e. it already exists, text
doesn't need to invent it)
There is also https://hackage.haskell.org/package/string-conversions-0.4.0.1/docs/Data-String-Conversions.html if you want two way conversion. Thing like show :: (Show a, IsString b) => a -> b
, why would it be in text
, what make text
the place for it, and not, I don't know, prettyprinter
with its Pretty
class!
I agree with @phadej on both counts. If we do have nice conversions, I think I would prefer them to be text-valued, as opposed to generally convertible.
As someone who (unwillingly) has to use an alternative prelude with exactly such a magic conversion function
@kozross Could you tell us a bit more about the prelude you're using and the conversion mechanism you're talking about?
@phadej Yes I agree with you, and this was not part of my original proposal. :)
@Kleidukos My gripe was with the extremely polymorphic show
(which as you clarified, isn't part of your original proposal). The one I have to deal with is from universum
, which is even worse.
I am strongly against basing conversions (or indeed, anything) on lawless type classes. That is what relude
appears to do based on what you linked. This is definitely not an idea I support.
Okay, right now the main blocker to adoption seems to be that the typeclasses have no laws. Can't we make them?
Not really - the whole point I'm making is that type classes are a bad vehicle for this kind of idea. The way the Relude stuff is written is basically impossible to give any laws to, as it amounts to 'I can turn this into a Text
... somehow'.
I think a much better example of what we should go for istshow
. Everyone keeps writing it in their codebases, and providing it would be useful, minimal, and involve no lawless anything.
I'm not entirely sure what input is required from us. We use the ToText
typeclasses with other conversion functions from relude
in various projects, and we find them highly convenient. We don't experience problems with interaction between OverloadedStrings
and relude
conversion functions. Moreover, this extension is enabled in our all projects by default in the .cabal
file.
I also don't see any problems with typeclasses without laws. We already have Show
, Read
(which even fails in runtime with error
), FromJSON/ToJSON
, FromField/ToField
from postgresql-simple
, Arbitrary
from QuickCheck
, and many more. Even if they don't have laws, typeclasses are convenient, and they help a lot of people be productive and solve their problems efficiently, even if they don't have laws.
I don't think that having or not having laws should be the blocker to introduce typeclasses. There're different trade-offs for using typeclasses vs using explicit functions. But since it's possible to have only a single instance per type, global instance coherence plays a more crucial role. When introducing a typeclass, you can ask yourself a question, "How many instances of this typeclass could be for a type?". And for me, there's only one way to convert String
to Text
using the ToText
typeclass, so typeclasses look suitable enough.
The way the Relude stuff is written is basically impossible to give any laws to, as it amounts to 'I can turn this into a Text... somehow'.
I wouldn't be so definitive in wording by saying that it's impossible. I see at least one law for the ToText
typeclass:
toText . toText ≡ toText
And this law holds for all current instances. But even without this law, I think that the ToText
typeclass if reasonable.
On the technical side, I have only one concern. If the ConvertUtf8
typeclass won't make it to text
, but something like ToText
would, then the wording of the custom error message for the ToText
instances can prevent us from reexporting ToText
from relude
instead of using our own typeclass. Our custom error messages mention decodeUtf8
from ConvertUtf8
but if text
would refer to decodeUtf8
from text
, then we can't just reexport typeclasses, at least because the implementation of decodeUtf8
is different in text
and relude
.
According to Bodigrim, ConvertUtf8
should make it into text
, so I'm not excessively worried about that.
And yes, we have many lawless typeclasses in Haskell. Now it may be because typeclasses are one polymorphic abstraction that is open to be extended, but it doesn't mean that it is untestable.
One law that is not algebraic but that we can promote is that the ByteString
/LByteString
/ShortByteString
conversions are a no-no, and that, quite fortunately, is taken care of by the EncodingError
system.
The way the Relude stuff is written is basically impossible to give any laws to, as it amounts to 'I can turn this into a Text... somehow'.
I wouldn't be so definitive in wording by saying that it's impossible. I see at least one law for the
ToText
typeclass:
- Idempotence:
toText . toText ≡ toText
And this law holds for all current instances. But even without this law, I think that the
ToText
typeclass if reasonable.
While I also don't think that typeclasses have to have laws, and I really like toText
from relude, this particular law will always be satisfied, as toText . toText ≡ toText @Text . toText ≡ id . toText ≡ toText
due to toText @Text ≡ id
.
I also find those three classes reasonable enough (ToText
, ToLText
, ConvertUtf8
).
I think there can be other ways than laws to easily grasp what is going on with a type class. In this case, there are many morally equivalent representations of "text" (String
, Text
, LText
, also Builder
and ShortText
), and the functions ToText
/ToLText
reify that equivalence for situations where you just want to normalize everything down to the same type. What matters is that these types are all "Unicode strings", so toText
does essentially nothing on some semantic level, and that's why it's reasonable. (One possible blind spot is IF any such conversion also requires some kind of encoding normalization, I hope that's not the case.)
Similarly, ConvertUtf8
is intended to be the way to convert in UTF-8 between any type of "bytestring" and any type of "text". Again, the idea is that all instances implement one well-defined semantics (that of UTF-8) in some intuitive sense.
I think what would be problematic is if UTF-8 conversion may or may not be performed depending on the inferred types, but that's not what @Kleidukos proposed here.
Another low-tech approach is to expose all those conversions as monomorphic functions, but decodeUtf8FromByteStringToText
doesn't necessarily make things easier to read.
The argument that 'we have lawless type classes already, what's a few more' is pretty bad, especially when the existing examples given are some of the most problematic type classes in the ecosystem (not only for their lawlessness). If you see litter, the solution is at least not to add more litter, not drive by with a dump truck. This is particularly pertinent here, given that text
is a boot library.
I am in favour of simple, monomorphic conversion functions. Dragging in type classes to solve this problem is the wrong approach, especially considering that UTF-8 converter type class will absolutely shred type inference without littering your code with explicit types or @
-nnotations. Yes, it means people have to write slightly longer names - so what? If that means clarity and type inference, I'm in favour. Making things short and automagic by way of lawless type classes (especially ones with multiple parameters and no fundeps) is a false economy - you're trading (arguably justified) inconvenience now for massively worse inconvenience later.
I'm not speaking theoretically either - I'm knee-deep in several codebases at my Real Job For Real Money which decided on this false economy. The number of hours of my life I've lost on utterly unproductive tasks as a result of this is so high that I cannot agree with such a design even in theory.
will absolutely shred type inference without littering your code with explicit types or
@
-nnotations. Yes, it means people have to write slightly longer names - so what?
Don't these expressions convey exactly the same information?
(decodeUtf8 :: ByteString -> Text)
decodeUtf8FromByteStringToText
You're claiming that the latter variant is better; what's the problem with the former? Is a "slightly longer name" more than an ad hoc form of type annotation?
My own argument above was explicitly not "look at those other lawless type classes", but "these classes we are discussing here may make sense even without equational laws", so I'd like to know where that specific argument goes wrong.
To be clear, I am still ambivalent about whether those type classes are a good idea. I am willing to respect your informed opinion on this, but your argument as written is "I've been bitten by lawless classes before therefore lawless classes are bad" which is a logical fallacy. Elaborating that point would make your case more convincing.
@Lysxia I think I was a bit unclear in my argument - sorry! Let me clarify.
The issue I have with the UTF-8 conversion type class proposed here isn't (entirely) to do with its lawlessness. Likewise, the thing I was bitten by wasn't (entirely) lawlessness. The far bigger problem for me has to do with the fact it's a multi-parameter type class without functional dependencies. This is a known bad idea in the community - even the author of MPTCs said as much! - and it's not nearly as simple to resolve as you suggest in your response. I've had cases where even completely known monomorphic uses of a method from such a type class still required me to @
-nnotate to say what instance I needed!
This is not to say I support lawless type classes (I don't). It's clarifying that my statements above were primarily with regard to MPTCs without fundeps. My work-related anecdotes regarding lawless type classes are still bad; the ones regarding MPTCs without fundeps are orders of magnitude worse. They're hard to diagnose, the output of the compiler is confusing and misleading, and they are generally terrible on every level. Even if you accept lawless type classes as a valid solution to anything, you should reject MPTCs without fundeps - they're not worth the cost later, no matter how convenient they appear.
Hopefully this clarifies my position.
Thanks for clarifying. So it's more about MPTC than lawlessness, but I'm still not sure how to understand this issue you mention
I've had cases where even completely known monomorphic uses of a method from such a type class still required me to
@
-nnotate to say what instance I needed!
decodeUtf8 :: ByteString -> Text -- (1)
decodeUtf8 @ByteString @Text -- (2)
decodeUtf8FromByteStringToText -- (3)
AFAICT (1) and (2) are completely equivalent modulo getting the order of arguments right in (2), and there is no situation where you'd need (2) where you couldn't also have used (1). Do the problems you mention involve PolyKinds or ambiguous kinds? They are the only relevant circumstances I can think of, and they do not apply here.
And why are type annotations ((1) or (2) or both) bad? You've said that but not explained it. To me, none of the options above look significantly better or worse than the others.
The only difference I see is that in (1) and (2) you can sometimes omit the type annotation, which can lead to problems when a typo leads to wrong code compiling with the wrong instance. However, I've argued above that this should not be a problem since all instances "morally" do the same thing. The kind of mistakes that class may allow is on the same level as mistyping ...FromLazyByteString...
instead of ...FromByteString...
, which can happen just as well.
Abusing MPTC or lawless classes may lead to disaster. I am arguing that this proposal has good properties that make it not misuse and it will not lead to disaster.
The problem doesn't lie in the 'accidentally use wrong instance' issue. It lies in that if you omit the type signatures or @
-nnotations, the error you can get from the compiler, even in a completely monomorphic setting, is not in any way, shape or form clear to someone who doesn't have the mental pattern of 'if I have a fundepless MPTC, this is what resolution failure looks like'. I encountered this exact issue about a week ago; I'm a fairly experienced Haskeller, and am used to dealing with GHC gore around impredicative instantiation and type-level programming, and it took me quite a while to figure out that this was the problem, not least of all because how nonsensical the error you get is relative the situation you find yourself in.
Furthermore, I reject the idea that the instances do "morally" the same thing - depending on which conversion you are doing, you might end up doing a lot of allocations versus possibly very few. This is not a trivial concern, as this can easily lead to performance issues. Having this magically hidden behind a type class is not particularly helpful. In fact, I had to chase down just such an issue in Real Job For Real Money. I won't mention how annoying it was or how long it took - you can probably guess by now.
I believe that in the case of text
, these are not trivial concerns. It is probably one of the first libraries a newcomer to Haskell is going to need to learn. Forcing people to learn about the specific heuristics for detecting MPTC-with-no-fundeps resolution errors, as well as what conversions copy and what don't when hidden behind a type class method, does not strike me as a good experience for someone learning to use this library. I'd go even further and say that even for people who aren't learning, these problems are annoying and unnecessary.
And why are type annotations ((1) or (2) or both) bad? You've said that but not explained it. To me, none of the options above look significantly better or worse than the others.
Well, to provide a type annotation to decodeUtf8
, it must be wrapped in parentheses, so (decodeUtf8 :: ByteString -> Text)
. 34 characters, 6 of which are non-alphanumeric and require holding the shift key - this isn't ergonomic to write, and it's a lot of syntactic noise and punctuation. If you need to add a type signature after initially writing an expression, you need to insert the beginning paren, jump to the end of the word, insert the signature, and close paren. Not fun!
decodeUtf8 @ByteString @Text
is better - only 28 characters, and only two special characters. No need for parentheses. Much less line noise. To insert it after the fact, you just insert the type parameters. Passing type parameters has a UX concern in that it's not clear what the source or target is. from @Text @ByteString
- what exactly is happening here? You have to memorize it. And documenting type variable ordering isn't something that Haddock does a great job with yet.
decodeUtf8FromByteStringToText
is 30 characters, but no spaces and no special characters. Easy to type, no syntactic noise, and the name says exactly what it does - from bytestring to text.
That's the static view. Let's take the dynamic view - what happens when things change? What about typos and other mistakes?
As kozross mentioned, you can omit the type signature. This can give bad error messages, or worse yet, silently change behavior when another part of the system changes.
Autocomplete works better with the single function name than with a type-application or type-signature based approach. I can type decodeUtf8<TAB>
and it'll bring up all the options for things starting with that, and I can easily pick the term I want from the menu.
If you forget to import a value that's supposed to be used with a type application, you get a bizarre error message (or, at least, you did on GHC < 8.10, that appears to be fixed).
Now, all these UX concerns are with a two-param class: class Convert a b where ...
. But I've had much better luck with single type variable classes: class ConvertToText a where convertToText :: a -> Text
. These have much better usability, and by picking the output type, you don't run into the 'polymorphic return' type errors. There are more type classes - one per target type - but this still reduces the API surface area by only requiring a name for each target type, vs a name for each target * each source.
However, this still runs into an issue! Morally, convertToText :: ByteString -> Text
is a lie - we ought to be specifying the encoding, and then it should really return an Either ConversionError Text
, not Text
. But convertToText :: LazyText -> Text
is totally fine. Which means we want the signature to vary based on what we're doing - we're conflating parsing with conversion. And that's a big problem.
I'm not in favor of the conversion type classes. They're OK for rapid prototyping and playing around, but I'd rather not ever see them in production code.
Thanks @kozross and @parsonsmatt for making your concerns more concrete.
we're conflating parsing with conversion. And that's a big problem.
I completely agree with this. Are you still opposed to such classes if they are kept separate (as in the actual proposal here)? If we forbid ourselves from using MPTC there is still the option of offering a class ConvertUtf8ToText
separate from ToText
.
@Lysxia Yes, I am still opposed. @parsonsmatt spelled out some concerns, which I share, along with the inherent lawlessness of such classes.
Thanks for spelling that out, @parsonsmatt. To piggyback on Matt's ideas, I've considered writing the "moral" convenience classes in the past to help expedite non-production code like test suites. By "moral", I mean MPTC with functional dependencies, a la
tshow :: Show a => a -> Text
tshow = Data.Text.pack . show
-- this can be more complicated and deal with conversion errors too. Just an example.
data Utf8Error = DecodeError Text | ConversionError Text deriving Show
data Encoding = Unicode | Utf8
class HasTextRepr (enc :: Encoding) bs t | t -> enc where
type TextRepErr enc :: *
toText :: bs -> Either (TextRepErr enc) t
fromText :: t -> bs
instance HasTextRepr 'Utf8 ByteString Text where
type TextRepErr 'Utf8 = Utf8Error
toText = first (DecodeError . tshow) . decodeUtf8'
fromText = encodeUtf8
П> toUtf8Text :: HasTextRepr 'Utf8 a Text => a -> Either Utf8Error Text; toUtf8Text = toText
П> toUtf8Text @ByteString "foobar"
Right "foobar"
But, modulo syntactic or ideological errors in the code above, this has a knock-on effect that makes it a poor choice. Namely, the fundep will propagate throughout the code, and it makes type checking/inference, though good, grind to a halt for any sufficiently complicated program. This is compounded by the fact that Text is used ubiquitously, in many disjointed places. In fact, for some very polymorphic code I was using at one point that did something like this, with enough proxying and an arguably tiny amount of code (~400 lines), I managed to get GHC to OOM in stackage: https://github.com/emilypi/base64/issues/31. This leaves alot to be desired with respect to GHC's compile story.
The morality of why MPTC should always have fundeps is illustrated in Mark Jones' paper Type class with Functional Dependencies. I don't think I would be comfortable providing a non-fundep'd MPTC in the presence of other extensions like UndecidableInstances
, IncoherentInstances
, or even FlexibleInstances
, which can be a vector for some very nasty bugs.
In summary, we have the following situation:
string-conversions
. However, I do like the encoding I have above, because it accommodates base encoding newtypes as well as bytestrings, ropes, and other manner of stringy types. class HasTextRepresentation a where
toText :: a -> Text
and get away with some good stuff. However, this does obscure the encoding, and force us to newtype alternative reps (not necessarily a bad thing!) such as Ascii
, Utf8
and Utf16
. But it also doesn't address good errors.
Show
and Bytestring
conversions and that's fine too. I'm definitely in favour of @emilypi 's answer, maybe also with a tread
/ treadMaybe
/ treadEither
helper.
But then it becomes "how many treads do we need?", so maybe we need something that looks a bit like (this won't work, but to get the gist):
class IsString s => Readable r where
sreadThrow :: s -> r
sreadEither :: s -> Either e r
sreadMaybe :: s -> Maybe r
class IsString s => Showable r where
sshow :: r -> s
and then have Text, ByteString, String etc instances for those classes, so you can just sshow
/ sreadThrow
your data type and have it in any IsString
you like.
Then whenever someone wants a new choice class, they can just have:
class (IsString s, Readable r) => MyReadable r where
sreadChoice :: s -> Choice a r
(whichever way round the s and the r goes depending on what they want).
While I agree that something of this flavour is useful, I'm not convinced that it should produce a
Text
. Afterall, almost all uses oftshow
in my experience are as part of a concatenation of other textual values. Moreover, an efficiently-implementedtshow
would almost certainly be constructing the its result as a concatenation of individual parts.
@bgamari I ended up following your advice, in the text-display
library, the typeclass methods produce a Builder
, and the display
function just converts the Builder
amalgamation to a Text
. :)
There is also a detailed explanation regarding ByteStrings & Function types: https://github.com/haskell-text/text-display/blob/main/DESIGN.md#-you-should-not-try-to-display-bytestrings
I'm opening this issue due to @Bodigrim's advice:
The people at Kowainik (cc @vrom911, @chshersh) have produced very good ergonomics when it comes to string types conversion. Functions like
toText
,toLText
,toString
, etc. Even risky conversion is taken care of, with errors (https://hackage.haskell.org/package/relude-0.7.0.0/docs/Relude-String-Conversion.html#t:ToText).I think that this mechanism has its place in
text
, since it depends onbase
(forString
), andbytestring
.