Open Bodigrim opened 2 years ago
text-2.0
is probably fine, as it's the same ByteArray#
underneath. Older text
with UTF-16 is unnecessary conversion, but OTOH using newer Cabal
on older GHCs is somewhat special case (custom build-type, hackage-server
- there memory usage matters as all data is in-memory, ...).
Yet, for @Bodigrim and others, I'm no longer a maintainer of (nor contributor to) Cabal
, and there are plenty of people on the team, so I'd value if you don't ping me, especially as the first person.
Even text-1.2
with UTF-16 is better than conversions to String
here and there. But yes, I suppose cabal > 3.8 will be built with text-2.0
, so we can have both.
It's also easy to add conversion from Cabal
s ShortText
to Text
(in text-2.0
) without breaking any API, and then let users like hackage-server
and ghc-pkg
adapt first. cabal-install
doesn't really care about ShortText
fields (except to print info, which barely anyone uses).
https://github.com/haskell/hackage-server/issues/600#issuecomment-315625369 is the hackage-server
comment I remembered. While changing from UTF-8 to UTF-16 won't be a big change (making e.g. each PackageName
take twice as much space), I'd put the burden of showing that it's fine to the original proposer... Or migrating hackage-server
to use text-2.0
first. (Though still each Text
takes 16 bytes more space for each PackageName
etc, which matters for these plentiful and short text strings).
I experimented with this idea. Two conclusions:
We should rather keep ShortByteString
as is. It's indeed painful to implement something like a Unicode-aware uncons :: ShortByteString -> Maybe (Char, ShortByteString)
, but for Cabal purposes it's enough to have unconsAscii :: ShortByteString -> Maybe (Word8, ShortByteString)
and similar for other utilities. If we could set bytestring >= 0.11.3.1
, we'd have access to a wide range of ShortByteString
routines.
All package / module / unit names end up being converted to a type FilePath = String
. To avoid useless conversions, we must migrate to filepath-1.4.100.0
first.
Altogether, I think this ticket is on hold until GHC 9.6.
Thanks for the update!
If we could set bytestring >= 0.11.3.1 Altogether, I think this ticket is on hold until GHC 9.6.
No. Five years after that (emphasis mine)
Our GHC support window is five years for the Cabal library and three years for cabal-install: that is, the Cabal library must be buildable out-of-the-box with the dependencies that shipped with GHC for at least five years.
I'm afraid Cabal already violates this policy: https://github.com/haskell/cabal/blob/207d4ee08b929ae71ae2c746fe6da660bc129f05/Cabal/Cabal.cabal#L51
That is fixing a (nasty) bug. Depending on newer bytestring
can wait.
bytestring
is tangential. It would be nice to reuse bytestring-0.11.3.1
, but it's not too much to copy-paste necessary functions. What is really important is to switch to filepath-1.4.100.0
, but we need to accumulate more community-wide experience with a new interface first, which is not happening before GHC 9.6 is released.
I'm not saying (and I'm in no position to say) that we should pull the trigger immediately after GHC 9.6, it's simply that any further substantial progress requires it.
Relatedly: The FilePath
s in BuildInfo
etc data-structures respresenting .cabal
files are not PosixPaths nor WindowsPaths but some common denominator (more closely to PosixPath
though), and preparation to that can be started already today.
(In fact, that what I tried to with SymbolicPath
stuff).
I'm afraid Cabal already violates this policy:
Does it? This is under if impl(ghc >=8.2)
.
Unless you understand "buildable out-of-the-box with the dependencies" as in, literally, using all those deps? Isn't this about overriding without conflicts and crashes?
https://github.com/haskell/cabal/blob/0a2e68cbde9ad8dafd67c18511e7117956060f9b/Cabal-syntax/src/Distribution/Utils/ShortText.hs#L95-L96
Most of Cabal textual data (
PackageName
,ModuleName
, etc.) are newtypes overShortText
. This provides for compact storage, but almost any operation on them requires conversion toString
and back. Indeed, working with Unicode inShortByteString
directly is painful.Could we replace
ShortByteString
byText
inShortText
?This would simplify boilerplate in
Distribution.Utils.ShortText
and it would be trivial to expose more utilities to avoid ubiquitous conversions toString
. Potentially we can also parse directly intoText
(instead of parsingByteString
and converting later).Note that
Cabal
already depends ontext
viaparsec
, so this move does not change dependency graph.(Another option is to use
text-short
, but this one is slightly less conservative. If the proposal is implemented, it would be trivial to swaptext
andtext-short
, as they provide almost the same API)CC @phadej