haskell / core-libraries-committee


New core library unicode-data-core #278

Open wismill opened 2 weeks ago

wismill commented 2 weeks ago

Currently the Unicode version in base is not upgradable because it depends on the GHC version via the ghc-internal package.

This raises two main issues:

  1. The Unicode version is tied to the compiler version.
  2. It makes it hard for libraries to ensure consistency: e.g. text has case mappings from Unicode 14.0 but also uses Data.Char, which may have a different Unicode version (base-4.20 uses Unicode 15.1).
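To illustrate the second point, here is a minimal sketch of a check a library could run to detect such a mismatch. It assumes `GHC.Unicode.unicodeVersion` is available (it has been exported by base since 4.15); `libraryTablesVersion` and `sameUnicodeVersion` are hypothetical names introduced only for this example and are not part of text or any other package.

```haskell
module Main (main) where

import Data.Version (Version, makeVersion, showVersion)
import GHC.Unicode (unicodeVersion)

-- | Hypothetical: the Unicode version from which a library generated its
-- own tables (14.0 here, mirroring the text example above).
libraryTablesVersion :: Version
libraryTablesVersion = makeVersion [14, 0, 0]

-- | True when both sets of tables come from the same Unicode release.
sameUnicodeVersion :: Version -> Version -> Bool
sameUnicodeVersion = (==)

main :: IO ()
main = do
  -- The version reported here is fixed by the GHC/base in use.
  putStrLn ("Data.Char (base) Unicode version: " ++ showVersion unicodeVersion)
  putStrLn ("Library tables Unicode version:   " ++ showVersion libraryTablesVersion)
  putStrLn $
    if sameUnicodeVersion unicodeVersion libraryTablesVersion
      then "consistent"
      else "mismatch: Data.Char and the library tables may disagree on some code points"
```

The result depends on the compiler used to build the program, which is exactly the coupling described above: the library cannot choose the Unicode version behind Data.Char.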

The unicode-data (Hackage) package family offers a way to choose an exact Unicode version and provides access to Unicode features unavailable in base. Some of its core features were merged into base (see #59).

I propose that we go further and decouple the Unicode version from the GHC version by introducing a new core library, unicode-data-core, that would back Data.Char. Its code would be the code currently in GHC.Internal.Unicode* (probably under another namespace), optionally extended with the complex case mappings needed by text and any other basic feature deemed useful for core libraries.

Such a package would require little maintenance effort: Unicode publishes versions on a yearly basis and the API is very stable.

However, it is not clear to me if ghc-internal could depend on unicode-data-core as well, as we do not want the compiler to fix the Unicode version.

CC unicode-data team: @adithyaov @Bodigrim @harendra-kumar

Bodigrim commented 2 weeks ago

> However, it is not clear to me if ghc-internal could depend on unicode-data-core as well, as we do not want the compiler to fix the Unicode version.

That sounds unlikely to me: ghc-internal is not meant to be reinstallable.

harendra-kumar commented 2 days ago

Having unicode-data-core as a separate package sounds like a good idea to me because unicode-data is too big and might grow with more stuff. But we also need to figure out how GHC can share it so that the maintenance effort is reduced.