haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal

Document the relationship between ComponentId, UnitId, MungedPackageId, PackageId etc. #5809

Open phadej opened 5 years ago

phadej commented 5 years ago

There are these types: ComponentId, UnitId, MungedPackageId, PackageId etc., which all seem to be the same, yet different. There should be a technical note in the code describing their relationship, with


Honestly, I have no idea what the subtle differences between these types are, and I won't read through GitHub issues to find out. A note in the code would be the right place to refer to.


As an example, ComponentId is documented as

For non-Backpack components, this corresponds one to one with the 'UnitId', which serves as the basis for install paths, linker symbols, etc.

And there are no remarks on how things differ in the Backpack case.


Related issue https://github.com/haskell/cabal/issues/4761

Edward's thesis contains a table, "Mapping from semantic objects in this thesis (Fig. 3.5 and Fig. 6.1) to their definitions in GHC". Something similar for Cabal types would be great to have too.

phadej commented 5 years ago

FWIW, if someone points out what documentation is missing for the parsec parser implementation, I'd be glad to answer those questions in a note too.

ezyang commented 5 years ago

https://ghc.haskell.org/trac/ghc/wiki/Commentary/Packages/Concepts are the docs you are looking for.

phadej commented 5 years ago

@ezyang that wiki page doesn't mention MungedPackageId

ezyang commented 5 years ago

I think you want this comment in MungedPackageName:

-- | Computes the package name for a library.  If this is the public
-- library, it will just be the original package name; otherwise,
-- it will be a munged package name recording the original package
-- name as well as the name of the internal library.
--
-- A lot of tooling in the Haskell ecosystem assumes that if something
-- is installed to the package database with the package name 'foo',
-- then it actually is an entry for the (only public) library in package
-- 'foo'.  With internal packages, this is not necessarily true:
-- a public library as well as arbitrarily many internal libraries may
-- come from the same package.  To prevent tools from getting confused
-- in this case, the package name of these internal libraries is munged
-- so that they do not conflict the public library proper.  A particular
-- case where this matters is ghc-pkg: if we don't munge the package
-- name, the inplace registration will OVERRIDE a different internal
-- library.
--
-- We munge into a reserved namespace, "z-", and encode both the
-- component name and the package name of an internal library using the
-- following format:
--
--      compat-pkg-name ::= "z-" package-name "-z-" library-name
--
-- where package-name and library-name have "-" ( "z" + ) "-"
-- segments encoded by adding an extra "z".
--
-- When we have the public library, the compat-pkg-name is just the
-- package-name, no surprises there!
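
As a rough illustration of the format described in the comment above, here is a minimal Haskell sketch of the encoding. It is not Cabal's actual implementation: the names splitDash, escapeSegments and mungedName are hypothetical, and names are plain Strings here rather than Cabal's own types. It assumes that "segments" means the dash-separated parts of a name, and that only all-"z" parts with a dash on both sides get the extra "z".

```haskell
import Data.List (intercalate)

-- Split a name on '-' (base has no splitOn, so a tiny helper is used).
splitDash :: String -> [String]
splitDash s = case break (== '-') s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitDash rest

-- Add one extra 'z' to every segment that consists only of 'z'
-- characters and has a dash on both sides, e.g. "a-z-b" becomes "a-zz-b".
escapeSegments :: String -> String
escapeSegments name = intercalate "-" (zipWith enc [0 :: Int ..] segs)
  where
    segs   = splitDash name
    lastIx = length segs - 1
    enc i seg
      | i > 0 && i < lastIx && not (null seg) && all (== 'z') seg = 'z' : seg
      | otherwise = seg

-- Hypothetical helper: Nothing stands for the public library,
-- Just lib for an internal library named lib.
mungedName :: String -> Maybe String -> String
mungedName pkg Nothing    = pkg
mungedName pkg (Just lib) = "z-" ++ escapeSegments pkg ++ "-z-" ++ escapeSegments lib
```

With these definitions, mungedName "haddock-library" (Just "attoparsec") evaluates to "z-haddock-library-z-attoparsec", and mungedName "haddock-library" Nothing is just "haddock-library".
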

phadej commented 5 years ago

But that's not true. With a package environment containing haddock-library-1.6.0:

λ> :show packages
active package flags:
  -package-id transformers-0.5.5.0
  -package-id containers-0.5.11.0
  -package-id array-0.5.2.0
  -package-id deepseq-1.4.3.0
  -package-id bytestring-0.10.8.2
  -package-id haddock-library-1.6.0-455b3b98c686fb127ac7d8c6fdd26ff2d06393b02f0e9642988bdc7e5a70fffc
  -package-id haddock-library-1.6.0-f24a0b3744bcfc55f2e3f40d91d735f5c9082e7cc5db16397cbfa2eb9edf9868
  -package-id integer-gmp-1.0.2.0
  -package-id ghc-prim-0.5.2.0
  -package-id rts
  -package-id base-4.11.1.0

Shouldn't the other haddock-library-1.6.0 be z-haddock-library? Something doesn't match.

Did the multiple public libraries patch change something?

Or does something use the wrong type?

ezyang commented 5 years ago

Yeah, that doesn't look too good. Maybe something is broken. Will have to look later.

phadej commented 5 years ago

... though ghc-pkg --package-db ... list shows

z-haddock-library-z-attoparsec-1.6.0

Does :show packages show unit IDs, which are yet another thing, different from MungedPackageId? But the flag is still named -package-id. I'm confused. I need a table. It doesn't need to be complete from the beginning; it can be updated as people (at the moment, me) ask questions.

phadej commented 5 years ago

Why didn't we teach ghc-pkg to differentiate between "public" and "internal" libs? That is, is MungedPackageId technical debt that exists because ghc-pkg doesn't know about components? Is it impossible to change, or is changing it the right thing to do but nobody has had time to do it?

ezyang commented 5 years ago

To your first comment: That would make sense. But I was under the impression that unit IDs included component names, and that's not what I see above. So it may also be a rendering bug on GHC's part.

The right thing to do is teach ghc-pkg to know about components, but since Cabal supports older versions of ghc-pkg, it still needs to do the MungedPackageId workaround.

phadej commented 5 years ago

@ezyang so, is MungedPackageId a way to name library components (public and internal ones) outside Cabal, i.e. the counterpart of ComponentId for library components? Are those semantically equivalent?

phadej commented 5 years ago

@ezyang Can we change the MungedPackageName representation to actually be data MungedPackageName = MungedPackageName PackageName (Maybe UnqualComponentName), and push the zdashcode stuff into the Pretty / Parsec instances? That way the structure of the type would be closer to its semantics, and the string representation would become an implementation detail of those instances.
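
To make the proposal concrete, here is a small self-contained sketch of what such a structured type with string conversion at the edges could look like. It is an illustration only, not the patch that was merged: the newtypes stand in for Cabal's real PackageName and UnqualComponentName, prettyMunged and parseMunged are hypothetical plain functions rather than Pretty / Parsec instances, and the escaping helper is a guess at the "z-" format from the MungedPackageName comment. The parser assumes that neither the first nor the last dash-separated segment of the package name is a lone "z".

```haskell
import Data.List (intercalate)

-- Stand-ins for Cabal's types, so the sketch needs only base.
newtype PackageName = PackageName String deriving (Eq, Show)
newtype UnqualComponentName = UnqualComponentName String deriving (Eq, Show)

-- The structured representation from the proposal:
-- Nothing is the public library, Just an internal one.
data MungedPackageName = MungedPackageName PackageName (Maybe UnqualComponentName)
  deriving (Eq, Show)

splitDash :: String -> [String]
splitDash s = case break (== '-') s of
  (a, [])       -> [a]
  (a, _ : rest) -> a : splitDash rest

-- Apply f to every all-'z' segment that has a neighbour on both sides.
onInnerZs :: (String -> String) -> [String] -> String
onInnerZs f segs = intercalate "-" (zipWith step [0 :: Int ..] segs)
  where
    lastIx = length segs - 1
    step i seg
      | i > 0 && i < lastIx && not (null seg) && all (== 'z') seg = f seg
      | otherwise = seg

-- Rendering: the string encoding lives here, not in the type.
prettyMunged :: MungedPackageName -> String
prettyMunged (MungedPackageName (PackageName p) Nothing) = p
prettyMunged (MungedPackageName (PackageName p) (Just (UnqualComponentName l))) =
  "z-" ++ esc p ++ "-z-" ++ esc l
  where esc = onInnerZs ('z' :) . splitDash

-- Parsing: anything that does not look munged is a public library name.
parseMunged :: String -> MungedPackageName
parseMunged s = case splitDash s of
  "z" : rest
    | (pkgSegs, "z" : libSegs) <- break (== "z") rest
    , not (null pkgSegs), not (null libSegs)
    -> MungedPackageName (PackageName (unesc pkgSegs))
                         (Just (UnqualComponentName (unesc libSegs)))
  _ -> MungedPackageName (PackageName s) Nothing
  where
    -- Drop the extra 'z' that rendering added (never from a lone "z").
    unesc = onInnerZs (\seg -> if length seg > 1 then drop 1 seg else seg)
```

For example, parseMunged "z-haddock-library-z-attoparsec" yields MungedPackageName (PackageName "haddock-library") (Just (UnqualComponentName "attoparsec")), and prettyMunged turns that value back into the same string. Code that only needs to know whether a library is public or internal can then pattern match on the Maybe instead of inspecting the string.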

ezyang commented 5 years ago

That sounds good to me!

phadej commented 5 years ago

Good. As I merged the patch yesterday :)

On 7 Jan 2019, at 16.41, Edward Z. Yang notifications@github.com wrote:

That sounds good to me!

alexbiehl commented 5 years ago

-- A lot of tooling in the Haskell ecosystem assumes that if something
-- is installed to the package database with the package name 'foo',
-- then it actually is an entry for the (only public) library in package
-- 'foo'.  With internal packages, this is not necessarily true:
-- a public library as well as arbitrarily many internal libraries may
-- come from the same package.

Aaaand I think there is more: In Backpack we talk about instantiations of packages. I am speaking about the case where there is a Backpack package foo instantiated with implementation bar and the same foo instantiated with implementation baz. Technically they are different packages under the same name.

My current hunch is that we need to extend our MungedPackageName encoding to include instantiations as well. I haven't gotten around to putting together a small reproducer yet.

Mikolaj commented 3 years ago

Related: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/units