haskell / hackage-server

Hackage-Server: A Haskell Package Repository
http://hackage.haskell.org

Contextual (time-dependent) mapping of package names to packages; garbage collection #985

Open andreasabel opened 2 years ago

andreasabel commented 2 years ago

As it stands, if I upload a package X then the package name X will forever point to my package (and its revisions). If I abandon my package and let it die, the name X will still be taken forever. Mathematically, there is no problem with this, since at each point in time the set of available names is potentially infinite. However, people like "nice" names that make their package easily discoverable, and those nice names should not be taken up forever by dead packages.

In fact, we need not eternally bind a name to a fixed package. (In natural language, words are not bound to the same meaning for all eternity, but the meaning can change over time as human culture develops and objects disappear from everyday life and new objects appear.) Resolution of names to packages could be time-dependent. At each point in time, there is a context that maps package names to packages. If a package is uploaded at time t, then context t is used to resolve the names it mentions. Even if at a later time a name is assigned to a new package, the references of older packages stay intact.

This way we could open the avenue to future garbage collection of dead packages, making their names available for new packages (after some resting time). One could even refer to a package whose name has been reassigned by explicitly providing the context, e.g. X@2013 could point to the package that held the name X in 2013. In general, the absolute reference would be X@t where t is a point in time, given to the precision where no ambiguity arises in the resolution of X. But usually, the context can be implicit; there would be no need to change the user interaction in any way.
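A minimal sketch of this resolution rule, not a description of how hackage-server works. The `Binding` record, the `HISTORY` table, the year-level granularity, and the package identities are all hypothetical and only serve to illustrate how X@t could be looked up:

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    since: int    # year the name was (re)assigned; year granularity is an assumption
    package: str  # hypothetical opaque identity of the package holding the name


# Hypothetical history: for each name, its bindings sorted by `since`.
HISTORY = {
    "X": [Binding(2010, "X-original"), Binding(2020, "X-successor")],
}


def resolve(name, at):
    """Return the package that held `name` at time `at`, or None if unbound."""
    bindings = HISTORY.get(name, [])
    # Index of the last binding whose `since` is <= `at`.
    idx = bisect_right([b.since for b in bindings], at)
    return bindings[idx - 1].package if idx else None


def resolve_dependencies(deps, uploaded_at):
    """Resolve the names a package mentions in the context of its upload time."""
    return {dep: resolve(dep, uploaded_at) for dep in deps}
```

Under these assumptions, `resolve("X", 2013)` plays the role of X@2013, and `resolve_dependencies` shows why a package uploaded in 2013 keeps pointing at the old X even after the name is reassigned.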

I think we would not need to store extra information. All the information to reconstruct the context at a certain point in time is already on hackage (i.e. the upload times for the packages).

If it were decided to recycle a name, an entry like unlink X could be added to the hackage journal to indicate that the name can be taken again (after some resting time of a couple of years).
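As a rough sketch of how such a journal could be replayed: the tuple layout `(year, action, name)`, the action strings, and the two-year default for the resting time are assumptions made for illustration, not the actual hackage journal format.

```python
def name_available(journal, name, at, resting_years=2):
    """Return True if `name` may be claimed by a new package at time `at`.

    `journal` is a chronologically ordered list of hypothetical entries
    (year, action, name) with action either "upload" or "unlink".
    """
    taken = False
    released = None  # year of the most recent unlink, if the name is currently unlinked
    for year, action, entry_name in journal:
        if entry_name != name or year > at:
            continue  # ignore other names and entries from the future
        if action == "upload":
            taken, released = True, None
        elif action == "unlink":
            taken, released = False, year
    if taken:
        return False
    # Never used, or unlinked long enough ago to have passed the resting time.
    return released is None or at >= released + resting_years
```

For a journal such as `[(2010, "upload", "X"), (2019, "unlink", "X")]`, the name X stays unavailable until the resting time after the unlink entry has elapsed.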

Original write-up: https://github.com/haskell/hackage-server/issues/112#issuecomment-950147006

phadej commented 2 years ago

Is X@2013 conceptually different than X >= 2000 && <2013? I.e. if a package is completely redone, how is starting fresh at some (larger) major version different? Probably communicating that X@2013 and X@2021 are (completely?) different packages is easier than X-2013 and X-2021, but are there any semantic differences which are not already provided by the PVP?

> I think we would not need to store extra information. All the information to reconstruct the context at a certain point in time is already on hackage (i.e. the upload times for the packages).

This is not true. Packages don't always come from the index: consider local packages, direct tarball links, source-repository-package entries, etc. The package metadata should be self-contained in the .cabal file. The index is just an index, accumulating the information, but not really providing any extra information for the install plan solver.

andreasabel commented 2 years ago

> Is X@2013 conceptually different than X >= 2000 && <2013? I.e. if a package is completely redone, how is starting fresh at some (larger) major version different? Probably communicating that X@2013 and X@2021 are (completely?) different packages is easier than X-2013 and X-2021, but are there any semantic differences which are not already provided by the PVP?

Mathematically, I think you can do all this inside the PVP, provided every package declares upper bounds on all its dependencies. But if upper bounds are absent and the newest "versions" are chosen, you might get the wrong product.

From a more intuitive perspective, I think it would be weird for a new package to start with a certain version number only because once upon a time there existed a package with the same name that occupied a certain version range. Also, simply starting with a higher version number suggests a continuity of purpose. I think it would be weird if pegasus < 5 was a mail client and pegasus >= 5 a workflow management system and both would exist in the same context.
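The caveat about missing upper bounds can be illustrated with the pegasus example. This is a toy sketch only: the `INDEX` contents, the concrete version numbers, and the `pick` function are hypothetical and do not model cabal's actual solver, which is far more involved.

```python
# Hypothetical index in which the name "pegasus" was reused and the new
# package simply continued the old version numbering.
INDEX = {
    "pegasus": {
        (4, 2): "mail client",
        (5, 0): "workflow management system",
    },
}


def pick(name, lower, upper=None):
    """Pick the newest version v with lower <= v < upper (upper is optional)."""
    candidates = [
        v for v in INDEX[name]
        if v >= lower and (upper is None or v < upper)
    ]
    if not candidates:
        return None
    best = max(candidates)
    return best, INDEX[name][best]
```

With the bound `pegasus >= 4 && < 5`, `pick("pegasus", (4, 0), (5, 0))` selects the mail client; with only `pegasus >= 4`, `pick("pegasus", (4, 0))` selects the unrelated workflow management system, which is the "wrong product" case.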