pypa / packaging-problems

An issue tracker for the problems in packaging

Package names conflicts : why not optional separate namespaces (e.g.: channels on anaconda) ? #114

Open Didou09 opened 6 years ago

Didou09 commented 6 years ago

As the number of available pip packages increases, I feel that there might be a growing risk of name conflicts between different packages. Would it be possible to include optional separate namespaces on PyPI? That could at least partially solve the issue (or tame it). I am thinking about examples such as the channels on Anaconda, i.e. an optional argument that could be passed to pip install. Such a solution might also encourage grouping packages by topic and thus make searches more intuitive.

Has that ever been considered? If so, which arguments led to turning it down? Are there any plans for allowing more flexibility regarding package names?

I know there is a PEP on package naming, but I feel that a channel-like solution would be particularly well suited to cases like:

ncoghlan commented 6 years ago

We don't offer default namespaces on PyPI for the same reason that Python itself doesn't require them on package imports: giving too much weight to the risk of future naming conflicts leads to folks exposing irrelevant information (like specific company names) as part of their public software installation API.

Good component names will ideally reflect what the software is for (since that's what end users care about), rather than who wrote it (which is mainly only interesting to folks checking software provenance for security and supply chain sustainability management purposes).

That said, we actually do offer a number of ways to manipulate the distribution package namespace:

  1. Namespace packages
  2. Packages where the distribution package name differs from the import package name
  3. Running separate index servers (i.e. using distinct URLs to specify distinct distribution channels)

Namespace packages allow a shared Python import package to be broken up into multiple independently published distribution packages. See https://pypi.org/search/?q=backports for some examples using the "backports" namespace. While this could be more explicitly supported in the Warehouse UI (e.g. by assuming that the "prefix.suffix" naming scheme always indicates the use of a namespace package), it is otherwise already supported by existing package management tools.
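To make the namespace package mechanism concrete, here is a minimal sketch. It assumes implicit namespace packages (directories without an `__init__.py`), and it simulates two independently installed distributions with two temporary directories. The names `demo_ns_pkg`, `alpha`, `beta`, `make_portion`, and `load_portions` are invented for illustration and do not correspond to any real project.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(root: Path, namespace: str, module: str) -> None:
    """Create <root>/<namespace>/<module>.py, with no __init__.py in the namespace directory."""
    pkg_dir = root / namespace
    pkg_dir.mkdir(parents=True)
    (pkg_dir / f"{module}.py").write_text(f"NAME = {module!r}\n")


def load_portions():
    """Import two modules that share one namespace but live in separate directories."""
    with tempfile.TemporaryDirectory() as tmp:
        # Each directory stands in for a separately published distribution (hypothetical).
        dist_a = Path(tmp) / "dist_a"
        dist_b = Path(tmp) / "dist_b"
        make_portion(dist_a, "demo_ns_pkg", "alpha")
        make_portion(dist_b, "demo_ns_pkg", "beta")

        paths = [str(dist_a), str(dist_b)]
        sys.path[:0] = paths
        try:
            importlib.invalidate_caches()
            alpha = importlib.import_module("demo_ns_pkg.alpha")
            beta = importlib.import_module("demo_ns_pkg.beta")
            # The namespace's __path__ lists every directory that contributes to it.
            portions = len(list(sys.modules["demo_ns_pkg"].__path__))
            return alpha.NAME, beta.NAME, portions
        finally:
            # Undo the global changes so the sketch leaves the interpreter clean.
            for path in paths:
                sys.path.remove(path)
            for name in ("demo_ns_pkg.alpha", "demo_ns_pkg.beta", "demo_ns_pkg"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(load_portions())
```

Both modules import under the same top-level name even though neither directory knows about the other, which is the property that lets the "backports" projects be published independently.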

Beyond namespace packages, we also allow an M:N mapping between distribution package names (the name you pass to a command like "pip install") and import package names (the name you use in a Python "import" statement). As an example, this is what allowed the "pillow" fork to supplant the original "PIL" project - while you "pip install pillow", that change is entirely transparent to your code, which still does "from PIL import Image".

This also allows folks to add a prefix to the distribution package to avoid a name conflict on PyPI without having to change their import package name. For example, if there were a Linux specific project called theproject that wanted to start publishing their Python bindings to PyPI, but that conflicted with a previously published PyPI project, they could use a distribution name like linux-theproject, without changing their import name at all.
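The M:N relationship can be sketched as a small lookup table that is inverted to answer "which distributions provide this import name?". The table below is hand-written for illustration: the "pillow" and "linux-theproject" entries follow the examples above, while "demo-tools" and its import names are hypothetical.

```python
from collections import defaultdict

# Illustrative data only: distribution package name -> import packages it provides.
DIST_TO_IMPORTS = {
    "pillow": ["PIL"],
    "theproject": ["theproject"],           # the previously published PyPI project
    "linux-theproject": ["theproject"],     # prefixed distribution, same import name
    "demo-tools": ["demo_tools", "demo_extras"],  # hypothetical: one distribution, two imports
}


def imports_to_dists(dist_to_imports):
    """Invert the mapping: import package name -> sorted list of distributions providing it."""
    result = defaultdict(list)
    for dist, import_names in dist_to_imports.items():
        for import_name in import_names:
            result[import_name].append(dist)
    return {name: sorted(dists) for name, dists in result.items()}


if __name__ == "__main__":
    for import_name, dists in sorted(imports_to_dists(DIST_TO_IMPORTS).items()):
        print(f"import {import_name}  <-  pip install {' | '.join(dists)}")
```

For an installed environment, Python 3.10 and later expose a comparable import-name-to-distribution mapping through `importlib.metadata.packages_distributions()`.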

Here, the missing UX piece is that we don't make the information on which import packages a distribution package provides readily available through the web UI, which means you can't search by import package name either. That's an inherent technical limitation for projects that only upload source archives, but a fixable problem for projects that upload prebuilt wheel archives (it just requires someone with sufficient time and the inclination to fix it).
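As a rough sketch of why this is tractable for wheels: a wheel is a zip archive, so the import packages it provides can be approximated by listing its top-level entries and skipping the `.dist-info` and `.data` directories. The helper names and the in-memory demo wheel below are assumptions made for illustration, not Warehouse code, and the heuristic ignores single-file extension modules.

```python
import io
import zipfile


def top_level_imports(wheel_file):
    """Approximate the import names a wheel provides from its archive listing."""
    names = set()
    with zipfile.ZipFile(wheel_file) as wheel:
        for member in wheel.namelist():
            first = member.split("/", 1)[0]
            # Metadata and data directories are not importable packages.
            if first.endswith((".dist-info", ".data")):
                continue
            if "/" in member:
                names.add(first)        # a package directory such as PIL/
            elif first.endswith(".py"):
                names.add(first[:-3])   # a top-level single-file module
    return names


def build_demo_wheel():
    """Build a tiny wheel-shaped archive in memory (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as wheel:
        wheel.writestr("PIL/__init__.py", "")
        wheel.writestr("PIL/Image.py", "")
        wheel.writestr("demo_pillow-1.0.dist-info/METADATA", "Name: demo-pillow\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(top_level_imports(build_demo_wheel()))
```

A source archive offers no equivalent shortcut, because the set of import packages is only known after the build step has run.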

Finally, https://www.python.org/dev/peps/pep-0503/ defines the API that installation clients use to install components, and this then ties into client support for choosing which index server(s) they actually want to use: https://pip.pypa.io/en/stable/reference/pip_install/#cmdoption-index-url
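For illustration, the sketch below shows how a client might derive a project's page URL under a given index, using the name normalization rule from PEP 503. The function names are invented here, and the second index URL is a placeholder for a separately run index server rather than a real service.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 describes: collapse runs of -, _ and . into -, then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def project_url(index_url: str, name: str) -> str:
    """Build the per-project page URL under a PEP 503 style index root."""
    return f"{index_url.rstrip('/')}/{normalize(name)}/"


if __name__ == "__main__":
    # The default PyPI simple index.
    print(project_url("https://pypi.org/simple", "Pillow"))
    # Placeholder URL standing in for a separate index server (a distinct "channel").
    print(project_url("https://index.example.invalid/simple/", "Backports.Functools_LRU_Cache"))
```

Pointing pip at a different root with `--index-url` changes only the first argument; the project name stays the same, which is what makes a separate index behave like a separate distribution channel.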

Based on that, Warehouse is likely to eventually gain support for at least a separate staging index (see https://github.com/pypa/warehouse/issues/720). The same approach could also eventually be used to offer user and org specific indices, but it's not clear yet whether there'd be enough benefit from that to make it worthwhile (see https://github.com/pypa/warehouse/issues/2286#issuecomment-348047920 for one potentially compelling use case).