pypa / packaging.python.org

Python Packaging User Guide
http://packaging.python.org
1.44k stars 928 forks source link

pkg_util leaves broken packages on uninstall #314

Open jaraco opened 7 years ago

jaraco commented 7 years ago

I've just encountered another issue related to pkg_util-based namespace packages on pre-PEP 420 packages. Uninstallation of one package in a namespace breaks the package for all other in that namespace. Consider:

$ virtualenv --python python2.7 env
Running virtualenv with interpreter /usr/local/bin/python2.7
New python executable in /Users/jaraco/env/bin/python2.7
Also creating executable in /Users/jaraco/env/bin/python
Installing setuptools, pip, wheel...done.
$ env/bin/pip install -q backports.unittest_mock backports.datetime_timestamp
$ env/bin/pip uninstall -y backports.datetime_timestamp                  
Uninstalling backports.datetime-timestamp-1.1:
  Successfully uninstalled backports.datetime-timestamp-1.1
$ env/bin/python -c "import backports.unittest_mock"                      
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named backports.unittest_mock

The issue occurs because the uninstallation step causes the removal of the now necessary __init__.py module for the backports package. The issue doesn't occur on late releases of Python 3.3 because the package gets implicitly converted to a PEP 420 namespace package (during the uninstallation step), which in some ways is worse, because the backports package will vacillate between PEP-420 and pkg_util-managed depending on whether the last operation was an install or uninstall.

In some sense, this issue is an issue with pip and the way it installs packages all into the same location, rather than keeping distribution packages somehow separated the way eggs sought to do, but it's also a symptom of using only pkg_util for namespace package management, an issue that was addressed by setuptools' installation of the -nspkg.pth files.

theacodes commented 7 years ago

That is unfortunate behavior. What should we do here to resolve this?

jaraco commented 7 years ago

I think the solution here is pretty simple:

  1. Continue to recommend pkg_util as the Python 2 compatible mechanism for namespace package modules.
  2. Remove the setuptools requirement that requires declared namespaces to use the pkg_resources technique.
  3. Recommend that packages once again declare their namespace packages in setup.py such that setuptools will omit the __init__.py and install a -nspkg.pth file, which aligns the installed packages more closely with the PEP 420 design, but also provides independence of the technique when pip installed.
jaraco commented 7 years ago

The downside to the above approach is that we continue to be entrenched in the issues surrounding package discovery (open issue with find_packages plus redundancy between defining packages and namespace_packages).

jaraco commented 7 years ago

Another possibility is that we could work on enhancing pip to avoid removing init.py files containing pkg_util declarations unless that's the only file remaining.

jaraco commented 7 years ago

Now that I think about it a little bit more, this first suggestion is probably subject to all of the same issues that dissuaded us from using the PKG_resources technique.

theacodes commented 7 years ago

Yeah, I wonder what @pfmoore thinks about the pip suggestion.

On Sat, May 20, 2017, 9:50 AM Jason R. Coombs notifications@github.com wrote:

Now that I think about it a little bit more, this first suggestion is probably subject to all of the same issues that dissuaded us from using the PKG_resources technique.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/pypa/python-packaging-user-guide/issues/314#issuecomment-302884947, or mute the thread https://github.com/notifications/unsubscribe-auth/AAPUc2Dy6FsSVlonj5BC36PlugORtdOBks5r7xnWgaJpZM4NhWBI .

dstufft commented 7 years ago

Fixing pip seems like a good idea.

Sent from my iPhone

On May 20, 2017, at 12:53 PM, Jon Wayne Parrott notifications@github.com wrote:

Yeah, I wonder what @pfmoore thinks about the pip suggestion.

On Sat, May 20, 2017, 9:50 AM Jason R. Coombs notifications@github.com wrote:

Now that I think about it a little bit more, this first suggestion is probably subject to all of the same issues that dissuaded us from using the PKG_resources technique.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/pypa/python-packaging-user-guide/issues/314#issuecomment-302884947, or mute the thread https://github.com/notifications/unsubscribe-auth/AAPUc2Dy6FsSVlonj5BC36PlugORtdOBks5r7xnWgaJpZM4NhWBI .

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

theacodes commented 7 years ago

See https://github.com/pypa/pip/pull/4504

ncoghlan commented 7 years ago

The distro level solution to this is to recommend that namespaces always have a clear owner (synthesizing one at the distro level if it doesn't already exist upstream), and then the contributors to the namespace all depend on the owning package.

That way contributing packages can be freely installed and uninstalled, while if you uninstall the owning package, that will show up as a missing dependency for all of the contributing packages.

Fixing that advice in the guide would be a matter of changing the following paragraph to recommend the "owning package"/"contributing package that depends on the owning package" split:

Every distribution that uses the namespace package must include an identical init.py. If any distribution does not, it will cause the namespace logic to fail and the other sub-packages will not be importable. Any additional code in init.py will be inaccessible.

And this one to mention that instead of generating an __init__.py file, setuptools will generate <name>-nspkg.pth entries that generate appropriate sys.modules entries at Python startup:

Every distribution that uses the namespace package must include an identical init.py. If any distribution does not, it will cause the namespace logic to fail and the other sub-packages will not be importable. Any additional code in init.py will be inaccessible.

We'd then add a section at the end about "Namespaces without an owning package" that says:

theacodes commented 7 years ago

@ncoghlan thanks for that - I definitely agree it's a good idea to recommend an "owning" package for each namespace that documents which method to use and the policies around it. I'll add that to my work items.

However, I think pkgutil style namespace packages effectively require having an owning package, as otherwise the __init__.py file will be removed the first time any contributing package is uninstalled isn't quite right. The test I wrote for pip over at https://github.com/pypa/pip/pull/4504 seems to indicate that even with an owning package, uninstalling any contributing package will break it.

ncoghlan commented 7 years ago

@jonparrott When there's an owning package that all the contributing packages depend on, then the contributing packages should not need to include the __init__.py file for the namespace. If they do, then you get the undesirable "first uninstall breaks the namespace" behaviour.

That's how the distro level approach works - the owning package is the only one that includes the __init__.py file, and the package build process omits it from all of the other contributing packages.

dstufft commented 7 years ago

Due to hysterical raisins, I think pip + uninstalling namespace packages has always been somewhat painful. If you install a project with setuptools (WITHOUT pip), the only record of what files have been installed is top_level.txt, which list the top level packages, so IIRC we just blow away the entire top level if it's listed in top_level.txt.

We probably need to refactor that, and possibly either deprecate/remove uninstalling setuptools/easy_install installed packages OR get it so setuptools drops a record file by default.

theacodes commented 7 years ago

@ncoghlan ah, I see now. Hmmm. I think this might make things complicated for pip install -e as the user would need the owning package installed for that.

After reading @dstufft's reply, I'm unsure what to do here. I don't know if the guide should change it's recommendations at this point for namespaces, considering "fixing" the "bug" in pip should make recommending owning packages irrelevant. However, suggesting owning packages might be useful in some cases but difficult in others.

A compromise could be to add a new section about owning packages at the end of the guide, so that we can document the benefits and drawbacks and allow users to decide. This sort of thing is totally acceptable for this kind of doc.

ncoghlan commented 7 years ago

I'm fine with starting with an appended section, while leaving the main recommendations alone for now.

I'm not sure about the pip -e concern, though - editable installs always need their dependencies installed, and in this case, the owning package becomes just another dependency. The one case I'm aware of that does become more complicated is when the contributing package otherwise has no dependencies.

theacodes commented 7 years ago

I ran into some trouble trying to get owning packages to work for everything. Installing a pkg_resources owning package broke my Python 3 environment in a spectacular fashion:

Failed to import the site module
Traceback (most recent call last):
  File "/Users/jonwayne/workspace/sample-namespace-packages/env/lib/python3.6/site.py", line 703, in <module>
    main()
  File "/Users/jonwayne/workspace/sample-namespace-packages/env/lib/python3.6/site.py", line 683, in main
    paths_in_sys = addsitepackages(paths_in_sys)
  File "/Users/jonwayne/workspace/sample-namespace-packages/env/lib/python3.6/site.py", line 282, in addsitepackages
    addsitedir(sitedir, known_paths)
  File "/Users/jonwayne/workspace/sample-namespace-packages/env/lib/python3.6/site.py", line 204, in addsitedir
    addpackage(sitedir, name, known_paths)
  File "/Users/jonwayne/workspace/sample-namespace-packages/env/lib/python3.6/site.py", line 173, in addpackage
    exec(line)
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 557, in module_from_spec
AttributeError: 'NoneType' object has no attribute 'loader'

I'm going to put this lower on my priority for now. It seems to me that the issue with the current recommendation (uninstalling) is less severe than the potential drawbacks of owning packages (broken environment). Happy to keep this open until either pip is fixed or something else moves here.