indygreg / python-build-standalone

Produce redistributable builds of Python
BSD 3-Clause "New" or "Revised" License

Private pythonpath in a ._pth file #98

Open Wissperwind opened 2 years ago

Wissperwind commented 2 years ago

Hi, would it be possible to ship the builds with a private pythonpath defined in a file? The portable Windows Python from python.org contains a ._pth file in which the pythonpath is defined, so you can install your software with the bundled Python without interfering with any other Python installation that uses the system pythonpath. It would be cool to have this option on Linux too. Anyone who would rather use the system pythonpath can simply delete the ._pth file.
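For readers unfamiliar with the format, here is a minimal sketch of generating the contents of such a file. It assumes the layout used by the Windows embeddable package: one `sys.path` entry per line (relative entries are resolved against the directory holding the file), `#` lines are comments, and the `site` module is only imported if an `import site` line is present. The helper name `render_pth` and the example entries are hypothetical.

```python
def render_pth(entries, import_site=False):
    """Build the text of a ._pth file from a list of path entries.

    Hypothetical helper for illustration; it only formats text and does
    not decide where the file has to live or what it must be called.
    """
    lines = ["# Private search path; one sys.path entry per line."]
    lines.extend(entries)
    if import_site:
        # Without this line the site module is not imported automatically.
        lines.append("import site")
    return "\n".join(lines) + "\n"
```

On Windows the result would be written next to the interpreter, for example as `python._pth` beside `python.exe`, with entries such as `python311.zip` and `.`. Which entries make sense for the builds in this project is an open question.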

indygreg commented 2 years ago

I somehow did not know about this <exe>._pth trick on Windows!

Since our distributions are intended to be standalone and independent from the official Python ones, I think it makes sense for us to distribute this file to short-circuit sys.path resolution on Windows (which otherwise may pick up entries from the registry unrelated to our distributions).
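One rough way to see whether `sys.path` resolution has picked up anything unrelated to a distribution is to list the entries that fall outside its install directory. The sketch below assumes that "belongs to the distribution" can be approximated as "lives under a given prefix directory"; the function name `entries_outside` is hypothetical, and `sys.prefix` is only a stand-in for the real install root (inside a virtual environment, for instance, it would flag the base installation).

```python
import os
import sys


def entries_outside(prefix, paths):
    """Return the entries in paths that do not live under prefix."""
    root = os.path.normcase(os.path.abspath(prefix))
    outside = []
    for entry in paths:
        if not entry:
            # An empty entry means the current directory; report it as well.
            outside.append(entry)
            continue
        candidate = os.path.normcase(os.path.abspath(entry))
        try:
            common = os.path.commonpath([root, candidate])
        except ValueError:
            # Raised on Windows when the two paths are on different drives.
            common = None
        if common != root:
            outside.append(entry)
    return outside


if __name__ == "__main__":
    for entry in entries_outside(sys.prefix, sys.path):
        print("outside the distribution:", entry)
```

Running this with and without a ._pth file in place would show which entries the file suppresses.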

Unfortunately, this ._pth hack is only implemented on Windows in CPython. There is a pybuilddir.txt on UNIX. But it is subtly different and I'm not convinced it is appropriate for this project to set.

zooba commented 2 years ago

FYI, Python 3.11 should support ._pth files on all platforms, specifically for this kind of application. It could do with some testing though - the basic premise is solid, but we don't yet have any breadth of experience to e.g. suggest good paths to include.
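A packaging script that wants to rely on this could gate on platform and version. The sketch below encodes only what is said in this thread (Windows builds honor the file, and 3.11 and later are expected to honor it everywhere); the function name is hypothetical, and old Windows releases that predate the feature are not considered.

```python
import sys


def pth_expected_to_apply(platform=sys.platform, version=sys.version_info):
    """Rough guess at whether a ._pth file would be honored.

    Assumption taken from this thread, not from a documented guarantee:
    Windows builds honor it, and 3.11+ should honor it on all platforms.
    """
    return platform == "win32" or tuple(version[:2]) >= (3, 11)
```

Called with no arguments, it answers for the interpreter running the script.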

indygreg commented 2 years ago

Oooh - this feature would significantly improve the value proposition of the distributions built by this project! Thanks for letting me know!

My knee-jerk reaction is that I'm tempted to backport the feature if it is only a few hours of work. Looking at the 3.11 changelog, I don't see any mention of this feature. Is there any kind of discussion you could point me at? I didn't see anything on python-dev other than the PEP 648 discussion, which appears different (but possibly inspired this feature?).

zooba commented 2 years ago

It's never been a widely advertised feature, and the discussions about it were largely in person a few years back where the consensus was "yeah, it'd be nice, but we don't want to mess with getpath.c".

So the change that enabled it was when I rewrote getpath.c entirely into getpath.py (see lines 408 and 705). It was easier to make it cross-platform than to restrict it, so it should work everywhere now. I guess I need to find a good place to document it, but we don't really have a suitable section for it other than the What's New page... the best existing place looks like it's in deprecated docs, as we don't really document how the search path is calculated anywhere (because it'd probably be wrong 😆; it's not an easy algorithm to explain). If you can think of somewhere you would have expected to see it, I'm happy to take some pointers.

zooba commented 2 years ago

FYI, https://bugs.python.org/issue31582 is the relevant issue for adding it to the docs. Only five years old :)

indygreg commented 2 years ago

Oh, I'm glad I found out about this functionality sooner rather than later: the refactoring of path handling has significant implications for PyOxidizer!

Overall I'm very happy to see this refactor! Makes things much easier to understand.

I'll need to set aside some time to grok the new mechanism and see if it has implications for PyOxidizer. But as long as the C API still has full control over path resolution, I think we'll be fine. (I would want PyOxidizer to effectively skip most of getpath.py and have the pyembed Rust crate derive all of that from Rust land in most cases.)

Thanks for all the info, @zooba!

zooba commented 2 years ago

Py_SetPath is still the best way to skip everything, including ._pth handling IIRC. But you can also sub in your own getpath.py entirely if you prefer (only at compile time, though I'd definitely consider a C API to specify an alternate array of bytecode).