python / cpython

The Python programming language
https://www.python.org

Make python slightly more relocatable #62509

Open 0571e608-369d-48c5-a91e-53101abd5cf2 opened 11 years ago

0571e608-369d-48c5-a91e-53101abd5cf2 commented 11 years ago
BPO 18309
Nosy @ronaldoussoren, @ncoghlan, @ericsnowcurrently, @shakfu
Files
  • python-relative-path-lookup.diff: Proposed change
  • python-relative-path-lookup-v2.diff
  • workaround.c: example of workaround to obtain python home in a relocatable plugin scenario.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['interpreter-core', '3.9']
    title = 'Make python slightly more relocatable'
    updated_at =
    user = 'https://bugs.python.org/mathias'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation =
    creator = 'mathias'
    dependencies = []
    files = ['30706', '30768', '50677']
    hgrepos = []
    issue_num = 18309
    keywords = ['patch']
    message_count = 17.0
    messages = ['191909', '191929', '191935', '191944', '191994', '192277', '353243', '353267', '353276', '353288', '353311', '353720', '415142', '415145', '415148', '415172', '415218']
    nosy_count = 5.0
    nosy_names = ['ronaldoussoren', 'ncoghlan', 'eric.snow', 'mathias', 'shakfu']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'resolved'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue18309'
    versions = ['Python 3.9']
    ```

    0571e608-369d-48c5-a91e-53101abd5cf2 commented 11 years ago

    Hi all,

    I want to move Python a bit closer to being relocatable. One problem to solve is where Python finds its modules. The usual lookup mechanism compiles in a prefix determined at configure time, which is used as a last-resort path if the paths are not set otherwise during application/interpreter startup. The best-known ways to change the module path at startup are probably the environment variables PYTHONPATH and PYTHONHOME. The Python interpreter itself already tries to interpret argv[0] to get to this point, but it would be nice if an interpreter embedded in an application could also find its module path without argv[0] being passed directly to the Python library. This should work even if the application is moved, or installed at a path different from the configure-time prefix.

    The proposal is to add one more attempt to find the Python modules, just before we resort to the compiled-in prefix, by looking at the path to python27.{so,dll}. The Python modules are searched for in the usual way, relative to this shared library file. If no Python modules are found relative to the library file, the compiled-in prefix is used as the very last resort, as usual.

    For architectures where we cannot determine the path of the shared library file, nothing changes.
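For illustration only, the lookup order described above could be sketched roughly as follows. This is not the attached patch: the function names `has_stdlib` and `choose_prefix` are hypothetical, and the sketch assumes a POSIX-style layout in which the library sits at `<prefix>/lib/libpythonX.Y.so` and the standard library is recognised by the landmark file `lib/pythonX.Y/os.py`.

```python
import os
import sys


def has_stdlib(prefix, version=None):
    """Return True if `prefix` contains the landmark lib/pythonX.Y/os.py."""
    version = version or "{}.{}".format(*sys.version_info[:2])
    landmark = os.path.join(prefix, "lib", "python" + version, "os.py")
    return os.path.isfile(landmark)


def choose_prefix(argv0_prefix, shared_lib_path, compiled_prefix, version=None):
    """Pick a prefix in the proposed order (hypothetical helper).

    argv0_prefix    -- prefix derived from argv[0], or None if unknown
    shared_lib_path -- path of the Python shared library, or None when the
                       platform cannot report it
    compiled_prefix -- configure-time prefix, always the last resort
    """
    candidates = []
    if argv0_prefix:
        candidates.append(argv0_prefix)
    if shared_lib_path:
        # Assumed layout: <prefix>/lib/libpythonX.Y.so, so the prefix is
        # two levels above the library file.
        lib_dir = os.path.dirname(os.path.abspath(shared_lib_path))
        candidates.append(os.path.dirname(lib_dir))
    for prefix in candidates:
        if has_stdlib(prefix, version):
            return prefix
    # Nothing found next to the executable or the library: behave as before.
    return compiled_prefix
```

When `shared_lib_path` is `None`, the candidate list is the same as today, which matches the statement that nothing changes on platforms that cannot report the library location.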

    I have attached a patch that tries to implement this. It should serve as a base for discussion. The change is tested on Linux and behaves as expected. The Windows code is copied over from another project where I have it actively running, but this variant has not even been compile-tested on Windows.

    thanks in advance

    Mathias

    ericsnowcurrently commented 11 years ago

    Hi Mathias. There is a current proposal (http://www.python.org/dev/peps/pep-0432/) for improving interpreter startup. So changes in this area are subject to extra caution. The changes you are talking about are at least indirectly impacted by the proposal, though I expect they are more directly tied to what happens in site.py.

    As to your proposal, aren't the embedding needs already addressed? See http://docs.python.org/2/c-api/intro.html#embedding-python. Is there some convention for keeping the site files adjacent to the SO/DLL that would warrant your proposed code?

    p.s. this would be a new feature so it only applies to Python 3.4.

    0571e608-369d-48c5-a91e-53101abd5cf2 commented 11 years ago

    Hi Eric,

    Thanks for looking at that ticket so fast!

    Reassigning this to 3.4 is great.

    In general, yes I can already do what I need more or less. This is the reason why I can be fine with about every python version.

    The reason I bring up this change is that I believe I am doing this in an inappropriate place, since I need to know some Python internals to do so, and I think others can probably also benefit from this idea/change. What I currently do is write an application that just uses python.so as an embedded interpreter, and this precompiled application might be relocated almost anywhere, wherever it is unpacked. We currently use the same sort of code to find out where the python.so file is, and we use Py_SetPythonHome to set the home to the directory where the .so file resides.

    Why are we doing this? It takes the idea that is already in the standard Python interpreter, which tries to be relocatable (meaning: pack the installation directory, unpack it somewhere else, and still be able to run) by looking at argv[0] and dereferencing symbolic links until it arrives at a real file. Now suppose you want to embed Python: then you no longer use the standard Python interpreter program. You may also use a different installation layout for basic things like bin and lib. So you end up with an application that is no longer able to find its bundled Python modules by looking at the application's path. But instead of starting from the path of the interpreter (which is not used in this case) or of the application itself, you could start from the Python library path and look for your Python installation relative to that. So as long as you keep the relative file layout of everything Python-related (and only what is Python-related: python.so and the modules) when you pack and unpack your precompiled application, this would just work.

    So, to put that in short: instead of dynamically finding the Python module path relative to .../bin/python, try to find it relative to .../lib/libpython34.so. The benefit would be that every application that embeds Python and needs to be relocatable will just work, in the way that today only the standard Python interpreter does.

    I will try to take in all of the PEP you pointed me to. As this is the first time I am seeing this longer document, I am not sure whether I missed something there, but within that framework my proposal would probably influence the initial setting of
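A minimal sketch of the argv[0]-style search mentioned above: resolve symbolic links until a real file is reached, then walk upward looking for the standard library. The helper name `prefix_from_program` is hypothetical, and the landmark `lib/pythonX.Y/os.py` is an assumption about the layout, not a statement of what the interpreter does in every version.

```python
import os


def prefix_from_program(program_path, version):
    """Resolve symlinks in `program_path`, then search parent directories.

    Returns the first ancestor directory containing lib/python<version>/os.py,
    or None if the filesystem root is reached without finding it.
    """
    real = os.path.realpath(program_path)  # dereference symbolic links
    directory = os.path.dirname(real)
    while True:
        landmark = os.path.join(directory, "lib", "python" + version, "os.py")
        if os.path.isfile(landmark):
            return directory
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            return None
        directory = parent
```

The same walk could start from the resolved path of the shared library instead of the executable, which is the substitution the comment argues for.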

    sys.prefix (?)

    if this is not already provided from the embedding application.

    And yes I am perfectly fine with a different or more general approach. The initially attached patch is something that tried to integrate into the current checked in code as I understood it.

    Greetings

    Mathias

    ncoghlan commented 11 years ago

    The way we figure out where to find the standard library is crazy, and creating the infrastructure to start making it less crazy is actually one of the prime motivations for PEP-432 :)

    ronaldoussoren commented 11 years ago

    Note that the OSX port already does this for framework builds. I don't know why we don't use the same code for shared library builds.

    Issue bpo-15498 contains a patch that switches this code from a deprecated nextstep-era API to dladdr.

    Two comments on the patch attached to this issue:

    1) The name "_PyImport_GetModulePath" is confusing, I'd use _PyImport_GetSharedLibPath to make clear that this is locating the shared library.

    2) The code calls dladdr on a static variable that's introduced just for that, it is also possible to call dladdr on an already existing symbol (for example the address of a function in the public API).
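To make the second point concrete, here is a hedged sketch that calls `dladdr` on an already exported symbol (`Py_IsInitialized`) from Python via `ctypes`, rather than on a dedicated static variable. It only illustrates the idea and is not the patch's C code. It assumes a POSIX platform whose C library exposes `dladdr` and the conventional four-field `Dl_info` layout; where that does not hold, the hypothetical helper `running_python_library_path` returns `None`.

```python
import ctypes
import os


class DlInfo(ctypes.Structure):
    # Conventional Dl_info layout used by dladdr(3) (assumed here).
    _fields_ = [
        ("dli_fname", ctypes.c_char_p),  # path of the object containing addr
        ("dli_fbase", ctypes.c_void_p),  # load address of that object
        ("dli_sname", ctypes.c_char_p),  # nearest symbol name
        ("dli_saddr", ctypes.c_void_p),  # address of that symbol
    ]


def running_python_library_path():
    """Return the file that provides Py_IsInitialized, or None if unknown."""
    if os.name != "posix":
        return None
    try:
        dladdr = ctypes.CDLL(None).dladdr
    except (OSError, AttributeError):
        return None  # dladdr is not reachable on this platform
    dladdr.argtypes = [ctypes.c_void_p, ctypes.POINTER(DlInfo)]
    dladdr.restype = ctypes.c_int
    # Use the address of an existing public API function as the probe.
    addr = ctypes.cast(ctypes.pythonapi.Py_IsInitialized, ctypes.c_void_p)
    info = DlInfo()
    if dladdr(addr, ctypes.byref(info)) == 0 or not info.dli_fname:
        return None
    return os.fsdecode(info.dli_fname)
```

On a statically linked interpreter the result is the executable itself rather than a libpython file, so a caller would still need the fallback to the existing lookup.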

    0571e608-369d-48c5-a91e-53101abd5cf2 commented 11 years ago

    Hi Ronald, Eric, Nick,

    Looking up the symbol name of the current function should work also. And I am free to rename these functions to whatever you like.

    Attached is version 2 of the patch with the suggested changes. The windows implementation is still untested.

    It would be interesting to know if this kind of lookup scheme can be included into PEP-432. Provided the spirit of this PEP, I can imagine to provide several functions to build up the pythonpath starting from something. So say, have a 'get python path from argv[0]', a 'get python path from shared python library' and a 'get python path from prefix' function (I may miss a variant). Also a 'build python path from python home root entry point' function would be useful and could be used by the above functions. An application embedding python can then call those functions that are sensible for its own use and installation scheme to set the module path in the PyConfig struct. The Py_ReadConfig function will internally use the above suggested functions to build up the default configuration if not already provided.
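As a rough illustration of the "build python path from python home root entry point" function suggested above, the sketch below derives a default module search path from a home directory. The name `build_module_search_path` is hypothetical and the layout (zip archive, `lib/pythonX.Y`, `lib-dynload`) is the conventional POSIX one, assumed here rather than taken from the patch or the PEP.

```python
import os


def build_module_search_path(home, major, minor):
    """Return default search-path entries below `home` (POSIX-style layout)."""
    lib = os.path.join(home, "lib")
    versioned = os.path.join(lib, f"python{major}.{minor}")
    return [
        os.path.join(lib, f"python{major}{minor}.zip"),  # zipped stdlib
        versioned,                                       # pure-Python stdlib
        os.path.join(versioned, "lib-dynload"),          # extension modules
    ]
```

Each of the "get python path from ..." variants would then only need to determine `home` in its own way and delegate to this one function.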

    Greetings

    Mathias

    vstinner commented 5 years ago

    The PEP-587 "Python Initialization Configuration" has been implemented in Python 3.8. It provides fine control on the "Path Configuration":

    0571e608-369d-48c5-a91e-53101abd5cf2 commented 5 years ago

    Hi,

    Nice to see some progress. Still, I checked today's https://github.com/python/cpython.git master and 3.8 branches (is that the current CPython development code?). Neither of them contains a call to dladdr besides the macOS code path mentioned in msg191994 by Ronald Oussoren, which has done this for a long time. From the lack of dladdr, I conclude that the core idea of my request here is not solved.

    Maybe I should rephrase that. The basic idea behind the request was to make Python's default way of setting up the paths required to find the Python modules be based on the place where the Python library resides instead of the Python executable. I do not mean the compile-time prefix but the actual location of the shared object in the file system. That would help to build applications that embed CPython and ship and unpack the whole application tree, including the Python modules, to a custom location not known at compile time, while still preserving the subtree structure containing the Python shared library and the Python modules. Note that this patch contained code to make that work from within Python without custom code in the embedding application. Doing that on the side of the embedding and calling application was always possible and still is, but that was not the point.

    best

    Mathias

    vstinner commented 5 years ago

    Hum, I am confused. I understood that this issue is about customizing sys.path when Python is embedded. But it seems like the feature request is more about the *default* implementation, not how to reimplement it outside Python (with custom code).

    0571e608-369d-48c5-a91e-53101abd5cf2 commented 5 years ago

    Yes.

    msg191944 from Nick Coghlan made me think that, with all the initialization rework that appeared to be underway, you wanted to incorporate the presented idea of basing the default on the location of libpython.so or pythonX.X.dll instead of the location of python/python.exe. And as mentioned by Ronald Oussoren, that would even align the methods used across the architectures to something common, with a fallback to the current way that takes the path of the interpreter executable. At least that is what the provided patch implemented in the old code structure.

    And this does not even change the default for the common case where the default is plainly useful. It just changes the way the default is determined, so that the default for the case of an embedded interpreter is more meaningful.

    As stated above, you can do that with application code when setting up the embedded interpreter, but it would be nice if it just worked out of the box and thereby helped applications that have not thought of that solution.

    best

    Mathias

    vstinner commented 5 years ago

    My plan is not to change the default implementation to calculate the path configuration, but make it easier to customize the path configuration.

    One idea is to rewrite Modules/getpath.c and PC/getpathp.c in Python and convert them to a frozen module. It is easier to modify Python code than C code. In the past, we already made such a change for importlib (which also has a frozen part, importlib._bootstrap and importlib._bootstrap_external).

    The PEP-587 implementation moves towards that with the "Multi-Phase Initialization Private Provisional API": https://docs.python.org/dev/c-api/init_config.html#multi-phase-initialization-private-provisional-api

    0571e608-369d-48c5-a91e-53101abd5cf2 commented 5 years ago

    Ok, so far. But what shall I do now? It would be nice if Python were a bit smarter in finding its increasingly important module files when embedded in an application. Is anybody out there willing to look at that contribution?

    best

    Mathias

    d067530b-fd1a-41da-834f-154dbc3e507c commented 2 years ago

    I have exactly the same need and use-case as Mathias in my project, which includes a requirement to embed python3 in a relocatable folder structure which serves as an application package (https://github.com/shakfu/py-js).

    This can be done using the Framework structure, thanks to Greg Neagle's solution referenced in (https://bugs.python.org/issue42514), but not for any python builds with --enable-shared.

    In any case, providing options (at the C level or otherwise) for embedded applications as described by Mathias would be ideal: "I can imagine to provide several functions to build up the pythonpath starting from something. So say, have a 'get python path from argv[0]', a 'get python path from shared python library' and a 'get python path from prefix' function (I may miss a variant)."

    vstinner commented 2 years ago

    In Python 3.11, Modules/getpath.c has been rewritten in Python: Modules/getpath.py. Maybe it's now simpler to hack this file. But you must rebuild Python to take changes into account.

    d067530b-fd1a-41da-834f-154dbc3e507c commented 2 years ago

    Thanks, Victor. I can imagine getpath.py will be more hackable (even if it is frozen).

    Still, it replicates the old algorithm:

    # Before any searches are done, the location of the executable is
    # determined. If Py_SetPath() was called, or if we are running on
    # Windows, the 'real_executable' path is used (if known). Otherwise,
    # we use the config-specified program name or default to argv[0].

    In my case (and, I think, for Mathias), the executable is a non-Python application, and what actually links dynamically to libpythonX.Y.dylib (built via --enable-shared) is a C-based plugin (which calls Py_Initialize()). These two are only aware of their relative locations via the @rpath/@loader_path mechanism.

    Currently, in this scenario, libpythonX.Y.dylib doesn't know where pythonhome is unless explicitly told via Py_SetPath(); otherwise it defaults to the hardcoded sys.prefix.

    If this is to be relocatable, then Py_SetPath() should be able to handle relative paths.
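One possible reading of "handle relative paths", sketched under the assumption that relative entries would be interpreted relative to the directory containing the Python shared library. The helper `resolve_search_path` is hypothetical; it only shows the path arithmetic an embedder does by hand today and is not existing behaviour of Py_SetPath().

```python
import os


def resolve_search_path(entries, library_file):
    """Anchor relative entries at the directory of `library_file`.

    Absolute entries are kept (normalised); relative ones are joined to the
    library's directory, so the result no longer depends on the process
    working directory.
    """
    base = os.path.dirname(os.path.abspath(library_file))
    resolved = []
    for entry in entries:
        if not os.path.isabs(entry):
            entry = os.path.join(base, entry)
        resolved.append(os.path.normpath(entry))
    return resolved
```

An embedder could join the resulting list with `os.pathsep` before handing it to the interpreter.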

    0571e608-369d-48c5-a91e-53101abd5cf2 commented 2 years ago

    Hey,

    Shakeeb Alireza is right, the original problem was an application that links and embeds against libpython.{so,dll,dylib} and should more easily find components like //lib/python.*/site.py and most probably now it needs to find getpath.py as well.

    While I am no longer working on that application where I wanted to have that feature, I still believe it would be worthwhile to find the *.py files in the file system relative to the location of the libpython*.{so,dll,dylib} file.

    Thanks for taking care.

    Mathias

    d067530b-fd1a-41da-834f-154dbc3e507c commented 2 years ago

    Thanks, Mathias. This is all about improving python's 'relocatability'.

    Just to expand on my prior point: the scenario we are talking about is where one embeds python in a host application's plugin system rather than in the host application itself.

    In this case, sys.executable is the host application and a relocatable plugin embeds a 'python client'. If a full python distribution is not bundled within this client[*], it needs to (1) link to libpythonX.Y.dylib and (2) get the location of the standard library.

    There are standard cross-platform methods for (1) to be achieved by way of symmetrical @rpath lookups on the client and libpythonX.Y.dylib sides. So this is resolvable even when Python is compiled with --enable-shared.

    However, even if (1) is achieved, the client cannot get the location of libpythonX.Y.dylib programmatically via the Python C API (even if it is properly linking to it dynamically), because it cannot rely on sys.executable. I think this is the crux of Mathias' argument.
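A small diagnostic sketch of what the runtime does expose. The function name `describe_runtime` is hypothetical; the `sysconfig` variables queried are build-time values (and may be `None` on some platforms), so they describe where the library was configured to live, not where a relocated copy was actually loaded from, which is the gap described above.

```python
import sys
import sysconfig


def describe_runtime():
    """Collect the location hints that Python itself can report."""
    return {
        # In the plugin scenario this is the host application, not Python.
        "executable": sys.executable,
        "prefix": sys.prefix,
        "exec_prefix": sys.exec_prefix,
        # Build-time configuration values; possibly None on some platforms.
        "enable_shared": sysconfig.get_config_var("Py_ENABLE_SHARED"),
        "libdir": sysconfig.get_config_var("LIBDIR"),
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
    }


if __name__ == "__main__":
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```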

    Of course there are workarounds, but they are (at least to me) all platform specific.

    The first and easiest is to just build using the Framework structure and don't ever use --enable-shared, provided you find Greg Neagle's solution (https://bugs.python.org/issue42514)

    Another workaround which is specific to my context (which I have attached), is to use Apple's CoreFoundation library to get the path to the plugin bundle and from there find our way to the python distribution in the containing folder structure (package).

    [*] It is possible to insert a full python distribution into a bundle (in the osx meaning), but then it becomes necessarily frozen or 'sealed' due to Apple's codesigning and notarization requirements, and it basically means that the user cannot extend it with further installations of python packages which contain c-extensions unless they jump through some additional codesigning and notarization hoops.

    arizvisa commented 2 years ago

    > Hey,
    >
    > Shakeeb Alireza is right, the original problem was an application that links and embeds against libpython{so,dll,dynlib} and should more easily find components like //lib/python.*/site.py and most probably now it needs to find getpath.py as well.
    >
    > While I am no longer working on that application where I wanted to have that feature, I still believe it would be worthwhile to find the .py files in the file system relative to the location of the libpython{so,dll,dynlib} file.
    >
    > Thanks for taking care.
    >
    > Mathias

    Hmm.. I did submit this issue with regards to Windows (https://bugs.python.org/issue35173) a super long time ago (as I used to maintain a windbg extension where I encountered problems with regards to relocatability), which seemed to be just a super old regression due to a misplaced condition (from back when Python used to have Py_ENABLE_SHARED, which I'm pretty sure has been refactored out by now). I was shocked because when I wrote the original patch, it seemed that Python actually used to do this, because all the logic to do it was already there for Windows. Hence my patch turned out to be only a 13-line fix. However, I didn't think anybody still cared about this after all of the work from the PEP that supposedly refactored this issue out of Python.

    To be clear (since I quit maintaining the windbg extension I mentioned, as it required x86 and x64 support), the latest windbg extension to support Python is Pykd, and the developer of pykd includes a completely separate extension specifically to deal with the issue of windbg.exe controlling the path to Python and its modules. IDA Pro (a disassembler used for reverse engineering) also includes a separate program called idapyswitch to deal with this situation.

    So although it's nice to have module support for hooking the interpreter to recommend to it on how to find its paths, it does mean you can't just bundle the shared object with its libraries and you absolutely need a whole other installer or distinctly separate loader for dealing with multiple instances of python that have separate packages, etc. Maybe these people that I mentioned are doing it wrong, though.

    (edited) I need to truly verify this, but you can see at https://hg.python.org/cpython/file/default/PC/getpathp.c#l696 (prior to the implementation of the PEP), the comment that reads /* Calculate zip archive path from DLL or exe path */. So it seems that the intent was there before doing the actual registry check, but that ship has long sailed due to the major refactoring that happened in this area.
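As a hedged illustration of what "Calculate zip archive path from DLL or exe path" suggests, the sketch below swaps the extension of the DLL (or executable) path for `.zip`. This is only my reading of that comment; the real logic in getpathp.c may differ, and the helper name `zip_path_next_to` and the example path are hypothetical.

```python
from pathlib import PureWindowsPath


def zip_path_next_to(binary_path):
    """Return the sibling .zip path for a DLL or executable path."""
    # PureWindowsPath handles backslash separators on any host platform.
    return PureWindowsPath(binary_path).with_suffix(".zip")


if __name__ == "__main__":
    print(zip_path_next_to(r"C:\tools\ext\python38.dll"))
```

Under this reading, a bundled standard-library archive placed next to the DLL would be found wherever the bundle is unpacked.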