Make pyvenv style virtual environments easier to configure when embedding Python

Make pyvenv style virtual environments easier to configure when embedding Python #66409

Open fe491b48-23c2-4033-aaa2-1a6613895466 opened 10 years ago

fe491b48-23c2-4033-aaa2-1a6613895466 commented 10 years ago

BPO	22213
Nosy	@ncoghlan, @pitrou, @vstinner, @methane, @ericsnowcurrently, @zooba, @ndjensen, @LeslieGerman, @M-Kerr, @abrunner73
Dependencies	bpo-22257: PEP 432 (PEP 587): Redesign the interpreter startup sequence

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature', '3.8']
title = 'Make pyvenv style virtual environments easier to configure when embedding Python'
updated_at =
user = 'https://bugs.python.org/grahamd'
```

bugs.python.org fields:
```python
activity =
actor = 'ndjensen'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'grahamd'
dependencies = ['22257']
files = []
hgrepos = []
issue_num = 22213
keywords = []
message_count = 31.0
messages = ['225434', '225436', '225437', '225739', '225742', '225771', '225774', '225890', '334926', '334948', '335015', '335468', '335470', '335479', '335484', '335648', '335650', '335688', '335692', '335749', '336793', '343636', '352905', '354856', '354857', '354858', '361600', '361869', '362260', '366570', '384496']
nosy_count = 13.0
nosy_names = ['ncoghlan', 'pitrou', 'vstinner', 'pyscripter', 'grahamd', 'methane', 'eric.snow', 'steve.dower', 'Henning.von.Bargen', 'ndjensen', 'Leslie', 'M.Kerr', 'abrunner73']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue22213'
versions = ['Python 3.8']
```

fe491b48-23c2-4033-aaa2-1a6613895466 commented 10 years ago

In an embedded system, the 'python' executable is itself not run and the Python interpreter is initialised in process explicitly using Py_Initialize(). In order to find the location of the Python installation, an elaborate sequence of checks is run, as implemented in calculate_path() of Modules/getpath.c.

The primary mechanism is usually to search for a 'python' executable on PATH and use that as a starting point. From that it then back tracks up the file system from the bin directory to arrive at what would be the perceived equivalent of PYTHONHOME. The lib/pythonX.Y directory under that for the matching version X.Y of Python being initialised would then be used.
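
The back-tracking step described above can be sketched in Python. This is only an illustration of the idea, not the logic of getpath.c itself: the helper name `find_prefix_from_executable` is hypothetical, and the landmark file is assumed to be `os.py` inside `lib/pythonX.Y`.

```python
from pathlib import Path
from typing import Optional


def find_prefix_from_executable(executable: str, version: str,
                                landmark: str = "os.py") -> Optional[Path]:
    """Walk up from the executable's directory until lib/pythonX.Y/<landmark> exists.

    Hypothetical helper: it mimics the idea of the search, nothing more.
    """
    # Symlinks are resolved first, so a symlinked 'python' leads to its real location.
    start = Path(executable).resolve().parent
    for candidate in (start, *start.parents):
        if (candidate / "lib" / f"python{version}" / landmark).is_file():
            return candidate
    return None
```

With a 'python' found in /usr/bin, a search like this yields /usr whenever /usr/lib/pythonX.Y/os.py exists, regardless of which shared library the embedding process actually loaded.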

Problems can often occur with the way this search is done though.

For example, suppose someone is not using the system Python installation but has installed a different version of Python under /usr/local. At run time, the correct Python shared library would be loaded from /usr/local/lib, but because the 'python' executable is found in /usr/bin, it uses /usr as sys.prefix instead of /usr/local.

This can cause two distinct problems.

The first is that there is no Python installation at all under /usr corresponding to the Python version which was embedded, with the result that it cannot import the 'site' module and therefore fails.

The second is that there is a Python installation of the same major/minor but potentially a different patch revision, or compiled with different binary API flags or different Unicode character width. The Python interpreter in this case may well be able to start up, but the mismatch in the Python modules or extension modules and the core Python library that was actually linked can cause odd errors or crashes to occur.

Anyway, that is the background.

For an embedded system the way this problem was overcome was for it to use Py_SetPythonHome() to forcibly override what should be used for PYTHONHOME so that the correct installation was found and used at runtime.

Now this would work quite happily even for Python virtual environments constructed using 'virtualenv' allowing the embedded system to be run in that separate virtual environment distinct from the main Python installation it was created from.

Although this works for Python virtual environments created using 'virtualenv', it doesn't work if the virtual environment was created using pyvenv.

One can easily illustrate the problem without even using an embedded system.

$ which python3.4
/Library/Frameworks/Python.framework/Versions/3.4/bin/python3.4

$ pyvenv-3.4 py34-pyvenv

$ py34-pyvenv/bin/python
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 00:54:21)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.prefix
'/private/tmp/py34-pyvenv'
>>> sys.path
['', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin', '/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload', '/private/tmp/py34-pyvenv/lib/python3.4/site-packages']

$ PYTHONHOME=/tmp/py34-pyvenv python3.4
Fatal Python error: Py_Initialize: unable to load the file system codec
ImportError: No module named 'encodings'
Abort trap: 6

The basic problem is that in a pyvenv virtual environment, there is no duplication of stuff in lib/pythonX.Y, with the only thing in there being the site-packages directory.

When you start up the 'python' executable direct from the pyvenv virtual environment, the startup sequence checks know this and consult the pyvenv.cfg to extract the:

home = /Library/Frameworks/Python.framework/Versions/3.4/bin

setting and from that derive where the actual run time files are.
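
As a rough illustration of that lookup, the `home` key can be read from pyvenv.cfg with a few lines of Python. This is a sketch that assumes the simple `key = value` layout shown above; `read_venv_home` is a hypothetical name, not a CPython function.

```python
from pathlib import Path
from typing import Optional


def read_venv_home(venv_dir: str) -> Optional[str]:
    """Return the 'home' value from <venv_dir>/pyvenv.cfg, or None if unavailable."""
    cfg = Path(venv_dir) / "pyvenv.cfg"
    try:
        text = cfg.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        # Only lines of the form "home = <path>" are of interest here.
        if sep and key.strip() == "home":
            return value.strip()
    return None
```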

When PYTHONHOME or Py_SetPythonHome() is used, then the getpath.c checks blindly believe that is the authoritative value:

Step 2. See if the $PYTHONHOME environment variable points to the
installed location of the Python libraries. If $PYTHONHOME is set, then
it points to prefix and exec_prefix. $PYTHONHOME can be a single
directory, which is used for both, or the prefix and exec_prefix
directories separated by a colon.

    /* If PYTHONHOME is set, we believe it unconditionally */
    if (home) {
        wchar_t *delim;
        wcsncpy(prefix, home, MAXPATHLEN);
        prefix[MAXPATHLEN] = L'\0';
        delim = wcschr(prefix, DELIM);
        if (delim)
            *delim = L'\0';
        joinpath(prefix, lib_python);
        joinpath(prefix, LANDMARK);
        return 1;
    }

Because of this, the problem above occurs, as the proper runtime directories for files aren't included in sys.path. The result is that the 'encodings' module cannot even be found.

What I believe should occur is that PYTHONHOME should not be believed unconditionally. Instead there should be a check to see if that directory contains a pyvenv.cfg file and if there is one, realise it is a pyvenv style virtual environment and do the same sort of adjustments which would be made based on looking at what that pyvenv.cfg file contains.
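
One possible reading of that proposal, sketched in Python rather than in getpath.c. The function name `resolve_python_home` is hypothetical, and the sketch assumes a POSIX layout in which `home` names the `bin` directory directly below the base installation.

```python
import os
from typing import Optional, Tuple


def resolve_python_home(pythonhome: str) -> Tuple[str, Optional[str]]:
    """Return (prefix holding the standard library, venv prefix or None).

    Hypothetical sketch: if pythonhome is a pyvenv-style environment, the
    standard library is looked up via 'home' in pyvenv.cfg instead of
    trusting pythonhome unconditionally.
    """
    cfg_path = os.path.join(pythonhome, "pyvenv.cfg")
    if not os.path.isfile(cfg_path):
        # Ordinary installation: believe PYTHONHOME as before.
        return pythonhome, None
    with open(cfg_path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "home":
                # Assumption: 'home' is <base prefix>/bin.
                base_bin = os.path.normpath(value.strip())
                return os.path.dirname(base_bin), pythonhome
    return pythonhome, None
```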

For the record this issue is affecting Apache/mod_wsgi and right now the only workaround I have is to tell people that in addition to setting the configuration setting corresponding to PYTHONHOME, to use configuration settings to have the same effect as doing:

PYTHONPATH=/Library/Frameworks/Python.framework/Versions/3.4/lib/python34.zip:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/plat-darwin:/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/lib-dynload

so that the correct runtime files are found.
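
The value above is simply the virtual environment's sys.path with the empty entry and the environment's own site-packages removed. A small sketch of that derivation, where `stdlib_pythonpath` is a hypothetical helper and the paths are the ones from the session earlier in this report:

```python
import os


def stdlib_pythonpath(sys_path, venv_prefix, sep=os.pathsep):
    """Join the sys.path entries that do not live inside the virtual environment."""
    venv_root = venv_prefix.rstrip("/") + "/"
    entries = [p for p in sys_path if p and not p.startswith(venv_root)]
    return sep.join(entries)


base = "/Library/Frameworks/Python.framework/Versions/3.4/lib"
venv_sys_path = [
    "",
    f"{base}/python34.zip",
    f"{base}/python3.4",
    f"{base}/python3.4/plat-darwin",
    f"{base}/python3.4/lib-dynload",
    "/private/tmp/py34-pyvenv/lib/python3.4/site-packages",
]
print(stdlib_pythonpath(venv_sys_path, "/private/tmp/py34-pyvenv", sep=":"))
```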

I am still trying to work out a more permanent workaround I can add to the mod_wsgi code itself, since I can't rely on a fix for existing Python versions with pyvenv support.

The only other option is to tell people not to use pyvenv and to use virtualenv instead.

Right now I can offer no actual patch, as that getpath.c code is scary enough that I'm not even sure at this point where the check should be incorporated or how.

The only thing I can surmise is that, because the current check for pyvenv.cfg happens before the search for the prefix and only looks relative to the executable's directory, the file is never consulted when PYTHONHOME is set.

    /* Search for an environment configuration file, first in the
       executable's directory and then in the parent directory.
       If found, open it for use when searching for prefixes.
    */

    {
        wchar_t tmpbuffer[MAXPATHLEN+1];
        wchar_t *env_cfg = L"pyvenv.cfg";
        FILE * env_file = NULL;

        wcscpy(tmpbuffer, argv0_path);

        joinpath(tmpbuffer, env_cfg);
        env_file = _Py_wfopen(tmpbuffer, L"r");
        if (env_file == NULL) {
            errno = 0;
            reduce(tmpbuffer);
            reduce(tmpbuffer);
            joinpath(tmpbuffer, env_cfg);
            env_file = _Py_wfopen(tmpbuffer, L"r");
            if (env_file == NULL) {
                errno = 0;
            }
        }
        if (env_file != NULL) {
            /* Look for a 'home' variable and set argv0_path to it, if found */
            if (find_env_config_value(env_file, L"home", tmpbuffer)) {
                wcscpy(argv0_path, tmpbuffer);
            }
            fclose(env_file);
            env_file = NULL;
        }
    }

    pfound = search_for_prefix(argv0_path, home, _prefix, lib_python);

ncoghlan commented 10 years ago

Yeah, PEP-432 (my proposal to redesign the startup sequence) could just as well be subtitled "getpath.c hurts my brain" :P

One tricky part here is going to be figuring out how to test this - perhaps adding a new test option to _testembed and then running it both inside and outside a venv.

ncoghlan commented 10 years ago

Graham pointed out that setting PYTHONHOME ends up triggering the same control flow through getpath.c as calling Py_SetPythonHome, so this can be tested just with pyvenv and a suitably configured environment.

It may still be a little tricky though, since we normally run the pyvenv tests in isolated mode to avoid spurious failures due to bad environment settings...
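
A test along those lines could start from a probe like the following. This is a minimal sketch, not the actual test suite code: `probe_base_prefix` is a hypothetical helper that runs an interpreter in a child process, optionally with PYTHONHOME set, and reports what it sees.

```python
import os
import subprocess
import sys


def probe_base_prefix(python_exe, pythonhome=None):
    """Run python_exe and return (exit status, printed base prefixes).

    Hypothetical helper: PYTHONHOME is only added to the child's environment
    when a value is given; otherwise the environment is passed through as-is.
    """
    env = dict(os.environ)
    if pythonhome is not None:
        env["PYTHONHOME"] = pythonhome
    code = "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
    proc = subprocess.run([python_exe, "-c", code], env=env,
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


if __name__ == "__main__":
    print(probe_base_prefix(sys.executable))
```

Pointing `python_exe` at the interpreter inside a freshly created venv and passing different `pythonhome` values would reproduce the manual experiments, with a non-zero exit status standing in for the "Fatal Python error" case.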

ncoghlan commented 10 years ago

Some more experiments, comparing an installed vs uninstalled Python. One failure mode is that setting PYTHONHOME just plain breaks running from a source checkout (setting PYTHONHOME to the checkout directory also fails):

$ ./python -m venv --without-pip /tmp/issue22213-py35

$ /tmp/issue22213-py35/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /usr/local

$ PYTHONHOME=/usr/local /tmp/issue22213-py35/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted (core dumped)

Trying after running "make altinstall" (which I had previously done for 3.4) is a bit more enlightening:

$ python3.4 -m venv --without-pip /tmp/issue22213-py34

$ /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /usr/local

$ PYTHONHOME=/usr/local /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /usr/local

$ PYTHONHOME=/tmp/issue22213-py34 /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted (core dumped)

$ PYTHONHOME=/tmp/issue22213-py34:/usr/local /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
Fatal Python error: Py_Initialize: Unable to get the locale encoding
ImportError: No module named 'encodings'
Aborted (core dumped)

$ PYTHONHOME=/usr/local:/tmp/issue22213-py34/bin /tmp/issue22213-py34/bin/python -c "import sys; print(sys.base_prefix, sys.base_exec_prefix)"
/usr/local /tmp/issue22213-py34/bin

I think what this is actually showing is that there's a fundamental conflict between mod_wsgi's expectation of being able to set PYTHONHOME to point to the virtual environment, and the way PEP-405 virtual environments actually work.

With PEP-405, all the operations in getpath.c expect to execute while pointing to the *base* environment: where the standard library lives. It is then up to site.py to adjust the prefix location later, as can be demonstrated by the fact that pyvenv.cfg isn't processed if processing the site module is disabled:

$ /tmp/issue22213-py34/bin/python -c "import sys; print(sys.prefix, sys.exec_prefix)"
/tmp/issue22213-py34 /tmp/issue22213-py34
$ /tmp/issue22213-py34/bin/python -S -c "import sys; print(sys.prefix, sys.exec_prefix)"
/usr/local /usr/local
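
The same distinction can be checked from inside an interpreter. A minimal sketch using only attributes of the `sys` module; `describe_prefixes` is a hypothetical helper name, and what `-S` reports may differ on Python versions newer than the ones discussed here.

```python
import sys


def describe_prefixes():
    """Report the active prefix, the base prefix, and whether they differ."""
    return {
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        # In a PEP-405 virtual environment the two prefixes are different.
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    print(describe_prefixes())
```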

At this point in time, there isn't an easy way for an embedding application to say "here's the standard library, here's the virtual environment with user packages" - it's necessary to just override the path calculations entirely.

Allowing that kind of more granular configuration is one of the design goals of PEP-432, so adding that as a dependency here.

fe491b48-23c2-4033-aaa2-1a6613895466 commented 10 years ago

It is actually very easy for me to work around and I released a new mod_wsgi version today which works.

When I get a Python home option, instead of calling Py_SetPythonHome() with it, I append '/bin/python' to it and call Py_SetProgramName() instead.

ncoghlan commented 10 years ago

Excellent! If I recall correctly, that works because we resolve the symlink when looking for the standard library, but not when looking for the venv configuration file.

I also suspect this is all thoroughly broken on Windows - there are so many configuration operations and platform specific considerations that need to be accounted for in getpath.c these days that it has become close to incomprehensible :(

One of my main goals with PEP-432 is actually to make it possible to rewrite the path configuration code in a more maintainable way - my unofficial subtitle for that PEP is "getpath.c must die!" :)

fe491b48-23c2-4033-aaa2-1a6613895466 commented 10 years ago

I only make the change to Py_SetProgramName() on UNIX and not Windows. This is because back in mod_wsgi 1.0 I actually used to use Py_SetProgramName(), but it didn't seem to work in a sane way on Windows, so I changed to Py_SetPythonHome(), which worked on both Windows and UNIX. The latest versions of mod_wsgi haven't been updated yet to even build on Windows, so I'm not caring about Windows right now.
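
The decision described in the last two comments can be summarised as a small table in code. This is only a Python sketch of the described behaviour, not mod_wsgi's actual C code; `embedding_call_for` is a hypothetical name, and the Windows branch is an assumption based on the history above, since current mod_wsgi is said not to build there.

```python
import posixpath


def embedding_call_for(python_home: str, is_windows: bool):
    """Return (C API function name, argument) for a given 'Python home' option."""
    if is_windows:
        # Assumption from the thread: Py_SetProgramName() did not behave sanely
        # on Windows, so the home directory is passed through unchanged.
        return ("Py_SetPythonHome", python_home)
    # On UNIX, point at the interpreter inside the environment so that the
    # normal startup sequence discovers pyvenv.cfg next to it.
    return ("Py_SetProgramName", posixpath.join(python_home, "bin", "python"))


print(embedding_call_for("/tmp/py34-pyvenv", is_windows=False))
```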

pitrou commented 10 years ago

That workaround would definitely deserve being wrapped in a higher-level API invokable by embedding applications, IMHO.

ncoghlan commented 5 years ago

(Added Victor, Eric, and Steve to the nosy list here, as I'd actually forgotten about this until issue bpo-35706 reminded me)

Core of the problem: the embedding APIs don't currently offer a Windows-compatible way of setting up "use this base Python and this venv site-packages", and the way of getting it to work on other platforms is pretty obscure.

zooba commented 5 years ago

Victor may be thinking about it from time to time (or perhaps it's time to make the rest of the configuration change plans concrete so we can all help out?), but I'd like to see this as either:

- a helper function to fill out the core config structure from a pyvenv.cfg file (rather than hiding it deeper as it currently is), or better yet,
- remove the dependency on all non-frozen imports at initialization and let embedders define Python code to do the initialization

In the latter case, the main python.exe also gets to define its behavior. So for the most part, we should be able to remove getpath[p].c and move it into the site module, then make that our Python initialization step.

This would also mean that if you are embedding Python but not allowing imports (e.g. as only a calculation engine), you don't have to do the dance of _denying_ all lookups, you simply don't initialize them.

But as far as I know, we don't have a concrete vision for "how will consumers embed Python in their apps" that can translate into work - we're still all individually pulling in slightly different directions. Sorting that out is most important - having someone willing to do the customer engagement work to define an actual set of requirements and desirables would be fantastic.

ncoghlan commented 5 years ago

Yeah, I mainly cc'ed Victor and Eric since making this easier ties into one of the original design goals for PEP-432 (even though I haven't managed to persuade either of them to become co-authors of that PEP yet).

vstinner commented 5 years ago

PEP-432 will allow fine-grained control over the parameters used to initialize Python. Sadly, I failed to agree with Nick Coghlan and Eric Snow on the API. The current implementation (_PyCoreConfig and _PyMainInterpreterConfig) has some flaws (it doesn't clearly separate the early initialization from the Unicode-ready state, the interpreter contains both main and core config whereas some options are duplicated in both configs, etc.).

See also bpo-35706.

zooba commented 5 years ago

I just closed 35706 as a duplicate of this one (the titles are basically identical, which feels like a good hint ;) )

It seems that the disagreement about the design is fundamentally a disagreement between a "quick, painful but complete fix" and "slow, careful improvements with a transition period". Both are valid approaches, and since Victor is putting actual effort in right now he gets to "win", but I do think we can afford to move faster.

It seems the main people who will suffer from the pain here are embedders (who are already suffering pain) and the core developers (who explicitly signed up for pain!). But without knowing the end goal, we can't accelerate.

Currently PEP-432 is the best description we have, and it looks like Victor has been heading in that direction too (deliberately? I don't know :) ). But it seems like a good time to review it, replace the "here's the current state of things" with "here's an imaginary ideal state of things" and fill the rest with "here are the steps to get there without breaking the world".

By necessity, it touches a lot of people's contributions to Python, but it also has the potential to seriously improve even more people's ability to _use_ Python (for example, I know an app that you all would recognize the name of who is working on embedding Python right now and would _love_ certain parts of this side of things to be improved).

Nick - has the steering council been thinking about ways to promote collaborative development of ideas like this? I'm thinking an Etherpad style environment for the brainstorm period (in lieu of an in-person whiteboard session) that's easy for us all to add our concerns to, that can then be turned into something more formal.

Nick, Victor, Eric, (others?) - are you interested in having a virtual whiteboard session to brainstorm how the "perfect" initialization looks? And probably a follow-up to brainstorm how to get there without breaking the world? I don't think we're going to get to be in the same room anytime before the language summit, and it would be awesome to have something concrete to discuss there.

vstinner commented 5 years ago

> It seems that the disagreement about the design is fundamentally a disagreement between a "quick, painful but complete fix" and "slow, careful improvements with a transition period". Both are valid approaches, and since Victor is putting actual effort in right now he gets to "win", but I do think we can afford to move faster.

Technically, the API already exists and is exposed as a private API:

"_PyCoreConfig" structure
"_PyInitError _Py_InitializeFromConfig(const _PyCoreConfig *config)" function
"void _Py_FatalInitError(_PyInitError err)" function (should be called on failure)

I'm not really sure of the benefit compared to the current initialization API using Py_xxx global configuration variables (ex: Py_IgnoreEnvironmentFlag) and Py_Initialize().

_PyCoreConfig basically exposes *all* input parameters used to initialize Python, much more than just the global configuration variables and the few functions that can be called before Py_Initialize(): https://docs.python.org/dev/c-api/init.html

> Currently PEP-432 is the best description we have, and it looks like Victor has been heading in that direction too (deliberately? I don't know :) ).

Well, it's a strange story. At the beginning, I had a very simple use case... it took me more or less one year to implement it :-) My use case was to add... a new -X utf8 command line option:

- parsing the command line requires decoding bytes using an encoding
- the encoding depends on the locale, environment variables and options on the command line
- environment variables depend on the command line (-E option)

If the utf8 mode is enabled (PEP-540), the encoding must be set to UTF-8, all configuration must be removed and the whole configuration (env vars, cmdline, etc.) must be read again from scratch :-)
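
The circular dependency can be illustrated with a small two-pass sketch. It is not how CPython implements it: `decode_command_line` is a hypothetical function, it only recognises the separated `-X utf8` spelling and the `PYTHONUTF8=1` environment variable from PEP-540, and it ignores every other option.

```python
def decode_command_line(raw_argv, environ, locale_encoding):
    """Decode byte arguments, starting over if UTF-8 mode turns out to be enabled."""
    encoding = locale_encoding
    while True:
        argv = [arg.decode(encoding, "surrogateescape") for arg in raw_argv]
        # -E means environment variables must be ignored.
        ignore_env = "-E" in argv
        utf8_mode = any(a == "-X" and b == "utf8" for a, b in zip(argv, argv[1:]))
        if not ignore_env and environ.get("PYTHONUTF8") == "1":
            utf8_mode = True
        if utf8_mode and encoding != "utf-8":
            # The first pass used the wrong encoding: throw the result away
            # and read everything again from scratch.
            encoding = "utf-8"
            continue
        return argv, encoding
```

For example, calling it with a Latin-1 locale encoding and `-X utf8` on the command line decodes the arguments twice, and only the second, UTF-8 pass is kept.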

To be able to do that, I had to collect *every single* thing which has an impact on the Python initialization: all things that I moved into _PyCoreConfig.

... but I didn't want to break the backward compatibility, so I had to keep support for Py_xxx global configuration variables... and also the few initialization functions like Py_SetPath() or Py_SetStandardStreamEncoding().

Later it becomes very dark, my goal became very unclear and I looked at the PEP-432 :-)

Well, I wanted to expose _PyCoreConfig somehow, so I looked at the PEP-432 to see how it can be exposed.

> By necessity, it touches a lot of people's contributions to Python, but it also has the potential to seriously improve even more people's ability to _use_ Python (for example, I know an app that you all would recognize the name of who is working on embedding Python right now and would _love_ certain parts of this side of things to be improved).

_PyCoreConfig "API" makes some things way simpler. Maybe it was already possible to do them previously but it was really hard, or maybe it was just not possible.

If a _PyCoreConfig field is set, it has priority over any other way to initialize the field. _PyCoreConfig has the highest priority.

For example, _PyCoreConfig allows completely bypassing the code which computes sys.path (and related variables) by directly setting the "path configuration":

- nmodule_search_path, module_search_paths: list of sys.path paths
- executable: sys.executable
- prefix: sys.prefix
- base_prefix: sys.base_prefix
- exec_prefix: sys.exec_prefix
- base_exec_prefix: sys.base_exec_prefix
- (Windows only) dll_path: Windows DLL path

The code which initializes these fields is really complex. Without _PyCoreConfig, it's hard to make sure that these fields are properly initialized as an embedder would like.
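
For comparison, the values those fields end up controlling can be inspected from a running interpreter. The snippet below is a sketch that only reads public `sys` attributes; the dictionary keys mirror the field names listed above for readability, `path_configuration` is a hypothetical helper, and `dll_path` is omitted because no `sys` counterpart is assumed here.

```python
import sys


def path_configuration():
    """Snapshot of the interpreter's computed path configuration."""
    return {
        "module_search_paths": list(sys.path),
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": sys.base_prefix,
        "exec_prefix": sys.exec_prefix,
        "base_exec_prefix": sys.base_exec_prefix,
    }


if __name__ == "__main__":
    for name, value in path_configuration().items():
        print(f"{name} = {value!r}")
```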

> Nick, Victor, Eric, (others?) - are you interested in having a virtual whiteboard session to brainstorm how the "perfect" initialization looks? And probably a follow-up to brainstorm how to get there without breaking the world? I don't think we're going to get to be in the same room anytime before the language summit, and it would be awesome to have something concrete to discuss there.

Sorry, I'm not sure of the API / structures, but when I discussed with Eric Snow at the latest sprint, we identified different steps in the Python initialization:

- only use bytes (no encoding), no access to the filesystem (not needed at this point)
- encoding defined, can use Unicode
- use the filesystem
- configuration converted as Python objects
- Python is fully initialized

I once experimented with reorganizing _PyCoreConfig and _PyMainInterpreterConfig to avoid redundancy: adding a _PyPreConfig which contains only the fields which are needed before _PyMainInterpreterConfig. With that change, _PyMainInterpreterConfig (and _PyPreConfig) *contained* _PyCoreConfig.

But the change became very large and I wasn't sure that it was a good idea, so I abandoned it.

https://github.com/python/cpython/pull/10575
https://bugs.python.org/issue35266
I have a more advanced version in this branch of my fork: https://github.com/vstinner/cpython/commits/pre_config_next

Ok, something else. _PyCoreConfig (and _PyMainInterpreterConfig) contain memory allocated on the heap. Problem: Python initialization changes the memory allocator. Code using _PyCoreConfig requires some "tricks" to ensure that the memory is *freed* with the same allocator used to *allocate* memory.

I created bpo-35265 "Internal C API: pass the memory allocator in a context" to pass a "context" to a lot of functions, context which contains the memory allocator but can contain more things later.

The idea of "a context" came up during the discussion about a new C API: stop relying on any global variable or "shared state", but *explicitly* pass a context to all functions. With that, it becomes possible to imagine having two interpreters running in the same threads "at the same time".

Honestly, I'm not really sure that it's fully possible to implement this idea... Python has *so many* "shared states", like *everywhere*. It's really a giant project to move these shared states into structures and pass pointers to these structures.

So again, I abandoned my experimental change: https://github.com/python/cpython/pull/10574

Memory allocator, context, different structures for configuration... it's really not an easy topic :-( There are so many constraints put into a single API!

The conservative option at this point is to keep the API private.

... Maybe we can explain how to use the private API but very explicitly warn that this API is experimental and can be broken anytime... And I plan to break it, to avoid redundancy between core and main configuration for example.

... I hope that these explanations give you a better idea of the big picture and the challenges :-)

zooba commented 5 years ago

Thanks, Victor, that's great information.

> Memory allocator, context, different structures for configuration... it's really not an easy topic :-( There are so many constraints put into a single API!

This is why I'm keen to design the ideal *user* API first (that is, write the examples of how you would use it) and then figure out how we can make it fit. It's kind of the opposite approach from what you've been doing to adapt the existing code to suit particular needs.

For example, imagine instead of all the PySet*() functions followed by Py_Initialize() you could do this:

    PyObject *runtime = PyRuntime_Create();
    /* optional calls */
    PyRuntime_SetAllocators(runtime, &my_malloc, &my_realloc, &my_free);
    PyRuntime_SetHashSeed(runtime, 12345);

    /* sets this as the current runtime via a thread local */
    auto old_runtime = PyRuntime_Activate(runtime);
    assert(old_runtime == NULL);

    /* pretend triple quoting works in C for a minute ;) */
    const char *init_code = """
    import os.path
    import sys

    sys.executable = argv0
    sys.prefix = os.path.dirname(argv0)
    sys.path = [os.getcwd(), sys.prefix, os.path.join(sys.prefix, "Lib")]

    pyvenv = os.path.join(sys.prefix, "pyvenv.cfg")
    try:
        with open(pyvenv, "r", encoding="utf-8") as f:  # *only* utf-8 support at this stage
            for line in f:
                if line.startswith("home"):
                    sys.path.append(line.partition("=")[2].strip())
                    break
    except FileNotFoundError:
        pass

    if sys.platform == "win32":
        sys.stdout = open("CONOUT$", "w", encoding="utf-8")
    else:
        # no idea if this is right, but you get the idea
        sys.stdout = open("/dev/tty", "w", encoding="utf-8")
    """;

    PyObject *globals = PyDict_New();
    /* only UTF-8 support at this stage */
    PyDict_SetItemString(globals, "argv0", PyUnicode_FromString(argv[0]));
    PyRuntime_Initialize(runtime, init_code, globals);
    Py_DECREF(globals);

    /* now we've initialised, loading codecs will succeed if we can find them or fail if not,
     * so we'd have to do cleanup to avoid depending on them without the user being able to
     * avoid it... */

    PyEval_EvalString("open('file.txt', 'w', encoding='gb18030').close()");

    /* may as well reuse DECREF for consistency */
    Py_DECREF(runtime);

Maybe it's a terrible idea? Honestly I'd be inclined to do other big changes at the same time (make PyObject opaque and interface driven, for example).

My point is that if the goal is to "move the existing internals around" then that's all we'll ever achieve. If we can say "the goal is to make this example work" then we'll be able to do much more.

ericsnowcurrently commented 5 years ago

On Wed, Feb 13, 2019 at 10:56 AM Steve Dower <report@bugs.python.org> wrote:

> Nick, Victor, Eric, (others?) - are you interested in having a virtual whiteboard session to brainstorm how the "perfect" initialization looks? And probably a follow-up to brainstorm how to get there without breaking the world? I don't think we're going to get to be in the same room anytime before the language summit, and it would be awesome to have something concrete to discuss there.

Count me in. This is a pretty important topic and doing this would help accelerate our efforts by giving us a clearer common understanding and objective. FWIW, I plan on spending at least 5 minutes of my 25 minute PyCon talk on our efforts to fix up the C-API, and this runtime initialization stuff is an important piece.

ericsnowcurrently commented 5 years ago

On Wed, Feb 13, 2019 at 5:09 PM Steve Dower <report@bugs.python.org> wrote:

> This is why I'm keen to design the ideal *user* API first (that is, write the examples of how you would use it) and then figure out how we can make it fit. It's kind of the opposite approach from what you've been doing to adapt the existing code to suit particular needs.

That makes sense. :)

> For example, imagine instead of all the PySet*() functions followed by Py_Initialize() you could do this:
>
>     PyObject *runtime = PyRuntime_Create();

FYI, we already have a _PyRuntimeState struct (see Include/internal/pycore_pystate.h) which is where I pulled in a lot of the static globals last year. Now there is one process-global _PyRuntime (created in Python/pylifecycle.c) in place of all those globals. Note that _PyRuntimeState is in parallel with PyInterpreterState, so not a PyObject.

>     /* optional calls */
>     PyRuntime_SetAllocators(runtime, &my_malloc, &my_realloc, &my_free);
>     PyRuntime_SetHashSeed(runtime, 12345);

Note that one motivation behind PEP-432 (and its config structs) is to keep all the config together. Having the one struct means you always clearly see what your options are. Another motivation is to keep the config (dense with public fields) separate from the actual run state (opaque). Having a bunch of config functions (and global variables in the status quo) means a lot more surface area to deal with when embedding, as opposed to 2 config structs + a few initialization functions (and a couple of helpers) like in PEP-432.

I don't know that you consciously intended to move away from the dense config struct route, so I figured I'd be clear. :)

>     /* sets this as the current runtime via a thread local */
>     auto old_runtime = PyRuntime_Activate(runtime);
>     assert(old_runtime == NULL)

Hmm, there are two ways we could go with this: keep using TLS (or static global in the case of _PyRuntime) or update the C-API to require explicitly passing the context (e.g. runtime, interp, tstate, or some wrapper) into all the functions that need it. Of course, changing that would definitely need some kind of compatibility shim to avoid requiring massive changes to every extension out there, which would mean effectively 2 C-APIs mirroring each other. So sticking with TLS is simpler. Personally, I'd prefer going the explicit argument route.

/* pretend triple quoting works in C for a minute ;) \*/
const char \*init_code = """

[snip] """;

PyObject *globals = PyDict_New();
/* only UTF-8 support at this stage */
PyDict_SetItemString(globals, "argv0", PyUnicode_FromString(argv[0]));
PyRuntime_Initialize(runtime, init_code, globals);

Nice. I like that this keeps the init code right by where it's used, while also making it much more concise and easier to follow (since it's Python code).

PyEval_EvalString("open('file.txt', 'w', encoding='gb18030').close()");

I definitely like the approach of directly embedding the Python code like this. :) Are there any downsides?

Maybe it's a terrible idea?

Nah, we definitely want to maximize simplicity and your example offers a good shift in that direction. :)

Honestly I'd be inclined to do other big changes at the same time (make PyObject opaque and interface driven, for example).

Definitely! Those aren't big blockers on cleaning up initialization though, are they?

My point is that if the goal is to "move the existing internals around" then that's all we'll ever achieve. If we can say "the goal is to make this example work" then we'll be able to do much more.

Yep. I suppose part of the problem is that the embedding use cases aren't understood (or even recognized) well enough.

ncoghlan commented 5 years ago

Steve, you're describing the goals of PEP-432 - design the desired API, then write the code to implement it. So while Victor's goal was specifically to get PEP-540 implemented, mine was just to make it so working on the startup sequence was less awful (and in particular, to make it possible to rewrite getpath.c in Python at some point).

Unfortunately, it turns out that redesigning a going-on-thirty-year-old startup sequence takes a while, as we first have to discover what all the global settings actually *are* :)

https://www.python.org/dev/peps/pep-0432/#invocation-of-phases describes an older iteration of the draft API design that was reasonably accurate at the point where Eric merged the in-development refactoring as a private API (see https://bugs.python.org/issue22257 and https://www.python.org/dev/peps/pep-0432/#implementation-strategy for details).

However, that initial change was basically just a skeleton - we didn't migrate many of the settings over to the new system at that point (although we did successfully split the import system initialization into two parts, so you can enable builtin and frozen imports without necessarily enabling external ones).

The significant contribution that Victor then made was to actually start migrating settings into the new structure, adapting it as needed based on the goals of PEP-540.

Eric updated quite a few more internal APIs as he worked on improving the subinterpreter support.

Between us, we also made a number of improvements to https://docs.python.org/3/c-api/init.html based on what we learned in the process of making those changes.

So I'm completely open to changing the details of the API that PEP-432 is proposing, but the main lesson we've learned from what we've done so far is that CPython's long history of embedding support *does* constrain what we can do in practice, so it's necessary to account for feasibility of implementation when considering what we want to offer.

Ideally, the next step would be to update PEP-432 with a status report on what was learned in the development of Python 3.7 with the new configuration structures, and what the internal startup APIs actually look like now. Even though I reviewed quite a few of Victor's and Eric's PRs, even I don't have a clear overall picture of where we are now, and I suspect Victor and Eric are in a similar situation.

ncoghlan commented 5 years ago

Note also that Eric and I haven't failed to agree with Victor on an API, as Victor hasn't actually written a concrete proposal *for* a public API (neither as a PR updating PEP-432, nor as a separate PEP).

The current implementation does NOT follow the PEP as written, because _Py_CoreConfig ended up with all the settings in it, when it's supposed to be just the bare minimum needed to get the interpreter to a point where it can run Python code that only accesses builtin and frozen modules.

ncoghlan commented 5 years ago

Since I haven't really written them down anywhere else, noting some items I'm aware of from the Python 3.7 internals work that haven't made their way back into the PEP-432 public API proposal yet:

* If we only had to care about the pure embedding case, this would be a lot easier. We don't though: we also care about "CPython interpreter variants" that end up calling Py_Main, and hence respect all the CPython environment variables, command line arguments, and in-process global variables. So what Victor ended up having to implement was data structs for all three of those configuration sources, and then helper functions to write them into the consolidated config structs (as well as writing them back to the in-process global variables).
* Keeping the Py_Initialize and Py_Main APIs working means that there are several API preconfiguration functions that need a way to auto-initialize the core runtime state with sensible defaults.
* The current private implementation uses the PyCoreConfig/PyMainInterpreterConfig naming scheme. Based on some of Eric's work, the PEP currently suggests PyRuntimeConfig/PyMainInterpreterConfig, but I don't think any of us are especially in love with the latter name. Our inability to find a good name for it may also be a sign that it needs to be broken up into three distinct pieces (PySystemInterfaceConfig, PyCompilerConfig, PyMainModuleConfig).
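To make the first item more concrete, here is a minimal Python sketch of the "several sources written into one consolidated config" idea. It is not CPython's implementation: `ConfigSource`, `consolidate`, and the three field names are hypothetical, and the priority order (command line over environment over global variables) is an assumption made for the illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ConfigSource:
    """One source of settings (hypothetical); None means 'not set here'."""
    utf8_mode: Optional[int] = None
    isolated: Optional[int] = None
    hash_seed: Optional[int] = None


def consolidate(*sources_low_to_high):
    """Merge sources into one dict; later sources override earlier ones."""
    merged = {f.name: None for f in fields(ConfigSource)}
    for source in sources_low_to_high:
        for f in fields(ConfigSource):
            value = getattr(source, f.name)
            if value is not None:
                merged[f.name] = value
    return merged


# Illustrative values only; the ordering below is an assumption of this sketch.
global_vars = ConfigSource(utf8_mode=0, isolated=0)
environment = ConfigSource(utf8_mode=1, hash_seed=12345)
command_line = ConfigSource(isolated=1)

config = consolidate(global_vars, environment, command_line)
print(config)
```

Keeping each source in its own structure means the merge rules live in one place, which is roughly the benefit described above for the consolidated config structs.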

vstinner commented 5 years ago

I created bpo-36142: "Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings".

vstinner commented 5 years ago

I wrote PEP-587 "Python Initialization Configuration", which has been accepted. It allows the "Path Configuration" to be completely overridden. I'm not sure that it fully implements what is requested here, but it should now be easier to tune the Path Configuration. See: https://www.python.org/dev/peps/pep-0587/#multi-phase-initialization-private-provisional-api

I implemented the PEP-587 in bpo-36763.

ac970517-7943-4610-bdab-4045a31a9505 commented 5 years ago

To Victor: So how does the implementation of PEP-587 help configure embedded Python with a venv? It would be a great help if you could provide some minimal instructions.

ac970517-7943-4610-bdab-4045a31a9505 commented 4 years ago

Just in case this will be of help to anyone, I found a way to use venvs in embedded python.

You first need to initialize the Python installation that pyvenv.cfg refers to as home.
Then you execute the following script:

import sys
sys.executable = r"Path to the python executable inside the venv"
path = sys.path
for i in range(len(path)-1, -1, -1):
    if path[i].find("site-packages") > 0:
        path.pop(i)
import site
site.main()
del sys, path, i, site

zooba commented 4 years ago

If you just want to be able to import modules from the venv, and you know the path to it, it's simpler to just do:

    import sys
    sys.path.append(r"path to venv\Lib\site-packages")

Updating sys.executable is only necessary if you're going to use libraries that try to re-launch themselves, but any embedding application is going to have to do that anyway.
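For anyone who needs this on more than one platform, here is a small sketch of the same idea. It assumes the standard venv layout (`Lib\site-packages` on Windows, `lib/pythonX.Y/site-packages` elsewhere); the helper names `venv_site_packages` and `add_venv_to_path` are made up for this example.

```python
import os
import sys


def venv_site_packages(venv_dir, windows=None, version=None):
    """Return the conventional site-packages directory of a venv."""
    if windows is None:
        windows = os.name == "nt"
    if windows:
        return os.path.join(venv_dir, "Lib", "site-packages")
    # On POSIX the directory name includes the interpreter's major.minor version.
    major, minor = version or sys.version_info[:2]
    return os.path.join(venv_dir, "lib", f"python{major}.{minor}", "site-packages")


def add_venv_to_path(venv_dir):
    """Append the venv's site-packages to sys.path once and return it."""
    site_packages = venv_site_packages(venv_dir)
    if site_packages not in sys.path:
        sys.path.append(site_packages)
    return site_packages
```

Note that this only appends a directory to sys.path: .pth files inside it are not processed, and sys.prefix is left unchanged.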

ac970517-7943-4610-bdab-4045a31a9505 commented 4 years ago

To Steve:

I want the embedded venv to have the same sys.path as if you were running the venv's Python interpreter. So my method takes into account, for instance, the include-system-site-packages option in pyvenv.cfg. My method also sets sys.prefix in the same way as the venv's Python interpreter does.
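As an illustration of what "taking pyvenv.cfg into account" involves, here is a hedged sketch that reads the file by hand. The `key = value` parsing and the "true" default for a missing include-system-site-packages key reflect my understanding of what site.py does and should be treated as assumptions; `read_pyvenv_cfg` and `uses_system_site_packages` are hypothetical helper names, and the demo paths are placeholders.

```python
import os
import tempfile


def read_pyvenv_cfg(venv_dir):
    """Parse pyvenv.cfg into a dict of lower-cased keys and stripped values."""
    settings = {}
    cfg_path = os.path.join(venv_dir, "pyvenv.cfg")
    with open(cfg_path, encoding="utf-8") as cfg:
        for line in cfg:
            key, sep, value = line.partition("=")
            if sep:  # ignore lines without an '='
                settings[key.strip().lower()] = value.strip()
    return settings


def uses_system_site_packages(settings):
    """Assumed default: treat a missing key as 'true'."""
    return settings.get("include-system-site-packages", "true").lower() == "true"


# Demo with a throwaway pyvenv.cfg; the values are placeholders.
with tempfile.TemporaryDirectory() as demo_venv:
    with open(os.path.join(demo_venv, "pyvenv.cfg"), "w", encoding="utf-8") as f:
        f.write("home = /usr/bin\n")
        f.write("include-system-site-packages = false\n")
        f.write("version = 3.8.0\n")
    demo_settings = read_pyvenv_cfg(demo_venv)

print(demo_settings["home"], uses_system_site_packages(demo_settings))
```

An embedding application could use the `home` value to decide which installation to initialize before running a script like the one posted earlier.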

348d154b-9387-4632-ad74-398fd999ff6e commented 4 years ago

I can only say that sorting this issue (and PEP-0432) out would be great! I run into this issue when embedding CPython in a Windows app, and want to use some pre-installed Python, which is not part of the install package... So besides pyenv venvs, please keep Windows devs in mind, too! :)

zooba commented 4 years ago

I run into this issue when embedding CPython in a Windows app, and want to use some pre-installed Python, which is not part of the install package...

You'll run into many more issues if you keep doing this...

The way to use a pre-installed Python on Windows is to follow PEP-514 to find and run "python.exe" (or force your users to learn how to configure PATH, which is pretty hostile IMHO, but plenty of apps do it anyway).

If you really need to embed, then add the embeddable package (available from our downloads page) into your distribution and refer to that. Then you can also bundle whatever libraries you need and set up sys.path using the ._pth file.

d0148cd8-789b-42e3-8dea-c06aad871cbd commented 4 years ago

As a side-note: In my case I am embedding Python in a C program for several reasons:

1. Added an additional module (generated with SWIG)
2. This module needs a licence key, which I supply in the C program (to make it more difficult to extract it).
3. I need a different executable name (python is too unspecific) to identify the running program in things like Windows TaskManager, Posix ps, Oracle V$SESSION.

I'm using virtual environments only in the Linux version, the Windows version uses the embeddable ZIP distribution.

The Linux version was working with Python 2.7 and "virtualenv". Now I'm updating to Python 3.6 and "venv" and running into this issue.

It seems like virtualenv can handle the situation, but venv can't. Maybe it is worth looking at what virtualenv does differently?

fe491b48-23c2-4033-aaa2-1a6613895466 commented 4 years ago

For the record: since virtualenv 20.0.0 (or thereabouts) switched to the python -m venv style virtual environment structure, the C API for embedding when using a virtual environment is now completely broken on Windows. The same workaround used on UNIX doesn't work on Windows.

The only known workaround is in the initial Python code you load, to add:

import site
site.addsitedir('C:/some/path/to/pythonX.Y/Lib/site-packages')

to at least force it to use the site-packages directory from the virtual environment.
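One difference from a plain sys.path.append is that site.addsitedir also processes any .pth files found in the directory. The sketch below shows that with throwaway directories; the names `demo.pth` and `extra` are made up for the demonstration.

```python
import os
import site
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    site_packages = os.path.join(root, "site-packages")
    extra_dir = os.path.join(root, "extra")
    os.makedirs(site_packages)
    os.makedirs(extra_dir)

    # A .pth file lists additional directories, one per line.
    with open(os.path.join(site_packages, "demo.pth"), "w", encoding="utf-8") as pth:
        pth.write(extra_dir + "\n")

    site.addsitedir(site_packages)

    # Both the directory itself and the .pth target should now be on sys.path.
    added_site_packages = site_packages in sys.path
    added_pth_target = extra_dir in sys.path

    # Clean up so the demo leaves sys.path as it found it.
    for entry in (site_packages, extra_dir):
        while entry in sys.path:
            sys.path.remove(entry)

print(added_site_packages, added_pth_target)
```

This matters for packages installed in development mode or otherwise relying on .pth files, which a bare sys.path.append would not pick up.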

As for mod_wsgi, this means that on Windows the WSGIPythonHome directive no longer works, and I have to suggest that workaround instead.

vstinner commented 3 years ago

See also "Configure Python initialization (PyConfig) in Python" https://mail.python.org/archives/list/python-dev@python.org/thread/HQNFTXOCDD5ROIQTDXPVMA74LMCDZUKH/#X45X2K4PICTDJQYK3YPRPR22IGT2CDXB

And bpo-42260: [C API] Add PyInterpreterState_SetConfig(): reconfigure an interpreter.

superchromix commented 1 year ago

Is there some update on this issue? I'm also encountering problems when trying to embed a specific Python environment into a C++ application (running on Windows). It seems like no tutorial exists for how to properly set up the various paths, etc. within the PyConfig object before calling Py_InitializeFromConfig.