python / cpython

The Python programming language

https://www.python.org

Other

63.34k stars 30.32k forks source link

Install a static installation description file as part of the Python installation #107956

Open FFY00 opened 1 year ago

FFY00 commented 1 year ago

Feature or enhancement

Ship a static file that describes various aspects of the installation as part of the Python installation.

Pitch

Shipping such a file would make it much easier to introspect a Python installation without having to run the interpreter. There are many use-cases for this, but some key would be for eg. Python launchers, and cross-compilation tooling.

Information we could provide:

Python language version
Python implementation (inc. version)
Interpreter path
Standard library path
Interpreter architecture
Python ABI
- Tag (PEP 3149 on Posix)
- Build flags (debug, trace refs)
Static libpython (available?)
- Name
- Location
Shared libpython (available?)
- Name (standard library)
- Major-only pin library name (eg. libpython3.so)
- Location
C API
- Hex version
- Include paths
- Required C flags
Supported native module file extensions
- Interpreter-specific
- Stable ABI

(incomplete table, just an initial proposal)

Note: This issue specifically targets a descriptor file for the Python installation, not a Python environment, so paths are out of scope.

Previous discussion

https://discuss.python.org/t/what-information-is-useful-to-know-statically-about-an-interpreter/25563.

(cc @brettcannon)

Linked PRs

gh-108483

brettcannon commented 1 year ago

To be clear, this is for a CPython installation, not a general Python installation based on what's included (and assuming it's all required as nothing is specified as optional).

FFY00 commented 1 year ago

I'd like to make the a general Python installation thing, by putting the CPython specific details in their own section.

brandonardenwalli commented 1 year ago

Also to be clarify:

Note: This issue specifically targets a descriptor file for the Python installation, not a Python environment, so paths are out of scope.

Is this to mean that the description file would be the same across all virtual environments made from a specific installation? For example, let us say I install python version 3.11 and then make 3 virtual environments: one, two, and three. All three of this environments to have the same description file, no?

FFY00 commented 1 year ago

Yes, as the description file would be located in the Python installation paths themselves, instead of virtual environment.

Btw, standard Python virtual environment already have such a file, pyvenv.cfg, though it has it's own issues.

brettcannon commented 1 year ago

My thoughts on this general topic are written down in https://github.com/brettcannon/python-launcher/discussions/168 . While that proposal does not make sense in view of a static file tied to an interpreter, below is probably what would make sense (if you created this file for virtual environments as well):

    {
        // An array specifying what is required to execute the interpreter.
        // The expectation is to append args to the end of the array before
        // execution.
        // E.g. for conda environments:
        // ```
        // ["/path/to/conda", "run", "--path",
        //  "/home/brett/.conda/envs/conda-env", "--no-capture-output"]
        // ```.
        "run": [
            "/home/brett/my-venvs/my-venv/bin/python3.10"
        ],
        "python_version": {
            // `sys.version_info`
            "major": 3, // Optional
            "minor": 10, // Optional
            "micro": 1, // Optional
            "releaselevel": "final", // Optional
            "serial": 0 // Optional
        },
        "implementation": { // Optional
            // `sys.implementation`
            "name": "cpython",
            "version": { // Has the same structure as `python_version` above.
                "major": 3, // Optional
                "minor": 10, // Optional
                "micro": 1, // Optional
                "releaselevel": "final", // Optional
                "serial": 0 // Optional
            }
        },
        "executable": {
            "bits": 64, // Optional; `math.log2(sys.maxsize+1) + 1`
            "architecture": "x86-64"  // Optional; platform.machine()
        },
        "environment": { // Optional
            // What type of environment, e.g. "virtual", "conda", etc.
            "type": "virtual",
            "name": "my-venv"
        }
}

FFY00 commented 1 year ago

@brettcannon thanks! Your use-case is a bit different than mine (improving cross-compilation support), so it's definitely .

I have a working prototype locally, which I am planing to propose soon, and there are a couple things I'd like to finalize before that, if you have any thoughts:

Format
- In my prototype I opted for TOML as that is what is being most widely used lately in modern Python-related tooling. The other main alternative to consider would be JSON. Personally, I have a biased preference towards TOML, but JSON would also be a good choice.
Location
- I think what would make the most sense would probably be placing the static description file in the stdlib directory, like the EXTERNALY-MANAGED file, but I think there would also be a lot of value in having standardized-ish location (or, more realistically, set of locations) to enable easier discovery. I am very dubious, though, on how easier that'd actually make discovery, but nothing actually prevents us from doing both (placing the file next to stdlib and in this, easier to discover, location) so unless there is any argument against placing it in the stdlib path, I think that's probably the way to go.

This initial implementation proposal will only include the main key details (I think probably the version, executable path, and stdlib path), and the rest of the details can then be proposed later in smaller PRs/proposals. My key goal here is to get the static description file itself sorted out.

if you created this file for virtual environments as well

That was specifically left out of scope [^1]_ because it's tricky to do properly, the main thing that jumps out to me being the Python installation updating and leaving incorrect virtual environment static description files in place.

Some thoughts on possible approaches to tackle this issue:

Have multiple config files that complement each-other (eg. the virtual environment static description file would reference the installation static description file)
Have centralized virtual environment management (eg. when a virtual environment is created, it would be registered in a central database of sorts, and then a command to ensure virtual environments are up-to-date could be provided)

But ultimately, this is a significantly harder problem, and considering this is not incompatible with the current proposal and that the current proposal would already be a huge improvement over the current status quo, I think it makes sense to pursue it with the current scope.

[^1]: The file proposed here is tied to the installation, not the environment or interpreter path, excluding virtual environments.

brettcannon commented 1 year ago

Format

I think the question is whether you expect people to ever read or write this file? If the answer is in general "no", then I think JSON is the easier format for people to ingest tooling-wise. But if you expect a human being to ever interact with the file I can see TOML making sense.

Location

I think the way EXTERNALLY-MANAGED is stored makes sense from a CPython perspective. I think the question is if you expect other implementations or builds of Python to use this file? For instance, what if the binary is fully self-contained and thus there is no stdlib directory to look at since it doesn't exist?

the main thing that jumps out to me being the Python installation updating and leaving incorrect virtual environment static description files in place.

Fair enough. I always have to remember Unix is just symlinks and so it can change underneath itself while Windows is a hard copy and so you can't mess that detail up. But that might actually be an argument for the file when copying the python binary into the virtual environment.

zooba commented 1 year ago

This seems to me like it should just be a sysconfig.json file in sys.prefix, which ought to adequately communicate to anyone who's looking for it whether it's their thing or not (and it's specifically not going to be a PEP 514 substitute).

Hopefully between a couple of substitutes (i.e. allow {sys.<attr>} substitutions to be embedded in the otherwise static file) and some better alignment between platforms, we can more-or-less replace sysconfig.get_config_vars() with json.load, and also anyone outside of our runtime can also get everything that way too.

I don't see any reason to separate it from sysconfig though. The semantics already exist and people understand them, so any proposal will get through quicker if it's clearly based on it.

FFY00 commented 1 year ago

I think the question is whether you expect people to ever read or write this file? If the answer is in general "no", then I think JSON is the easier format for people to ingest tooling-wise. But if you expect a human being to ever interact with the file I can see TOML making sense.

Yes, I expect people to read, and in some situations, write this file, but not too frequently. Some use-cases:

Manually introspecting an installation without running the interpreter, which is the most relevant when working with cross-builds, but certainly not limited to it
This plays a bit with my outer goal of trying to make it possible to build most extensions, etc. without having to run the interpreter, where in certain scenarios, I think users might need to write the file themselves (eg. when it's not available — I expand into this below)

I think the way EXTERNALLY-MANAGED is stored makes sense from a CPython perspective. I think the question is if you expect other implementations or builds of Python to use this file? For instance, what if the binary is fully self-contained and thus there is no stdlib directory to look at since it doesn't exist?

Yes, I expect other implementations to use this file, though not necessarily ship it in the same way.

My plan was for the format to be shared between everyone, but if and how the static description file is provided, to be up to each implementation.

Of course, I'd like the way the static description file is provided to be consistent, but as you point out, this is something that can be highly dependent on the implementation, build format, target platform, etc., so I don't think we can define a single way to do it. We could define how this should work in different scenarios, but I don't think we (CPython) should be the responsible for telling other implementations how they should implement this, especially in scenarios where we might not be experts.

For consistency, I would expect implementations that use an installation scheme similar to CPython to place the file in the same/equivalent place, but they wouldn't be required to do so. I think trying to get this to happen is something we should do as a community, by engaging in discussions with the other parties.

Finally, as I mentioned above, deciding if the file should be provided at all, is something I also think should be up to each implementation, and is something that could depend on the target, build format, etc.

This seems to me like it should just be a sysconfig.json file in sys.prefix, which ought to adequately communicate to anyone who's looking for it whether it's their thing or not (and it's specifically not going to be a PEP 514 substitute).

Regarding the name, I think using "sysconfig" in the name would imply a tie to the sysconfig module, which isn't really true for the proposal as-is (its relationship with sysconfig is something you mention later in your reply, so I elaborate this in the response to that). The purpose of the file is also a bit different, as it is not related to the system, but rather the Python installation specifically.

Regarding the location, while I guess using sys.prefix would be fine for Windows (I'm assuming, I'm not a Windows expert and I haven't looked into that), but it wouldn't on POSIX environments, where the prefix is /usr, /usr/local, etc., which are paths where we shouldn't be installing data like this. Additionally, it is currently possible to install multiple versions of Python to the same prefix, resulting in the static description files to conflict. To fix this, we would need to change the file name, which would need to include all the things that might change between Python installations that can be installed to the same prefix, and since we don't really have limitations regarding that, I think this introduces a lot of avoidable complexity. Thus, IMO the static description file should be installed to an installation-specific directory. That said, do note that nothing prevents us from also installing the files (probably by symlinking them when possible) to another directory too, where it could be more discoverable (this is something I already went into in my reply above).

Hopefully between a couple of substitutes (i.e. allow {sys.<attr>} substitutions to be embedded in the otherwise static file) and some better alignment between platforms, we can more-or-less replace sysconfig.get_config_vars() with json.load, and also anyone outside of our runtime can also get everything that way too.

I don't quite agree here :sweat_smile:. This goes into the bigger picture question of the direction we want to push sysconfig and friends. I wanted to try to move users away from using sysconfig.get_config_vars in the way they are right now.

IMO sysconfig.get_config_vars is very problematic. I wrote a bit about this in the 1st point of GH-103480, sysconfig.get_config_vars exports very specific low-level information that was never meant to be public API. The issue is that a lot of information necessary for building extensions, and other similar use-cases, is only available this way.

For this reason, instead of building further on top of sysconfig.get_config_vars, I think a better direction would be to provide a lot of the information currently only available there in a proper stable public API — this is the main goal of the new sysconfig API proposal.

That said, sysconfig.get_config_vars is still a valuable mechanism and I don't think we should get rid of it or try to replace it. IMO it should stay as an unstable-ish feature that provides access to internal details of the build, similarly to the dis module. A good first step for this would be to improve the documentation to more clearly communicate the stability of its data. Right now, we need to be careful when making changes to the Makefile, because a lot of user code depends on the names and values of its variables, and while I think we should keep making an effort to avoid breaking things unnecessarily, I think we should move towards the goal of minimizing the impact of this kind of changes.

Regarding being able to simply replace sysconfig.get_config_vars with json.load in code, I am not necessarily against the idea of also making the sysconfig.get_config_vars data more easily available statically, but if we go with the approach I described above, I think that should be separate from this proposal.

@zooba does this make sense to you? Like I said, it depends a lot on our higher level goals regarding the direction want to move sysconfig and friends towards.

zooba commented 1 year ago

I don't quite agree here 😅. This goes into the bigger picture question of the direction we want to push sysconfig and friends. I wanted to try to move users away from using sysconfig.get_config_vars in the way they are right now.

Yeah, fair enough. It was more of a "similar intent" suggestion, rather than saying we should actually do it.

But I do think it's important to have a sensible migration path. So any data that we can provide through existing (or new) config vars, we should, so that way users don't have to do a switch on version. Any data that we fix should be fixed in both, and existing scenarios should keep working and even get better without people having to change their code.

The static file should include all of sysconfig I think, if only for compatibility. It can be kept aside from new, reliable values that we want people to migrate to, but if it's not there then we'll definitely miss scenarios that people need.

But I agree with the overall goal of providing the actual info needed for building and installing extensions. We want the new fields to be the actual commands, not just the ones that were used to build CPython. This probably ties into #108064 to document these commands.

FFY00 commented 1 year ago

But I do think it's important to have a sensible migration path. So any data that we can provide through existing (or new) config vars, we should, so that way users don't have to do a switch on version. Any data that we fix should be fixed in both, and existing scenarios should keep working and even get better without people having to change their code.

I partially agree. While I think any data changes should be reflected on all mechanisms/interfaces (old sysconfig API, new sysconfig API, static description file, etc.), I don't think we should expand old interfaces unless we want them to keep being used as the source of the new information we added there (eg. if we want to expose some extra information in the new API, I don't think we should also add a config var with that information, as sysconfig.get_config_vars is not the place we want users getting that information from. OTOH, if we specifically want to expose some low level information about the build, where sysconfig.get_config_vars is the place we want users to get that information from, then we can add a new config var for that).

The static file should include all of sysconfig I think, if only for compatibility. It can be kept aside from new, reliable values that we want people to migrate to, but if it's not there then we'll definitely miss scenarios that people need.

While I understand the reasoning behind this, I am very uneasy about the long-term maintenance aspect of that. Instead of blindly adding all the sysconfig data to the static description file, I would be much more comfortable figuring out which data is needed in applications that require, or benefit from, using a static description file and investigate if it is something we can be safely exposed within the constraints of the static description file (only being able to update it when updating the installation, etc).

Also, I don't know if this was clear enough in the proposal description, and I think we might not be on the same page regarding it, but my purpose for this file was to help in use-cases where running the interpreter from introspection is impossible or undesirable. The objective wasn't to replace sysconfig, which is something that would never be possible to do in its entirety, and would require a lot of consideration. That is something we could consider, but it just out of my scope right now, and this proposal as-is is not really a blocker for that — IMO it's better to be a bit conservative right now with the information we add, and then add everything else from sysconfig if that's a direction we decide to take.

But I agree with the overall goal of providing the actual info needed for building and installing extensions. We want the new fields to be the actual commands, not just the ones that were used to build CPython. This probably ties into #108064 to document these commands.

:rofl: okay, this is something I slightly disagree again! While I think providing the commands is helpful, I don't think this is the place to do so.

IMO both in the static description file and the sysconfig API, we should focus on exposing all the required details (eg. target architecture, shared libpython name and location, etc.) as separate fields. There are multiple reasons for this, like providing commands just not being viable for anything that deviates from using GCC or Clang to compile C, with not even GCC and Clang being fully compatible, so it'd be more like GCC and maybe Clang works. My proposal would be to just have the required information as separate fields in introspection mechanisms like the static description file or sysconfig, and provide the commands for whichever use-cases we want to support (eg. compiling C extensions with GCC) via a separate mechanism, such as python-config or whichever evolution thereof.

Sorry for the large chunk of text again, but this is something I have extensively thought about, so I have opinions :sweat_smile:. I am definitely happy to discuss and figure out together which would be the best solution for these different questions.

zooba commented 1 year ago

We're more in agreement than you think 😉 I'm just stating things very simply compared to the depth that you've thought about them, and I think you're assuming that I've also thought these through 110% and am making definitive design statements. I'm not - I'm just standing somewhere that I can see the problem and generally waving in their direction (I should add, I'm confident in that direction, I'm just not spelling out as full an implementation plan as you are).

I definitely agree that not all the sysconfig data is useful.[^1] Even so, for practicality, having a deprecated-on-release subkey of the new data that is literally "what sysconfig would show, for better or worse" makes it much easier for devs to migrate to the new data. Once they're there, they can start using the new data when it's available (this also means a distributor can generate a "sysconfig-only" version of the data for older releases if they want). Without this, devs who might need it won't switch, because they don't have a nice fallback for the older versions of Python they support. Consider, they might read the file and still need to launch Python to introspect itself - we ought to be able to avoid that on first release.

Agreed that replacing sysconfig isn't the goal, but it is a good test for whether we've provided enough information. Right now, the gaps in this area are filled by people using sysconfig and often making assumptions about how Python is normally installed - if we can't replace all of those with this file (plus some runtime calculation, of course), I think we've missed the mark. People shouldn't need to use sysconfig if we get this right, whether they can launch the interpreter or not.

And yeah, by "commands" I really just meant the compile-time options required, in some format that a library can figure out which actual options to use with its own compiler. That doc page ought to end up with specific compiler commands as examples, but we wouldn't put those in here. However, there should be enough information in this file that a program/script can figure out how to match the original compiler settings enough to get a compatible extension module. Again, if we can't provide that, I think we've missed our goal, so this is more of a validation test than a specific feature.

[^1]: My primary focus is Windows, where the sysconfig info is often worse than useless 😉 I still think it ought to be in there.

eli-schwartz commented 1 year ago

But I agree with the overall goal of providing the actual info needed for building and installing extensions. We want the new fields to be the actual commands, not just the ones that were used to build CPython. This probably ties into #108064 to document these commands.

I'm not entirely sure I understand the distinction you are making between "the actual commands" and "the ones that were used to build CPython".

The commands used for building CPython are actual real commands, they contain for example the name of a specific compiler. That's actually problematic because build systems such as meson, cmake, autotools, (and to a small extent that I wish were a lot larger, setuptools) etc. cannot accept "actual commands" from sysconfig since they need to be able to use different commands, and sometimes compile different languages that aren't used in the CPython build system at all (such as mingw Fortran on Windows, which is a GCC component that, yes, is getting mixed with MSVC-built pythons).

Any proposed API for CPython that wants to replace current usage of sysconfig.get_config_vars as the preferred information source used by build systems, cannot hardcode the name of a compiler to use. If it does hardcode the name of a compiler to use then we (meson, in this case) will simply ignore that new API and keep using sysconfig.get_config_vars with a hacked up heuristic -- our current one to find the correct library / import library is, depending on Unix/windows and also CPython / PyPy, a mixture of templating the hardcoded word python /pypy3-c, the version number, DEBUG_EXT, ABIFLAGS, py_version_nodot, etc.

Basically, what we would like to see in order to make our jobs as buildsystem developers easier, is a way to do for Windows what pkg-config --cflags --libs python3 does on Linux and macOS. (And it should of course work cross platform, so we can use it on Unix platforms as a fallback when pkg-config isn't installed.)

We don't really want to know the commands to use -- we have our own commands that we're probably already using to build (embedded copies of?) regular C/C++/Fortran libraries into static libraries that will then get linked with a python binding file and the python import library. We need the flexibility to choose our own toolchain, because to us, libpython is just "yet another dependency" irrespective of the fact that the libpython dependency happens to parallel the runtime we are building plugins for.

If people want to document and know an example command capable of being copy-pasted and run to build an extension without resorting to a build system like meson, cmake, or autotools... then IMHO this is best as documentation, not as part of a stdlib API. A stdlib API should only contain the python-specific parts that would be used by the documentation. It's up to the documentation to then explain how to find/choose/activate/run a compiler.

eli-schwartz commented 1 year ago

And yeah, by "commands" I really just meant the compile-time options required, in some format that a library can figure out which actual options to use with its own compiler.

Thanks for clarifying this. I appear to have missed that response while writing my own, lol...

I think a large part of my confusion was that I typically think of, say, cl.exe as a command, and things like header search directories or import libraries as flags. Different terminologies I guess.

FFY00 commented 1 year ago

We're more in agreement than you think 😉 I'm just stating things very simply compared to the depth that you've thought about them, and I think you're assuming that I've also thought these through 110% and am making definitive design statements. I'm not - I'm just standing somewhere that I can see the problem and generally waving in their direction (I should add, I'm confident in that direction, I'm just not spelling out as full an implementation plan as you are).

Gotcha. I wasn't assuming that, but wasn't sure the tone you were going for (yay nonverbal communication). For me it's easier to spell everything out, to make sure we are on the same page and there isn't any misunderstanding.

I definitely agree that not all the sysconfig data is useful.1 Even so, for practicality, having a deprecated-on-release subkey of the new data that is literally "what sysconfig would show, for better or worse" makes it much easier for devs to migrate to the new data. Once they're there, they can start using the new data when it's available (this also means a distributor can generate a "sysconfig-only" version of the data for older releases if they want). Without this, devs who might need it won't switch, because they don't have a nice fallback for the older versions of Python they support. Consider, they might read the file and still need to launch Python to introspect itself - we ought to be able to avoid that on first release.

If the only thing people are missing is access to the sysconfig data, I am personally okay with requiring them to launch an interpreter during migration. From the cross-builds side, the data can simply be changed by setting _PYTHON_SYSCONFIGDATA_NAME, the key thing for this use-case would be that people stop introspecting other data.

Agreed that replacing sysconfig isn't the goal, but it is a good test for whether we've provided enough information. Right now, the gaps in this area are filled by people using sysconfig and often making assumptions about how Python is normally installed - if we can't replace all of those with this file (plus some runtime calculation, of course), I think we've missed the mark. People shouldn't need to use sysconfig if we get this right, whether they can launch the interpreter or not.

I agree. Though, we shouldn't forget that this file isn't targeting all the use-cases covered by sysconfig (eg. the install layout), for now at least.

And yeah, by "commands" I really just meant the compile-time options required, in some format that a library can figure out which actual options to use with its own compiler. That doc page ought to end up with specific compiler commands as examples, but we wouldn't put those in here. However, there should be enough information in this file that a program/script can figure out how to match the original compiler settings enough to get a compatible extension module. Again, if we can't provide that, I think we've missed our goal, so this is more of a validation test than a specific feature.

That makes more sense, we're on the same page then.

Any proposed API for CPython that wants to replace current usage of sysconfig.get_config_vars as the preferred information source used by build systems, cannot hardcode the name of a compiler to use. If it does hardcode the name of a compiler to use then we (meson, in this case) will simply ignore that new API and keep using sysconfig.get_config_vars with a hacked up heuristic -- our current one to find the correct library / import library is, depending on Unix/windows and also CPython / PyPy, a mixture of templating the hardcoded word python /pypy3-c, the version number, DEBUG_EXT, ABIFLAGS, py_version_nodot, etc.

My goal, as I also stated above, is to provide all the compiler-related information in a compiler-agnostic way. Downstream users, like Meson, should be able to take that information and give it to whichever compiler backend it wants to use and have that be able to generate the actual compiler-specific commands. This should be a 1st party supported use-case.

zooba commented 1 year ago

From the cross-builds side, the data can simply be changed by setting _PYTHON_SYSCONFIGDATA_NAME, the key thing for this use-case would be that people stop introspecting other data.

This doesn't work on Windows - no file is generated. But we still may need to get fields for a non-executable runtime (such as an ARM64 build on an x64 machine).

we shouldn't forget that this file isn't targeting all the use-cases covered by sysconfig (eg. the install layout)

I assume by "install layout" you mean how to install wheels? It ought to cover the locations where CPython itself installed things to, such as its own headers and libs, right? (Those are "compiler-related information," I guess.)

Other than that, agreed with all the rest.

eli-schwartz commented 1 year ago

I assume by "install layout" you mean how to install wheels? It ought to cover the locations where CPython itself installed things to, such as its own headers and libs, right? (Those are "compiler-related information," I guess.)

Well, the install scheme really includes both what wheels typically use as well as a bit more -- it's actually legal to install headers in a wheel, which maps to the same location as the CPython headers in "include" / "platinclude". I don't think that wheels can do anything with the stdlib location though... which is on that

The install layout from sysconfig doesn't currently say anything about the directory where libpython itself is though -- maybe it should? 🤔 People keep on wanting to package up C/++ libraries via tools like auditwheel repair / delvewheel and it's a big pain to do cross platform what with adding the DLL directory, or embedding private paths, having a single conventional directory that python itself guarantees is in the DLL search path for all modules could be handy...

zooba commented 1 year ago

it's actually legal to install headers in a wheel, which maps to the same location as the CPython headers in "include" / "platinclude".

Legal, but unspecified, and not portable. The current include and platinclude in sysconfig are where to find CPython's headers, there's no guarantee you can install anything there.

The plan right now is not to specify those. So we'd include the actual directory where CPython stuff is installed to, and anyone who interprets that as "I can also install stuff here" is going off-label.

FFY00 commented 1 year ago

Okay, to move things forward, let's set some of the base implementation details, so that GH-108483 is unblocked.

I am gonna try to summarize the discussions so far, and what I think to be the most reasonable outcome for each topic.

File location
- @FFY00 proposed stdlib (discussion)
- If I interpret it correctly, @brettcannon seems okay with this (discussion)
- @brettcannon questioned how this would work for other implementations and/or scenarios where this location is not suitable (https://github.com/python/cpython/issues/107956#issuecomment-1690469480)
  - @FFY00 clarified that the location would be an implementation detail, though they'd like to make an effort to keep it consistent when possible (discussion)
- @zooba proposed sys.prefix (discussion)
- @FFY00 noted that this creates a couple issues (discussion)
- Outcome: As discussed above, this should probably go to an installation-specific directory. stdlib was the only proposed location that meets this criterion, and no-one raised an issue against using stdlib specifically, so AFAICT this seems to be the best choice.
- Note: In the future, we can also place the file in another place, if for example we want to improve discoverability (discussion, discussion)
File format
- @FFY00 proposed TOML, but mentioned that JSON would be a suitable alternative (discussion, discussion)
- @brettcannon thinks that if users are meant to ever read or write the file, TOML can make sense, but if not, JSON would be better (discussion)
- @zooba seems to prefer JSON or a simple key=value format to TOML (discussion)
- @AA-Turner seems to agree with @FFY00, but points out that using JSON would make the implementation easier (discussion)
- @zooba mentioned the possibility of using a simple key=value format (discussion)
- @FFY00 doesn't think this is a good option (discussion)
- Outcome: I don't think there is a clear choice here, but TOML and JSON seem to be the stronger options. It seems TOML is the preferred option when looking at the format alone, but it raises concerns regarding the implementation and technical viability. JSON, on the other hand, does not provide any technical challenges, but it's less desirable as a format.
File name (the extension isn't very relevant here, as that depends on the file format)
- @zooba proposed sysconfig.json (discussion)
- Relevant Context: This was at the same time as proposing to use sys.prefix as the file location
- @FFY00 would like a clear separation between the static description file and sysconfig, so doesn't think "sysconfig" is a good option (discussion)
- @FFY00 proposed install-details.toml (discussion)
- Outcome: There hasn't been much discussion about the file name. install-details.toml seems to be the leading option, although the lack of discussion does make it a very strong candidate.

Sorry if missed anything, misunderstood anything, or if there was any bias. Please let me know what you think, and if the proposed outcomes seem correct.

Also, this is my first time writing a discussion summary of this kind, so I am sorry if you feel misrepresented in any way. Please let me know if that is the case, so that I can try to prevent it in the future and try to improve.

zooba commented 1 year ago

It's a good summary. The only bias I feel is there is (apparently/weakly) dismissing my proposals because nobody else is talking about them 😆

Some responses that I started posting on the PR, but make more sense here:

though all major languages I am aware of have a TOML parser library

As far as I know, neither Bash nor PowerShell have anything built in, which means you can't write a native script to handle it. (PowerShell definitely has JSON, and I'm not sure about Bash, but I bet that a command line tool like jq is far more common than... stoml? I don't know what the standard tool would be here.)

.NET certainly has nothing native, though adding additional dependencies is only sometimes an issue. But when it is, writing a basic parser is far more likely than jumping through whatever hoops are necessary to get one.

LTS versions of Python likely to be on existing Linux distros don't have it, which means system scripts or tools on those will need an additional dependency to handle the file. Again, not impossible, but potentially complicated enough that people will reach for str.partition('=') instead and hope.

If we support different data types [in the file], the user implementation [of a parser] wouldn't be trivial

What different types do we need? If certain fields are defined as being int or float, it's easy enough for any language to parse those. But I expect most fields are going to be strings, and the escaping rules for anything other than key=value are going to make editing by hand just as complicated as parsing. The biggest advantage of key=value is that the final string is literally as it's read, and the biggest restriction is that we can't embed newlines trivially.[^1]

[^1]: My vote here would be for an empty key to indicate that the value should be appended after a newline to the previous key. Trailing backslashes are very likely to occur in paths on Windows, so those aren't a good option here IMHO.

eli-schwartz commented 1 year ago

I'd like to raise an additional defense of using json over toml:

@brettcannon thinks that if users are meant to ever read or write the file, TOML can make sense, but if not, JSON would be better (discussion)

It sounds to me like there's no real concern that users are meant to write the file, just read it:

Yes, I expect people to read, and in some situations, write this file, but not too frequently. Some use-cases:

Manually introspecting an installation without running the interpreter, which is the most relevant when working with cross-builds, but certainly not limited to it

This plays a bit with my outer goal of trying to make it possible to build most extensions, etc. without having to run the interpreter, where in certain scenarios, I think users might need to write the file themselves (eg. when it's not available — I expand into this below)

I do hear the rationale for reading the file, mainly for debugging -- in general, data formats that a human can read somehow, are beneficial absent compelling need otherwise. But I'm not sure what the rationale for writing one is.

That there isn't one already existing? Why is this a reason to write one? I would think it's a reason to raise a request for your python binary distributor to add one. I doubt software will be dropping support for running against a python that doesn't have this file, which means old pythons are covered... and new pythons should have the file, right?

...

If it's only interesting to read the file, not to write it, then I still think json is a good choice. The main problem with json is that it's annoying to write correctly (in particular, the requirement to separate elements in a list with commas, but raise a syntax error if the last element has a non-ambiguous trailing comma).

Reading it is mostly easy, you just pretty-print it when originally generating the json file. Optionally, if you want comments, you have to "cheat" and hack those in by creating json entries called "__comment": "this is a comment" which isn't very thrilling, but on the other hand I usually feel like I don't really need comments in a json file except when I am adding notes for the next person to edit the file (assuming that said json file is intended to be edited by hand, instead of generated).

As far as I know, neither Bash nor PowerShell have anything built in, which means you can't write a native script to handle it. (PowerShell definitely has JSON, and I'm not sure about Bash, but I bet that a command line tool like jq is far more common than... stoml? I don't know what the standard tool would be here.)

jq is very common, yes. Every time I've ever wanted to parse a toml file from a shell though, I ended up finding one of like 8 different programs all called "toml2json", then passing that to jq.

AA-Turner commented 1 year ago

@AA-Turner seems to agree with @FFY00, but points out that using JSON would make the implementation easier

I'd be +1 TOML if we had a writer in the stdlib. I'm maybe +0 though currently, as the TOML implementation in the PR seems a little fragile currently. JSON does have the benefits Steve mentioned for PS/bash, etc, though.

encukou commented 1 year ago

FWIW, feature flags that affect the stable ABI are currently:

>>> import _testcapi
>>> _testcapi.get_feature_macros().keys()
dict_keys(['HAVE_FORK', 'MS_WINDOWS', 'PY_HAVE_THREAD_NATIVE_ID', 'Py_REF_DEBUG', 'USE_STACKCHECK'])}