python / cpython

The Python programming language
https://www.python.org
Other
63.19k stars 30.26k forks source link

extend `sys` module on windows #102536

Open maxbachmann opened 1 year ago

maxbachmann commented 1 year ago

It would be useful to be able to detect: 1) the platform as 32-bit vs 64-bit vs ARM64. 2) the windows api partition

@zooba you mentioned adding them to sys.implementation. Since they are only available on windows, I assume this would mean adding them with an underscore prefix? The alternative would be a separate API similar to sys.winver.

This is a follow up to #102256.

Linked PRs

zooba commented 1 year ago

Yeah, I don't think we want to bother with making them required attributes, so I'd add:

ericsnowcurrently commented 4 weeks ago

Is sys.implementation the right place? It should have things related only to the Python implementation, that other implementations also have. The architecture is a build characteristic and the api partitions seem like it as well. Where do we normally record build or platform (or OS) information?

zooba commented 4 weeks ago

The only real alternative is at the top level of the sys module. Maybe you could make a case for it going into sysconfig, but we still need an attribute on a native module somewhere, as it's the only way we're getting the field it at build time.

I assume this hasn't come up before because Windows is the only OS with transparent CPU emulation? So querying the current OS architecture isn't sufficient to determine the current process architecture.

ericsnowcurrently commented 4 weeks ago

sys._build.*?

zooba commented 4 weeks ago

Really it should be sys.platform_but_for_real_this_time as we can't just go updating sys.platform any more.

What if we added sys.architecture for all platforms? platform.architecture(sys.executable) is too limited in its return values to work here, but it ought to be known at compile time, right? And something like Jython or IronPython can use clr or jvm if they prefer. Maybe they've already got an additional field for that?

vstinner commented 4 weeks ago

the platform as 32-bit vs 64-bit vs ARM64.

There are multiple existing APIs to do that. Examples:

>>> platform.architecture()
('64bit', 'ELF')
>>> os.uname().machine
'x86_64'

I don't think that we should add information retrieved at runtime in sys. Usually, sys is more information computed during the build.

zooba commented 4 weeks ago

platform.architecture() doesn't distinguish between ARM64 and x64, and os.uname().machine doesn't reflect how CPython was compiled. So we need to store the target architecture at build time.

But you're correct that we don't need to retrieve it at runtime. (Though worth noting that a lot of sys is dynamic - path, modules, various hooks, etc.)

rruuaanng commented 1 week ago

Hmmm. Actually, the discussion in the PR failed because no one decided whether to merge it or not (it seems that the only supporter is @zooba). The newly added arch is used to get the current build architecture in Windows (it comes from python.props). What do you think?

link: https://github.com/python/cpython/pull/124582 https://discuss.python.org/t/regarding-whether-we-should-add-py-currentarch-or-py-archname-function/65191/17

ericsnowcurrently commented 1 week ago

My main concern was/is that this information doesn't belong on sys.implementation. That's meant for information about the Python implementation (e.g. cpython, pypy, jython, micropython), not about the build. See PEP 421.

Something like sys._architecture, sys.build.architecture, or platform.??? or on sysconfig (like @zooba suggested) would be more appropriate. (FWIW, I like the idea of sys.build.*, PEP-421-style, but its hypothetical contents might already be covered elsewhere.)

All that said, I haven't looked closely at what we're trying to actually accomplish here, so I don't have much feedback on that.

zooba commented 1 week ago

I'm often the only supporter of things involving Windows :-) Usually aggregated from a lot of discussions with a range of users, but not easily cited (many Windows users are scared off by the culture here, so they won't report stuff themselves).

In this case, we've been looking at adding support to project's build scripts to allow building on Windows ARM64, which requires the builder to know whether it should be targeted ARM64 or not. That's usually done by running in the ARM64 interpreter, and then to cross-compile you'd run in the x64 or x86 interpreter, but we don't currently have a reliable way to detect that. Historically we haven't needed it, because sys.maxsize (or sys.maxint) was sufficient to detect 64-bit vs 32-bit. But that can't tell the difference between ARM64 and AMD64.

More importantly, and relevant for this case, is that platform.architecture() can't return the difference, even if it could detect it, because it's defined as only returning "64" or "32". And PEP 421 explicitly suggests sys.implementation is the place to store information that would be returned from platform about the Python implementation. The architecture the current runtime was compiled for sure seems to be related to the implementation, and since platform would be the obvious place to return such information (as it already is, albeit insufficiently), adding the field to sys.implementation seems fine.

Having now re-read that PEP though,[^1] I think we should drop the underscore. Make it sys.implementation.architecture and if it exists then platform.architecture(sys.executable) can return it rather than querying the actual binary (if it's within the range of allowable return values, and perhaps we can later add platform.architecture_but_for_real_this_time())

[^1]: Specifically Non-required Attributes and Use Cases, the platform and Jython use cases being most relevant.

vstinner commented 1 week ago

platform.machine() is another possible place for such information. It already has Windows specific code to detect the architecture.

erlend-aasland commented 1 week ago

platform.processor() also seems like a good candidate. There's also a handful of platform.win32_*() APIs. Perhaps a similar one could be added. I agree with Eric and others that sys.implementation seems unfit for this API.

zooba commented 1 week ago

platform.machine() and platform.processor() are for different purposes. We want to know the compile-time target of the runtime, which is unrelated to the current machine type (Windows has emulation on most architectures, so the machine could be ARM64 but the runtime is x86 and there's currently no way to determine this).

rruuaanng commented 1 week ago

I agree. If I remember correctly, the win kernel is actually simulating multiple subsystems at the same time (such as win32. It's used to be compatible with historical calls). Maybe when building, it's in win32, but in fact it's in amd64. Maybe it‘s in arm64. These can only be obtained in the initial state at build time.

ZeroIntensity commented 1 week ago

OK, I think I understand Steve's point of view now. There's precedent for this in the stdlib: we're using sys.implementation in platform here: https://github.com/python/cpython/blob/10c4c95395771fb37e93811aaace42f026c16de5/Lib/platform.py#L889

I'm only comfortable with sys.implementation._architecture if it's an internal thing used only for platform, instead of a public interface--then, it really is an implementation detail, and sort of fits there.

Alternatively, we could add a public sys.architecture that also supports other systems, rather than just Windows. It doesn't seem like it would be all that difficult to just define ARCH_NAME as something else for Linux builds.

rruuaanng commented 1 week ago

I think that for sys.architecture, a new PR can be submitted after the _architecture attribute is added. This issue may be put on hold for now. In fact, if possible, the value obtained by sys.architecture should be the same as the _architecture attribute.

ZeroIntensity commented 1 week ago

If we go for sys.architecture, then there's no need for an _architecture attribute at all.

rruuaanng commented 1 week ago

As far as the discussion is concerned, it seems that only the _architecture attribute is what we need, because other platforms are not actually as complicated as windows.

zooba commented 1 week ago

My main hesitation with defining it as public sys.architecture is that now we have to figure out sensible values for all the other platforms as well... what do we put for WASI? How do we under-specify it enough that we can expand to new platforms later on and not run into platform.architecture type issues (or sys.platform type issues)?

erlend-aasland commented 1 week ago

Platforms that use our configure build system (that is, most platforms) can retrieve compile time stuff via sysconfig APIs.

vstinner commented 4 days ago

The sysconfig module sounds like a good place to add the build-time architecture on Windows.

zooba commented 4 days ago

The sysconfig module doesn't have a native component, and we don't have a compile-time generated file that ships with the runtime (yet). sysconfig retrieves compile-time stuff on Linux because we include a copy of Makefile, and it scans it.

Storing a single value somewhere in a native module is much easier on Windows than adding an entirely new static file that is generated at compile time. The question is about where we should store the value, more than where it gets retrieved from.

zware commented 3 days ago

The sysconfig module doesn't have a native component, and we don't have a compile-time generated file that ships with the runtime (yet). sysconfig retrieves compile-time stuff on Linux because we include a copy of Makefile, and it scans it.

Actually, I was just surprised to learn that it looks like a _sysconfig module has snuck in via gh-88402/GH-110049[^1].

[^1]: And to be clear, I consider that a good thing :). More of sysconfig/_sysconfigdata should probably make its way there eventually.

FFY00 commented 2 days ago

IMO, this should be implemented by adding MULTIARCH — which represents a target triple — in sys.get_config_vars() on windows, while we don't have a better API. I don't think sys.implementation is the right place, this is a build detail, not an implementation-specific detail.

zooba commented 2 days ago

I was just surprised to learn that it looks like a _sysconfig module has snuck in

Ah, so it has.

this should be implemented by adding MULTIARCH — which represents a target triple

Given that we can embed it at compile time, I'm fine with this. Do we have official target triples for Windows? Or do we just borrow them from someone else (I'm pretty sure Rust uses them, yeah?)

rruuaanng commented 15 hours ago

I actually have no problem with adding the arch member to MULTIARCH, but would that be better than adding it to the implementation? For example, in terms of declarative?

@zooba, what do you think about this?

zooba commented 14 hours ago

would that be better than adding it to the implementation? For example, in terms of declarative?

It would be added to the Modules\_sysconfig.c file (which I'd forgotten about), so it's easy enough to add. It's a bit rough on users, but at least it'll be possible to get at the information.

We need a spec for MULTIARCH though, to define exactly what the values should look like.

rruuaanng commented 14 hours ago

That is, it should look like this

>>> import _sysconfig
>>> _sysconfig.config_vars()
{'EXT_SUFFIX': '.cp314-win_amd64.pyd', 'SOABI': 'cp314-win_amd64', 'Py_GIL_DISABLED': 0, 'ARCH': 'win32'}

Edit (or maybe I didn't output the value correctly).