python / cpython

The Python programming language
https://www.python.org

Introduce platform.vm_info (or similar) as a replacement for recently deprecated platform.java_ver. #116504

Open Stewori opened 8 months ago

Stewori commented 8 months ago

Feature or enhancement

Proposal:

There exist alternative Python implementations that run on a virtual machine (vm) or a comparable middleware. The platform module currently lacks an implementation-independent API to retrieve (version-)information of an underlying vm. Examples are IronPython, Jython (3 possibly one day), RustPython. The proposal is to add a replacement for the recently deprecated function platform.java_ver under a generic name platform.vm_info that can optionally be implemented by a Python implementation. The return value of such a function would be a tuple inspired by what used to be returned by platform.java_ver.

The doc of that function states:

""" Version interface for Jython. 

 Returns a tuple (release, vendor, vminfo, osinfo) with vminfo being 
 a tuple (vm_name, vm_release, vm_vendor) and osinfo being a 
 tuple (os_name, os_version, os_arch). 

 Values which cannot be determined are set to the defaults 
 given as parameters (which all default to ''). 

"""

IMO only the vm_info part should be returned by the proposed function platform.vm_info, hence the name. OS info should be obtainable from the os module, and release should be obtainable in a way similar to CPython's release. As for vendor, I honestly do not understand how it is supposed to differ from vm_vendor. As a consequence, I suggest the following definition:

def vm_info(vm_name='', vm_release='', vm_vendor=''):
    """ Version interface for Python implementations on virtual machines.

        Returns a tuple (vm_name, vm_release, vm_vendor).

        Values which cannot be determined are set to the defaults
        given as parameters (which all default to '').

    """

Evidently, this is simply the old java_ver refactored to the relevant subset. This definition is merely intended as an entry point for discussion. E.g. I would be fine with a different naming etc. if as a result more use cases can be covered. E.g. I am not sure whether for RustPython the notion of a vm would be accurate, so a broader name may be suggested. Also the parameters vm_name, vm_release and vm_vendor are placed here for discussion. For Java this makes sense because there exist e.g. Java implementations by Oracle and IBM (and many more in fact), which is relevant to know besides the release version number. I am rather confident about the idea that a tuple should be returned and that a plain version number would be too narrow. Perhaps even more fields should be defined, e.g. the build-type of the vm.

As many maintainers of alternative Python implementations as possible should be notified to take a look at this proposal to make sure it covers as many use cases as possible.

Note:

Most enhancements and bug fixes don’t need a PEP and can be submitted directly to the Python issue tracker. (from: PEP 1)

Given that with java_ver a special case of this proposal has already been part of the Python standard library for well over a decade, a PEP would be overkill for this proposal. Even if not, the discussion in this issue would be a necessary prerequisite for a PEP.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

https://github.com/python/cpython/issues/116349

Stewori commented 8 months ago

@sobolevn @malemburg according to your posts in #116349 I hope this proposal addresses your API concerns. Please feel free to ping relevant devs for this discussion.

sobolevn commented 8 months ago

I would suggest using some other name, not vm, because RustPython doesn't have a vm, it uses rustc compiler 🤔

tooling_version()?

Stewori commented 8 months ago

@slozier and @BCSharp , what is your perspective on this for IronPython (please feel free to ping further IronPython devs)?

Stewori commented 8 months ago

@sobolevn sure, the name "vm" is placed for discussion. Let's collect a couple of ideas.

Stewori commented 8 months ago

@cfbolz @mattip @arigo pinging you just in case there is some point to make from PyPy perspective (not running on a vm, I know).

Stewori commented 8 months ago

@jeff5 @fwierzbicki @jimbaker pinging you to make you aware.

sobolevn commented 8 months ago

@youknowone and @coolreader18 for RustPython

JelleZijlstra commented 8 months ago

I'd caution against using a tuple as the return value, because it will be difficult to change in the future in a backwards-compatible way. Instead I would suggest a simple object with a documented set of attributes (e.g. using types.SimpleNamespace).

Stewori commented 8 months ago

What about a named tuple? What is the advantage of SimpleNamespace over named tuple?

@sobolevn Regarding RustPython, it occurred to me that there is platform.python_compiler(). On Linux I get "GCC [version]" as output from CPython. Perhaps that is better suited to expose a rust compiler version and this proposal would really target vm/middleware info. Just thinking...

JelleZijlstra commented 8 months ago

Namedtuples are still tuples and therefore make it difficult to add new values. You can hack around it by adding a new property that isn't part of the tuple, but that's hacky. For a use case like this the API that tuple provides is mostly not very useful.
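To illustrate the compatibility concern, here is a small sketch. The type names, the extra `build_type` field and all values are made up for illustration and are not part of any proposal. Callers that unpack a namedtuple positionally break once a field is added, whereas attribute access on a plain namespace keeps working.

```python
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical first version of the return type with three fields.
VMInfoV1 = namedtuple("VMInfoV1", "name release vendor")
# Hypothetical later version that adds a fourth field.
VMInfoV2 = namedtuple("VMInfoV2", "name release vendor build_type")


def unpack(info):
    # Typical caller code written against the three-field version.
    name, release, vendor = info
    return name


old = VMInfoV1("ExampleVM", "1.0", "Example Corp")
new = VMInfoV2("ExampleVM", "1.0", "Example Corp", "release")

unpack(old)  # fine with three fields

try:
    unpack(new)
    unpack_broke = False
except ValueError:
    # "too many values to unpack": the added field breaks existing callers.
    unpack_broke = True

# Attribute access on a namespace is unaffected by additional attributes.
ns = SimpleNamespace(name="ExampleVM", release="1.0",
                     vendor="Example Corp", build_type="release")
```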

sobolevn commented 8 months ago

Related discussion about deprecating 'Java' as platform.system(): https://discuss.python.org/t/lets-deprecate-platform-system-java/48026

Stewori commented 8 months ago

Looking at https://docs.python.org/3/library/types.html#types.SimpleNamespace it appears that elements would be mutable, which should be avoided. The property should have strict read-only character, which may be a reason for using a tuple in the first place. I know, there are techniques to make class attributes read-only but that would complicate the minimalistic intention. Probably the best way is to specify that the function creates a new object every time. (It could actively set the values of a returned singleton to its supposed values. But then, if a user stores the returned object and modifies its attributes they might magically change back later, which can be a nasty side effect.) Or is there an elegant way to have SimpleNamespace with immutable attributes? Another advantage of tuple is that print(platform.vm_info()) would directly produce something useful. SimpleNamespace appears to provide only a __repr__ implementation, so a proper __str__ would need to be added as well (AFAIK the default __str__ would not provide attribute values). That's certainly doable, but again makes it a little more complicated.
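One possible answer to the immutability question, as a sketch only: a frozen dataclass provides named, read-only attributes and a generated `__repr__` without custom code. The class name `VMInfo`, its fields and the values are placeholders, not a proposed API.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class VMInfo:
    # Placeholder fields mirroring the tuple from the proposal.
    name: str = ""
    release: str = ""
    vendor: str = ""


info = VMInfo(name="ExampleVM", release="1.0")

try:
    info.release = "2.0"
    assignment_rejected = False
except FrozenInstanceError:
    # Frozen dataclasses raise on attribute assignment.
    assignment_rejected = True

# No __str__ is defined, so str() and print() fall back to the generated __repr__.
text = str(info)
```

Adding a field later does not affect callers that read attributes by name, which is the backwards-compatibility argument made above for an attribute-based object.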

youknowone commented 8 months ago

Can platform refer to another implementation-dependent module, written either in C or in Python, named _platform (or _python, _variant or any _something_good_name), and put all CPython-specific literals in it? Then every other implementation only has to write its own _platform. Here are the places where RustPython hard-codes its name:

Making interface for them will be easier once we gather those variants into a single spot.

Stewori commented 8 months ago

That would demand a bigger change in CPython than was intended with this issue (it would probably be hard to convince them of such a change). That said, providing a custom version of platform is what Jython is doing as well. You may be right that the path to a proper implementation-independent interface would be to collect and consider all variants, but that is well beyond the ambition of this feature request.

sobolevn commented 8 months ago

Btw, I added several skips on our side for test_cmd_line in https://github.com/python/cpython/pull/116859

slozier commented 8 months ago

From the IronPython perspective, the .NET runtime doesn't really provide a great way to get the information required to populate this, so I'm not sure if we'd be able to make use of it. We are also trying hard to keep changes to the standard library to a minimum, so pulling the info from an implementation-dependent module would be preferable to modifying platform.py.

Stewori commented 8 months ago

pulling the info from an implementation-dependent module would be preferable to modifying platform.py.

This is just about interfacing. Adding a hook that alternative Python implementations can implement if it makes sense. Only a minimal, almost empty (that is, returning a dummy value) function header would be added, so users can access the info within official Python API. The main purpose would be to define the API and probably to host the API doc and spec. I never worked with IronPython, but I suppose it should not be too difficult to insert a workable custom implementation for that API then, or to monkeypatch the platform module accordingly.

the .NET runtime doesn't really provide a great way to get the information required to populate this

It surprises me that the .NET runtime would not expose its version number and some info. Is it not possible for a .NET application to know (without dirty tricks, perhaps) whether it is running on Mono or Microsoft .NET? What about the RuntimeInformation.FrameworkDescription Property? On Stack Overflow they say it would also report e.g. "Mono [Version]". Is access to that API not feasible in IronPython?

slozier commented 8 months ago

Only a minimal, almost empty (that is, returning a dummy value) function header would be added, so users can access the info within official Python API.

Right, but implementation does matter (I'm not proposing either of these, they're just serving as examples). If this simply adds

def vm_info(vm_name='', vm_release='', vm_vendor=''):
    return vm_name, vm_release, vm_vendor

to platform.py then we have to modify platform.py and ship our own (yes I know we already do, but if we didn't have to we wouldn't). Whereas if it were implemented as:

def vm_info(vm_name='', vm_release='', vm_vendor=''):
    try:
        from _platform import vm_info
        return vm_info(vm_name=vm_name, vm_release=vm_release, vm_vendor=vm_vendor)
    except ImportError:
        return vm_name, vm_release, vm_vendor

then we could implement it without touching the Python part of standard library.
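A sketch of how the delegating variant would behave, simulating an implementation-provided module. The `_platform` module is hypothetical (it is registered in `sys.modules` purely for this demonstration), and the returned values are made up. The inner import is aliased to `impl` only for readability.

```python
import sys
import types


def vm_info(vm_name='', vm_release='', vm_vendor=''):
    # Delegating variant: use the implementation's hook if it exists.
    try:
        from _platform import vm_info as impl
        return impl(vm_name=vm_name, vm_release=vm_release, vm_vendor=vm_vendor)
    except ImportError:
        return vm_name, vm_release, vm_vendor


# Without an implementation-provided module, the defaults are returned.
fallback = vm_info(vm_name='unknown')

# Simulate an alternative implementation shipping its own _platform module.
fake = types.ModuleType('_platform')
fake.vm_info = lambda vm_name='', vm_release='', vm_vendor='': (
    'ExampleVM', '1.0', vm_vendor)
sys.modules['_platform'] = fake

delegated = vm_info(vm_vendor='n/a')

# Clean up the simulated module again.
del sys.modules['_platform']
```

In this sketch the default arguments act as fallback values for fields the hook cannot determine, which is how the `java_ver` docstring quoted at the top of this issue describes them.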

Side note, I'm not familiar with how this is used with Java, but what purpose do the function arguments serve? Why would you pass in your own values?

What about the RuntimeInformation.FrameworkDescription Property? Is access to that API not feasible in IronPython?

I am aware of that API and we expose it via clr.FrameworkDescription. However, it is meant to be a diagnostic string and does not provide any guarantees as to its form so I'm not particularly interested in trying to parse it to split the runtime and version information.

Anyway, don't let IronPython hold you back, we're far enough behind that by the time we get to whatever version implements this .NET might have proper APIs. 😄

sobolevn commented 7 months ago

What do you think of this API:

def implementation_info():
    try:
        import _implementation_platform
    except ImportError:
        return None
    else:
        return _implementation_platform.implementation_info()

Design:

jeff5 commented 7 months ago

@Stewori : Python maintains support for the possibility of multiple implementations through sys.implementation. IMO, once one knows the implementation, one may find details specific to the implementation by an implementation-specific path. E.g knowing it is Jython one can use System.getProperty. The set of properties is large and subtle.

>>> System.getProperty('java.version')
u'1.8.0_321'
>>> System.getProperty('java.specification.version')
u'1.8'
>>> System.getProperty('java.vm.specification.version')
u'1.8'
>>> System.getProperty('java.vm.version')
u'25.321-b07'

I think I am most likely to want the java.specification.version, so that I can know what libraries to expect, and whether Jigsaw is in play, but what use case have you in mind?

Perhaps GraalVM cannot do exactly this, but that implementation could have its own access to the properties that applications need to know.

Edit: I think I am basically suggesting that there may not be a sufficiently uniform idea of the VM information, for the standard library to offer a uniform API to it.

Stewori commented 7 months ago

@jeff5 By that logic, it would be enough to expose os.name and once one knows the os one can use os-specific modules and measures to identify the info the platform module provides. No need for a platform module in the first place. I think the spirit of the platform module is to standardize platform info across different platforms and implementations and some moderate info about a possibly underlying middleware would be a justified part of that. In other words, the platform module was made for that kind of info, so for consistency everyone should actually put it there. This proposal is just one humble step towards a better standardization across Python implementations.

there may not be a sufficiently uniform idea of the VM information

I thought that a name and version would not be asked too much and that it would be the minimal kind of information every framework would provide. (I added vendor mainly because it was in java_ver, it's probably not so important). Since this is apparently not even feasible on .net, what about a single info String? The semantics of the info would be recommended as "[name] [version]" but that would not be a strict rule. IronPython would provide whatever the content of the FrameworkDescription property is, for Jython we would concat system properties we think are suitable and other implementations may follow this pattern to their liking. The doc in the dummy/interface implementation in CPython may feature an explicitly incomplete list of known example values. Returning just a simple string would also eliminate the discussion and complexity of what container to return (tuple vs object with read-only properties). Overall I intended this feature to be simple, so it would not introduce a maintenance burden. (I know, I initially argued against a string value, but that was before the discussion and feedback.)

@slozier If IronPython already exposes the info as clr.FrameworkDescription, like you say, would it be so hard to support this feature by exposing it in the platform module? Given that you already ship a custom module (like Jython does). Could also be done as a monkey patch during startup, whatever works best.

@sobolevn, @youknowone What info do you think would RustPython expose here that would not fit better into some other already existing platform property? E.g. into platform.python_compiler()? The semantics of that property is to name the compiler that was used to build the currently running Python interpreter, e.g. on linux one gets "GCC [version]". The value "Rust [version]" would fit into that semantics for RustPython, so it appears to me. If there is a good reason not to place it there I am open to renaming this property, e.g. to framework_info (I somehow find tooling_info not a sufficient fit to refer to middleware).

Stewori commented 7 months ago

Placing this draft for discussion:

def framework_info():
    '''Returns a string describing a virtual machine, middleware or similar
    kind of framework the current Python implementation is running on.

    Since CPython is not running on any such framework, the reference
    implementation just returns `None`. Alternative implementations may
    expose a framework description via this method in a standardized API.

    The recommended format of the string is "[name] [version]", wherein the
    version part should not contain spaces. Other formats are not forbidden,
    just discouraged. A good reason to diverge from the standard format is if
    the middleware provides the required information in a cumbersome way
    and overly complicated parsing would be required to adjust the format.
    E.g. a browser-based Python implementation might most favorably provide
    the original user agent string.

    :returns:
        Description, likely name and version, of a virtual machine, middleware
        or similar kind of framework the current Python implementation is
        running on, `None` for CPython.
    :rtype: Optional[str]

    Anticipated implementations:

    :IronPython:
        content of the property
        `RuntimeInformation.FrameworkDescription`
    :Jython, GraalPython:
        `System.getProperty('java.vm.name') + " " +
        System.getProperty('java.version')`
    :Brython: user agent string `navigator.userAgent`

    Anticipated example outputs:

    :Java 21 on OpenJDK: `OpenJDK 64-Bit Server VM 21`
    :Java 8 on OpenJDK: `OpenJDK 64-Bit Server VM 1.8.0_292`
    :Java 8 on J9: `IBM J9 VM 1.8`
    :.NET 3: `.NET Core 3.1.32`
    :.NET 7: `.NET 7.0.12`
    '''
    try:
        import _platform
    except ImportError:
        return None
    else:
        return _platform.framework_info()

jeff5 commented 7 months ago

By that logic, it would be enough to expose os.name and once one knows the os one can use os-specific modules and measures to identify the info the platform module provides. No need for a platform module in the first place.

I don't think this analogy holds because Python does not expose to us analogous things to those we are discussing, e.g. version when os.name indicates Windows, since uname is not guaranteed provided, according to the docs.

My instinct is for a tuple (pair), but show it used in a plausible application and it will be clearer why these items are the correct choice and the form.

Stewori commented 7 months ago

What I'm saying is that the platform module provides easy and standardized access to platform and machine information, even things like platform.processor().

Python does not expose to us analogous things to those we are discussing, e.g. version when os.name indicates Windows, since uname is not guaranteed provided, according to the doc

I'm sure there are existing system library calls (e.g. via ctypes) via which one could get that information in platform-specific ways (the platform module itself somehow gets the info). Of course that would be much more complicated and require system knowledge. So why should users go through that hassle to get info about an underlying vm? Is that info less relevant? Then, what relevance does the processor string have? I suppose the use case is diagnostics but who knows? Apparently there was a use case to introduce java_ver and framework_info would just provide similar info (less in fact).

uname is not guaranteed provided

I think the same applies for various properties of the platform module.

Do you mean by "tuple" (name, version)? That already seems to require parsing (and maybe even guesswork) for middleware that does not expose as many system properties as Java does.

Reading the platform doc, I notice that an empty string seems to be the preferred result if info is not available. So the above draft should probably return the empty string instead of None.

youknowone commented 7 months ago

What info do you think would RustPython expose here that would not fit better into some other already existing platform property? E.g. into platform.python_compiler()?

This seems a better fit for the compiler case. Looking at the thread, implementations running on a VM need more information than those compiled to native code.

jeff5 commented 7 months ago

Apparently there was a use case to introduce java_ver

I found a few uses of java_ver (on one of those scraping sites), but those were mostly (all?) scrapes of files now missing.

Do you mean by "tuple" (name, version)? That already seems to require parsing (and maybe even guesswork) for middleware that does not expose as many system properties as Java does.

That's right. Making it requires a split operation for IronPython. Making the string requires a join from a Java implementation. If the user always has to pick the string apart then they must understand to split on the rightmost space. My conjecture that a tuple would be more convenient stands or falls by how the result is to be used.
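For reference, the split on the rightmost space can be done with `str.rpartition`. The following is a sketch using example strings from the draft docstring earlier in this thread; the helper name `split_framework_info` is made up.

```python
def split_framework_info(info):
    """Split a "[name] [version]" string on its rightmost space."""
    name, sep, version = info.rpartition(" ")
    if not sep:
        # No space at all: treat the whole string as the name.
        return info, ""
    return name, version


java = split_framework_info("OpenJDK 64-Bit Server VM 21")
dotnet = split_framework_info(".NET Core 3.1.32")
bare = split_framework_info("ExampleVM")
```

This only works for strings that follow the recommended "[name] [version]" convention; a free-form string such as a browser user agent would not split meaningfully this way.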

Stewori commented 7 months ago

A tuple has been criticized earlier in this thread; a simple object with read-only attributes seems to be preferred for better backwards compatibility if more fields should be introduced. However, that seems overly complicated to me, since the intention was simplicity.

they must understand to split on the rightmost space

I thought this field might also be populated by Python running in the browser (Brython, e.g.). I looked up the browser/JavaScript/HTML5 API and it seems that the relevant info is only exposed as the user agent string. That string is a different kind of beast: it may also contain rendering engine/version, perhaps also system/version. There are some attempts to parse it but they are complicated and need adjustment every few years. So I thought it might be best to expose an info string unaltered if a middleware exposes only a single string. That seems to be a typical case (.NET, browsers). Then users can make of it what they want. For cases like Java where there are plenty of info-properties, the convention "name version" is proposed. That combination seems to me most flexible for unknown further cases of middleware, so the API would not require adjustment in the future. Also, the idea of providing a simple string fits well with most functions in the platform module (IIRC), perhaps with the exception of uname.