python / cpython

The Python programming language
https://www.python.org
Other
63.08k stars 30.21k forks source link

`sys.flags.gil` should be a "sequence" attribute #122575

Open colesbury opened 2 months ago

colesbury commented 2 months ago

Bug report

sys.flags is a PyStructSequence. PyStructSequence is similar to a named tuple, but it can have attributes that are not part of the sequence.

Currently, sys.flags.gil is not a "sequence" attribute:

>>> import sys
>>> sys.flags
sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0, dev_mode=False, utf8_mode=0, warn_default_encoding=0, safe_path=False, int_max_str_digits=4300)

I think this was an oversight. We forgot to update the n_in_sequence field from 18 to 19 in gh-116338:

https://github.com/python/cpython/blob/fda6bd842a2b93a501526f1b830eb900d935ac73/Python/sysmodule.c#L3123-L3128

Linked PRs

encukou commented 2 months ago

Generally, changing n_in_sequence is a backwards-incompatible change: code that unpacks the tuple will break. With 18 elements, and after int_max_str_digits was added in a bugfix release, I guess no one unpacks sys.flags specifically. Still, it might be better to change PyStructSequence.__repr__ than to allow sys.flags[18]?

colesbury commented 2 months ago

Generally, changing n_in_sequence is a backwards-incompatible change...

That seems like an entirely hypothetical concern. As you wrote, I don't think anyone unpacks the 18 or 19 elements of sys.flags. The longstanding practice is to include added elements. The last time this we did this was when we added int_max_str_digits in 3.12, which not only was included as a positional element, but backported all the way back to 3.7:

This is the only PyStructSequence that has a non-sequence element, and the omission was an oversight, not intentional.

Still, it might be better to change PyStructSequence.__repr__

I don't think so. These are publicly documented as "named tuples" and named tuples do not support this feature of attributes that are only accessible by name.

encukou commented 2 months ago

Here's what I think. Perhaps this needs wider discussion.

This is the only PyStructSequence that has a non-sequence element

That's not the case:

>>> import os
>>> os.stat('.').n_fields
19
>>> os.stat('.').n_sequence_fields
10

The last time this we did this was when we added int_max_str_digits in 3.12, which not only was included as a positional element, but backported all the way back to 3.7

Sadly, yes. The security fix didn't go through public review. As far as I can tell, the next-to-last time we did this was in 2019. In my view, both of those were oversights.

I don't think so. These are publicly documented as "named tuples" and named tuples do not support this feature of attributes that are only accessible by name.

Note that the whole concept name-only attributes was added to PyStructSequence specifically in order to not break unpacking, back when PyStructSequence was added to replace real tuples: https://github.com/python/cpython/issues/35189#issuecomment-1093963218 (If it wasn't for backwards compatibility with older tuple API, structseqs would probably just be “structs”; not iterable at all.)

Sadly, the user-facing documentation for creating these types doesn't mention how they should be used in stdlib.

colesbury commented 2 months ago

That's not the case ...

Thanks, I misinterpreted what I saw in gh-122577 and missed that.

As far as I can tell, the next-to-last time we did this was https://github.com/python/cpython/pull/13488

We've added flags over a dozen times before that. Every time we've updated n_in_sequence or fixed it up in a subsequent commit (e.g., 90e8f8cd9b18fd18581a1ee01664f50852fe68f8)

If you want to implement this differently, that's fine with me, but I think it's a waste of time.

vstinner commented 1 month ago

Generally, changing n_in_sequence is a backwards-incompatible change: code that unpacks the tuple will break.

Over the last years, many flags were added to sys.flags and I don't recall a single bug report.

I'm convinced that nobody is unpacking sys.flags. It sounds boring to write flag1, flag2, flag3, ..., flag18 = sys.flags, while you can just write value = sys.flags[12].

encukou commented 3 days ago

Unfortunately I haven't had time to dig deeper. Don't take my concern too seriously.

vstinner commented 3 days ago

Python 3.13.0 is now shipped without sys.flags[18]. IMO it's now too late to change it.

$ ./python
Python 3.13.0+ (heads/3.13:4eab6e8d296, Oct  8 2024, 13:44:17) [GCC 14.2.1 20240912 (Red Hat 14.2.1-3)] on linux
>>> import sys
>>> sys.flags
sys.flags(..., int_max_str_digits=4300)
>>> len(sys.flags)
18
>>> sys.flags[17]  # int_max_str_digits
4300

>>> sys.flags[18]
IndexError: tuple index out of range
>>> sys.flags.gil
1
vstinner commented 3 days ago

IMO we should deprecate the tuple API for sys.flags. It sounds really bad to rely on an exact flag "index" to get a flag, instead of using its name.

Good example:

>>> sys.flags.safe_path
False

Bad example:

>>> sys.flags[16]
False