python / cpython

Python 3.9 regression: Literal dict with > 65535 items are one item shorter #85703

Closed bc75918c-a209-4fa3-b6cf-28cfb7317f76 closed 4 years ago

bc75918c-a209-4fa3-b6cf-28cfb7317f76 commented 4 years ago
BPO 41531
Nosy @ericvsmith, @markshannon, @serhiy-storchaka, @hroncok, @pablogsal, @miss-islington, @tirkarthi
PRs
  • python/cpython#21850
  • python/cpython#21853
  • python/cpython#22105
  • python/cpython#22107
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['interpreter-core', 'type-bug', '3.9', '3.10', 'release-blocker']
    title = 'Python 3.9 regression: Literal dict with > 65535 items are one item shorter'
    updated_at =
    user = 'https://github.com/hroncok'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'pablogsal'
    components = ['Interpreter Core']
    creation =
    creator = 'hroncok'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41531
    keywords = ['patch', '3.9regression']
    message_count = 9.0
    messages = ['375255', '375256', '375257', '375259', '375275', '375295', '375296', '375297', '376417']
    nosy_count = 10.0
    nosy_names = ['eric.smith', 'mrabarnett', 'zbysz', 'Mark.Shannon', 'serhiy.storchaka', 'hroncok', 'pablogsal', 'miss-islington', 'xtreak', 'batuhanosmantaskaya']
    pr_nums = ['21850', '21853', '22105', '22107']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue41531'
    versions = ['Python 3.9', 'Python 3.10']
    ```

    bc75918c-a209-4fa3-b6cf-28cfb7317f76 commented 4 years ago

    Consider this reproducer.py:

    import sys
    LEN = int(sys.argv[1])
    
    with open('big_dict.py', 'w') as f:
        print('INTS = {', file=f)
        for i in range(LEN):
            print(f'    {i}: None,', file=f)
        print('}', file=f)
    
    import big_dict
    assert len(big_dict.INTS) == LEN, len(big_dict.INTS)

    And run it with any number > 65535:

    $ python3.9 reproducer.py 65536
    Traceback (most recent call last):
      File "/tmp/reproducer.py", line 12, in <module>
        assert len(big_dict.INTS) == LEN, len(big_dict.INTS)
    AssertionError: 65535

    This does not happen on Python 3.8. It also happens with PYTHONOLDPARSER=1.

    08f81f08-11aa-4faa-851c-2c653ec329f5 commented 4 years ago

    Also reproduces with today's git.

    bc75918c-a209-4fa3-b6cf-28cfb7317f76 commented 4 years ago

    It appears that the 65535 key is missing regardless of the LEN value.
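One way to check which keys are missing, rather than only how many, is to compare the compiled literal against the expected key range. The sketch below builds the same kind of source text as reproducer.py but compiles it in memory instead of writing big_dict.py. The helper name `missing_keys` is made up for illustration. On an interpreter without the regression it should return an empty set.

```python
def missing_keys(length):
    """Compile a dict literal with `length` int keys and return the keys that got lost.

    Hypothetical helper for illustration; it mirrors reproducer.py but keeps
    everything in memory instead of writing and importing big_dict.py.
    """
    body = "".join(f"    {i}: None,\n" for i in range(length))
    source = "INTS = {\n" + body + "}\n"
    namespace = {}
    exec(compile(source, "<big_dict>", "exec"), namespace)
    return set(range(length)) - set(namespace["INTS"])


if __name__ == "__main__":
    # An empty set means the literal was compiled with all of its keys.
    print(missing_keys(65536))
```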

    08f81f08-11aa-4faa-851c-2c653ec329f5 commented 4 years ago

    Bisect says 8a4cd700a7426341c2074a2b580306d2d60ec839 is the first bad commit. Considering that 0xFFFF appears a few times in that patch, that seems plausible ;)

    39d85a87-36ea-41b2-b2bb-2be43abb500e commented 4 years ago

    I think what's happening is that in 'compiler_dict' (Python/compile.c), it's checking whether 'elements' has reached a maximum (0xFFFF). However, it's not doing this after incrementing; instead, it's checking before incrementing and resetting 'elements' to 0 when it should be resetting to 1. The 65535th element isn't counted.
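The counting problem described in that comment can be modelled in a few lines of Python. This is a simplified sketch of the description, not the C code from Python/compile.c: the names `chunk_buggy`, `chunk_fixed`, and `covered` are made up for illustration, and the merged fix may be structured differently. Each function returns the half-open index ranges that would be emitted as sub-dictionaries.

```python
LIMIT = 0xFFFF  # flush threshold mentioned in the comment above


def chunk_buggy(n, limit=LIMIT):
    """Model of the described bug: item i is dropped when a flush happens at i."""
    chunks = []
    elements = 0
    for i in range(n):
        if elements == limit:
            chunks.append((i - elements, i))  # flush the items before i
            elements = 0                      # bug: item i itself is never counted
        else:
            elements += 1
    if elements:
        chunks.append((n - elements, n))      # flush whatever is left
    return chunks


def chunk_fixed(n, limit=LIMIT):
    """Same loop, but item i starts the next chunk after a flush."""
    chunks = []
    elements = 0
    for i in range(n):
        if elements == limit:
            chunks.append((i - elements, i))
            elements = 1                      # item i is the first of the new chunk
        else:
            elements += 1
    if elements:
        chunks.append((n - elements, n))
    return chunks


def covered(chunks):
    """Set of item indices that end up in some emitted chunk."""
    return {i for begin, end in chunks for i in range(begin, end)}


if __name__ == "__main__":
    print(set(range(65536)) - covered(chunk_buggy(65536)))  # {65535}
    print(set(range(65536)) - covered(chunk_fixed(65536)))  # set()
```

With 65536 items the buggy model covers 65535 indices and skips index 65535, which matches the `AssertionError: 65535` from the reproducer and the observation that the key 65535 is the one missing.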

    markshannon commented 4 years ago

    New changeset c51db0ea40ddabaf5f771ea633b37fcf4c90a495 by Pablo Galindo in branch 'master': bpo-41531: Fix compilation of dict literals with more than 0xFFFF elements (GH-21850) https://github.com/python/cpython/commit/c51db0ea40ddabaf5f771ea633b37fcf4c90a495

    markshannon commented 4 years ago

    @hroncok,

    How did you discover this issue?

    I'd like to clean up the code for creating dictionary literals and it might be helpful to know where such huge dictionary literals exist. I'm guessing that they are used as lookup tables for things like Unicode code-point tables, and that they would only include constants.

    tirkarthi commented 4 years ago

    @hroncok said on Twitter it was reported at https://github.com/Storyyeller/enjarify/issues/17

    pablogsal commented 4 years ago

    New changeset d64d78be20ced6ac9de58e91e69eaba184e36e9b by Miss Islington (bot) in branch '3.9': bpo-41531: Fix compilation of dict literals with more than 0xFFFF elements (GH-21850) (GH-22107) https://github.com/python/cpython/commit/d64d78be20ced6ac9de58e91e69eaba184e36e9b