python / cpython

The Python programming language
https://www.python.org

email: ValueError in get_section when parsing header with non-ASCII digit #87112

Open b8c17aa0-8543-473e-baae-23d368fad117 opened 3 years ago

b8c17aa0-8543-473e-baae-23d368fad117 commented 3 years ago
BPO 42946
Nosy @warsaw, @bitdancer, @maxking, @The-Compiler

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug', '3.8', '3.9', '3.10', '3.7', 'library']
title = 'email: ValueError in get_section when parsing header with non-ASCII digit'
updated_at =
user = 'https://github.com/The-Compiler'
```

bugs.python.org fields:
```python
activity =
actor = 'The Compiler'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'The Compiler'
dependencies = []
files = []
hgrepos = []
issue_num = 42946
keywords = []
message_count = 1.0
messages = ['385162']
nosy_count = 4.0
nosy_names = ['barry', 'r.david.murray', 'maxking', 'The Compiler']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue42946'
versions = ['Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10']
```

b8c17aa0-8543-473e-baae-23d368fad117 commented 3 years ago

Found mostly by accident:

>>> import email.headerregistry
>>> reg = email.headerregistry.HeaderRegistry()
>>> h = reg('Content-Disposition', 'inline; 0*²')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/email/headerregistry.py", line 608, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.10/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.10/email/headerregistry.py", line 452, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.10/email/_header_value_parser.py", line 2705, in parse_content_disposition_header
    disp_header.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.10/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.10/email/_header_value_parser.py", line 2431, in get_parameter
    token, value = get_section(value)
  File "/usr/lib/python3.10/email/_header_value_parser.py", line 2384, in get_section
    section.number = int(digits)
ValueError: invalid literal for int() with base 10: '²'

This probably happens because str.isdigit() accepts characters such as '²' that int() cannot parse:

>>> '²'.isdigit()
True
>>> int('²')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '²'
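
To make the mismatch concrete, here is a minimal sketch comparing the built-in predicates with an ASCII-only check. The helper `is_ascii_digits` is hypothetical and not part of the email package; it only illustrates one way a parser could decide whether a run of characters is safe to hand to `int()`.

```python
import string


def is_ascii_digits(text: str) -> bool:
    """Hypothetical helper: True only for a non-empty run of ASCII 0-9."""
    return bool(text) and all(ch in string.digits for ch in text)


# Compare the predicates on an ASCII number, SUPERSCRIPT TWO,
# and ARABIC-INDIC DIGIT THREE.
for sample in ("12", "²", "٣"):
    print(
        repr(sample),
        "isdigit:", sample.isdigit(),
        "isdecimal:", sample.isdecimal(),
        "ascii only:", is_ascii_digits(sample),
    )
```

`str.isdecimal()` tracks what `int()` accepts more closely than `str.isdigit()` does, but it still admits non-ASCII decimal digits such as '٣'. Assuming the section number in a MIME parameter is meant to be ASCII digits only (as I read RFC 2231), the stricter ASCII check looks like the safer choice.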