python / cpython

The Python programming language
https://www.python.org

email.utils.getaddresses improper parsing of unicode realnames #86953

Open 4c8f15b4-bd1a-40e0-8a87-5743fc1f877b opened 3 years ago

4c8f15b4-bd1a-40e0-8a87-5743fc1f877b commented 3 years ago
BPO 42787
Nosy @warsaw, @bitdancer, @rrhodes

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature', 'expert-email', '3.9']
title = 'email.utils.getaddresses improper parsing of unicode realnames'
updated_at =
user = 'https://bugs.python.org/konstantin2'
```

bugs.python.org fields:
```python
activity =
actor = 'trrhodes'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['email']
creation =
creator = 'konstantin2'
dependencies = []
files = []
hgrepos = []
issue_num = 42787
keywords = []
message_count = 2.0
messages = ['384069', '384071']
nosy_count = 4.0
nosy_names = ['barry', 'r.david.murray', 'trrhodes', 'konstantin2']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue42787'
versions = ['Python 3.9']
```

4c8f15b4-bd1a-40e0-8a87-5743fc1f877b commented 3 years ago

What it currently does:

>>> import email.utils
>>> email.utils.getaddresses(['Shuming [范書銘] <shumingf@realtek.com>'])
[('', 'Shuming'), ('', ''), ('', '范書銘'), ('', ''), ('', 'shumingf@realtek.com')]

What it should do:

>>> import email.utils
>>> email.utils.getaddresses(['Shuming [范書銘] <shumingf@realtek.com>'])
[('Shuming [范書銘]', 'shumingf@realtek.com')]

46a103db-7c4a-487c-8aa3-9d788608ff64 commented 3 years ago

Hi Konstantin,

Thanks for raising this issue. It appears the field in your example does not conform to RFC 2822, which this email library follows. Section 3.2.1 of that RFC treats square brackets as special characters, and the parsing that enforces this lives in the _parseaddr module.

The above, combined with the fact that any failed parse returns a two-tuple of ('', ''), I believe explains the behavior observed.
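
For anyone who controls how the field is built, one possible workaround is to put the display name in a quoted-string, since special characters such as square brackets are allowed inside double quotes. This is a minimal sketch, not a fix for the reported behavior. `build_address` is a hypothetical helper name, and the example assumes the field is handled as a plain Python string (a real header carrying non-ASCII text would normally also need RFC 2047 encoding).

```python
import email.utils


def build_address(realname, addr):
    """Hypothetical helper: wrap the display name in a quoted-string.

    email.utils.quote() escapes backslashes and double quotes so the
    name cannot terminate the quoted-string early.
    """
    return '"{}" <{}>'.format(email.utils.quote(realname), addr)


field = build_address('Shuming [范書銘]', 'shumingf@realtek.com')
parsed = email.utils.getaddresses([field])
print(parsed)
# [('Shuming [范書銘]', 'shumingf@realtek.com')]
```

With the name quoted, the brackets are read as part of the display name rather than as delimiters, so the parser returns a single (realname, address) pair.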