python / cpython

The Python programming language
https://www.python.org

xid_start definition for Unicode identifiers refers to xid_continue #74314

Open 54bf8cd3-fa5f-46be-85f2-1de8e23f9bb5 opened 7 years ago

54bf8cd3-fa5f-46be-85f2-1de8e23f9bb5 commented 7 years ago
BPO 30128
Nosy @loewis, @zhangyangyu

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['docs']
title = 'xid_start definition for Unicode identifiers refers to xid_continue'
updated_at =
user = 'https://bugs.python.org/ralphcorderoy'
```

bugs.python.org fields:

```python
activity =
actor = 'xiang.zhang'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation']
creation =
creator = 'ralph.corderoy'
dependencies = []
files = []
hgrepos = []
issue_num = 30128
keywords = []
message_count = 2.0
messages = ['292049', '292592']
nosy_count = 4.0
nosy_names = ['loewis', 'ralph.corderoy', 'docs@python', 'xiang.zhang']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue30128'
versions = ['Python 3.6']
```

54bf8cd3-fa5f-46be-85f2-1de8e23f9bb5 commented 7 years ago

https://docs.python.org/3/reference/lexical_analysis.html#identifiers has a grammar.

identifier   ::=  xid_start xid_continue*
id_start     ::=  <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
id_continue  ::=  <all characters in id_start, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
xid_start    ::=  <all characters in id_start whose NFKC normalization is in "id_start xid_continue*">
xid_continue ::=  <all characters in id_continue whose NFKC normalization is in "id_continue*">

I struggle to make sense of it unless I remove `xid_continue*` from `xid_start`'s definition. I suspect it ended up there through a cut-and-paste error.

zhangyangyu commented 7 years ago

Quoting from PEP 3131:

> XID_Start then closes this set under normalization, by removing all characters whose NFKC normalization is not of the form ID_Start ID_Continue* anymore.
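
To make the quoted rule concrete, here is a rough sketch of the grammar's `xid_start` and `xid_continue` checks. The NFKC normalization of a single character can be several characters long (for example, the ligature U+FB01 normalizes to the two letters "fi"), which is why the rule needs a "start character followed by continue characters" shape. The helper names below are made up for illustration, and the sketch is a simplification: it uses only the general categories from the grammar and ignores the Other_ID_Start and Other_ID_Continue properties, which `unicodedata` does not expose. It is not how CPython implements the check.

```python
import unicodedata

# General categories named in the documented grammar (simplified:
# Other_ID_Start / Other_ID_Continue are deliberately left out).
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    """Approximate id_start: letter-like categories plus the underscore."""
    return ch == "_" or unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_continue(ch):
    """Approximate id_continue: id_start plus marks, digits, connectors."""
    return is_id_start(ch) or unicodedata.category(ch) in ID_CONTINUE_EXTRA_CATEGORIES


def is_xid_continue(ch):
    """id_continue characters whose NFKC form matches id_continue*."""
    norm = unicodedata.normalize("NFKC", ch)
    return is_id_continue(ch) and all(is_id_continue(c) for c in norm)


def is_xid_start(ch):
    """id_start characters whose NFKC form matches id_start xid_continue*."""
    if not is_id_start(ch):
        return False
    norm = unicodedata.normalize("NFKC", ch)
    return is_id_start(norm[0]) and all(is_xid_continue(c) for c in norm[1:])


# Sample characters: a ligature, THAI CHARACTER SARA AM, GREEK YPOGEGRAMMENI.
for ch in ["\ufb01", "\u0e33", "\u037a"]:
    norm = unicodedata.normalize("NFKC", ch)
    print(
        f"U+{ord(ch):04X}",
        [f"U+{ord(c):04X}" for c in norm],
        "xid_start:", is_xid_start(ch),
        "xid_continue:", is_xid_continue(ch),
        "str.isidentifier():", ch.isidentifier(),
    )
```

U+0E33 is the interesting case: it is in `id_start`, but its NFKC form begins with a combining mark, so it can continue an identifier without being allowed to start one. Comparing against `str.isidentifier()` gives a quick sanity check for the sketch on these samples.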