Closed: ghost closed this issue 5 years ago.
This is a very easy issue.
丗 (U+4E17) means 30, so "丗".isnumeric() should return True. However, "丗".isnumeric() returns False.
We use the Unicode database for these methods. Could you please check whether the database marks the character as numeric?
If it does, we may need to look at how the database is generated.
Otherwise, there isn't much we can do, since we use the Unicode database as our reference.
Thanks -- Marc-Andre Lemburg
On 21 September 2018 18:38:05 GMT+02:00, Serhiy Storchaka <report@bugs.python.org> wrote:
Change by Serhiy Storchaka <storchaka+cpython@gmail.com>:
----------
nosy: +lemburg
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue34763>
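One way to do the check Marc-Andre asks for is to query the `unicodedata` module that ships with the interpreter. The sketch below is only an illustration (the helper name `numeric_report` is hypothetical): it scans a small slice of the CJK Unified Ideographs block and lists every character that the bundled database gives a numeric value, next to what `str.isnumeric()` says. Since both are generated from the same database tables, the two columns should agree. The exact rows depend on `unicodedata.unidata_version`.

```python
import unicodedata


def numeric_report(start: int, stop: int) -> list:
    """List (code point, numeric value, str.isnumeric()) for numeric chars in [start, stop)."""
    rows = []
    for cp in range(start, stop):
        ch = chr(cp)
        # Passing a default makes numeric() return None instead of raising ValueError.
        value = unicodedata.numeric(ch, None)
        if value is not None or ch.isnumeric():
            rows.append((f"U+{cp:04X}", value, ch.isnumeric()))
    return rows


print("Unicode data version:", unicodedata.unidata_version)
# U+4E00..U+4E1F includes U+4E17; only characters the database marks numeric are listed.
for row in numeric_report(0x4E00, 0x4E20):
    print(row)
```

If U+4E17 does not show up in the output, the database bundled with that Python build does not treat it as numeric.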
Unicode 11.0.0 has 卅 (U+5345) as being numeric and having the value 30.
What's the difference between that and 丗 (U+4E17)?
I notice that they look a lot alike. Are they different variants, perhaps traditional vs. simplified?
$ ./python
Python 3.8.0a0 (heads/master-dirty:06e7608207, Sep 20 2018, 01:52:01)
>>> import unicodedata
>>> unicodedata.unidata_version
'11.0.0'
>>> unicodedata.numeric('\u5345')
30.0
>>> unicodedata.numeric('\u4E17')
ValueError: not a numeric character
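The interactive session above stops with a `ValueError` on U+4E17. Below is a sketch of the same comparison as a script that runs to the end. It assumes nothing beyond the standard `unicodedata` module; `numeric_or_none` is a hypothetical helper name. The value printed for U+4E17 may differ between Python releases that bundle different Unicode versions.

```python
import unicodedata
from typing import Optional


def numeric_or_none(ch: str) -> Optional[float]:
    """Like unicodedata.numeric(), but return None instead of raising ValueError."""
    return unicodedata.numeric(ch, None)


print("unidata_version:", unicodedata.unidata_version)
# 卅 (U+5345) and 丗 (U+4E17), the two characters compared in this thread.
for ch in ("\u5345", "\u4e17"):
    print(unicodedata.name(ch), numeric_or_none(ch), ch.isnumeric())
```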
As I said on the PR, this is because Unicode gives U+4E17 (and other CJK ideographs) a numeric value only in the Unihan database, not in the main UCD, and makeunicodedata.py only looks at the UCD for numeric values.
Tools/unicode/makeunicodedata.py does look at the Unihan database, specifically the fields kAccountingNumeric, kOtherNumeric, and kPrimaryNumeric in Unihan_NumericValues.txt:
https://github.com/python/cpython/blob/549e55a3086d04c13da9b6f33214f6399681292a/Tools/unicode/makeunicodedata.py#L1107-L1119
And as of Unicode version 12.0.0, 0x4E17 isn't listed as numeric there:
...
U+4E00 kPrimaryNumeric 1
U+4E03 kPrimaryNumeric 7
U+4E07 kPrimaryNumeric 10000
U+4E09 kPrimaryNumeric 3
...
Is there another way to get this information by using one of the fields shown at
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4E17
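To make the lookup in the excerpt above concrete, here is a minimal parser for lines in the format of Unihan_NumericValues.txt. It is a sketch, not the code in makeunicodedata.py: it assumes each data line is `U+XXXX<whitespace>field<whitespace>value` (the real file is tab-separated) and that `#` starts a comment line. The names `parse_unihan_numeric` and `SAMPLE` are hypothetical. When a code point appears under more than one field, the later line simply overwrites the earlier one, which is a simplification.

```python
from typing import Dict, Iterable

# The three Unihan fields that carry numeric values, as listed in the linked script.
NUMERIC_FIELDS = {"kAccountingNumeric", "kOtherNumeric", "kPrimaryNumeric"}


def parse_unihan_numeric(lines: Iterable[str]) -> Dict[int, str]:
    """Map code point -> numeric value string from Unihan_NumericValues.txt-style lines."""
    values: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split(None, 2)
        if field in NUMERIC_FIELDS:
            # "U+4E03" -> 0x4E03; a later field for the same code point overwrites.
            values[int(codepoint[2:], 16)] = value
    return values


# The Unicode 12.0.0 excerpt quoted above, written out tab-separated.
SAMPLE = """\
U+4E00\tkPrimaryNumeric\t1
U+4E03\tkPrimaryNumeric\t7
U+4E07\tkPrimaryNumeric\t10000
U+4E09\tkPrimaryNumeric\t3
"""

table = parse_unihan_numeric(SAMPLE.splitlines())
print(table.get(0x4E03))
print(table.get(0x4E17))  # U+4E17 has no entry in this excerpt
```

Because U+4E17 has no line in any of the three fields, a generator built this way has no numeric value to assign to it.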
"丗" means "30" in Japanese. However, it is also a variant of the Chinese character "世", which means "world" in Chinese.
I'm not sure if this information makes any difference.
unicode.org doesn't list "丗" as numeric so I think there is nothing we can do.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['type-bug', 'invalid', '3.9', 'expert-unicode']
title = 'Treat U+4E17 as a numeric value'
updated_at =
user = None
```
bugs.python.org fields:
```python
activity =
actor = 'xiang.zhang'
assignee = 'none'
closed = True
closed_date =
closer = 'xiang.zhang'
components = ['Unicode']
creation =
creator = '\xe8\x8d\x89\xe6\x9c\xa8\xe5\xbb\xba'
dependencies = []
files = []
hgrepos = []
issue_num = 34763
keywords = ['patch']
message_count = 8.0
messages = ['325992', '326010', '326011', '326034', '326055', '344144', '344391', '344440']
nosy_count = 10.0
nosy_names = ['lemburg', 'vstinner', 'benjamin.peterson', 'ezio.melotti', 'mrabarnett', 'steven.daprano', 'berker.peksag', 'xiang.zhang', 'johnlinp', '\xe8\x8d\x89\xe6\x9c\xa8\xe5\xbb\xba']
pr_nums = ['9474']
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue34763'
versions = ['Python 3.9']
```