mauritsvanrees closed this issue 5 years ago
@mauritsvanrees Tests are still failing: https://jenkins.plone.org/job/pull-request-5.2-3.7/230/testReport/junit/plone.i18n.normalizer/ja/Normalizer/?auto_refresh=false
This is good. Now we finally see what the wrong normalized word is in Japanese. It is ‘2xm5gfy’ which is 7 characters and we expect 8. This does not tell me anything. At this point we would need someone who knows Japanese to help debug this. Do you know Japanese? Maybe @terapyon can help?
I will check it soon.
The problem is here: https://github.com/plone/plone.i18n/blob/0995a66d4025357f2ab8e035141687fac7061ab7/plone/i18n/normalizer/ja.py#L13-L20
The hash changes at each test run, giving random results. In some cases this will lead to a shorter normalized word.
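To illustrate how a changing hash could produce a shorter word, here is a minimal sketch. It is not the code in ja.py: the names `to_base36` and `hashed_word` and the base-36 encoding are assumptions made purely for illustration. The point is only that a string built from a number without padding gets shorter when the number happens to be small.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, a-z


def to_base36(number):
    """Encode a non-negative integer in base 36, without zero padding."""
    if number == 0:
        return "0"
    chars = []
    while number:
        number, remainder = divmod(number, 36)
        chars.append(DIGITS[remainder])
    return "".join(reversed(chars))


def hashed_word(hash_value, max_length):
    """Hypothetical helper: at most max_length characters from a hash value."""
    return to_base36(abs(hash_value))[:max_length]


# A large enough value fills all 8 positions, a slightly smaller one does not.
print(len(hashed_word(36 ** 7, 8)))      # 8
print(len(hashed_word(36 ** 7 - 1, 8)))  # 7
```

If the real implementation works along similar lines, a test that expects exactly `max_length` characters will fail whenever the hash value is small enough.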
It is probably fine to make the tests less strict, accepting 5 or 6 when we currently expect 6. In some calls we explicitly pass a length, not sure if it matters when we get one more or less. The tests may be way too explicit.
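A less strict check could look roughly like the sketch below. The helper name `assert_normalized_length` and the `slack` parameter are hypothetical and not part of plone.i18n; the idea is just to accept a result that is one character shorter than the maximum while still rejecting anything longer or non-ASCII.

```python
def assert_normalized_length(normalized, max_length, slack=1):
    """Hypothetical test helper: accept max_length or up to `slack` fewer."""
    # The normalized word should contain ASCII characters only.
    assert all(ord(char) < 128 for char in normalized), normalized
    # Allow the word to be slightly shorter than max_length, never longer.
    assert max_length - slack <= len(normalized) <= max_length, (
        "%r has %d characters, expected %d to %d"
        % (normalized, len(normalized), max_length - slack, max_length)
    )


# The failing value from Jenkins has 7 characters where 8 were expected.
assert_normalized_length("2xm5gfy", 8)
```

How much slack is reasonable is a judgment call; one character matches the "5 or 6 when we currently expect 6" suggestion.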
Actually, the parameter is called max_length (https://github.com/plone/plone.i18n/blob/0995a66d4025357f2ab8e035141687fac7061ab7/plone/i18n/normalizer/ja.py#L74), which makes your suggestion fit the code. What worries me is that on Python 2 the hash is reproducible, while on Python 3 it is not:
[ale@emily ~]$ python2 -c "print(hash('a'))"
12416037344
[ale@emily ~]$ python2 -c "print(hash('a'))"
12416037344
[ale@emily ~]$ python3 -c "print(hash('a'))"
2835895958318421574
[ale@emily ~]$ python3 -c "print(hash('a'))"
7109848119625085005
unless you disable randomization:
[ale@emily ~]$ PYTHONHASHSEED=0 python3 -c "print(hash('a'))"
-7583489610679606711
[ale@emily ~]$ PYTHONHASHSEED=0 python3 -c "print(hash('a'))"
-7583489610679606711
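The same check can be scripted, which may be handy for reproducing a failure locally. This is a minimal sketch: the helper name `hash_in_fresh_interpreter` is made up here, and it simply starts a new interpreter per call because the hash seed is fixed at interpreter startup.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(value, seed=None):
    """Return hash(value) as computed by a newly started interpreter.

    With seed=None the child gets a random hash seed (the Python 3 default);
    otherwise PYTHONHASHSEED is set to the given seed.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", value],
        env=env,
    )
    return int(output.decode("ascii").strip())


# With a fixed seed, two separate runs agree.
print(hash_in_fresh_interpreter("a", seed=0) == hash_in_fresh_interpreter("a", seed=0))
# Without a seed, two runs on Python 3 will almost certainly differ.
print(hash_in_fresh_interpreter("a"), hash_in_fresh_interpreter("a"))
```

Running the failing test with a fixed PYTHONHASHSEED should make a given failure repeatable, although it does not make the normalizer itself deterministic for users.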
PR coming with the change you suggested.
I understand. This test case hits a rare pattern.
The _gethashed function should return at most MAX_LENGTH ASCII characters.
In almost all cases it returns exactly MAX_LENGTH ASCII characters.
The hash function changed in Python 3, but this implementation causes no problem for Japanese users.
I recommend changing the test string, as shown below.
Current:
text = u"テストページ"
Proposed:
text = u"公開テストページ"
@mauritsvanrees @ale-rt What do you think?
Thanks @terapyon for your comments, they are really valuable! I merged https://github.com/plone/plone.i18n/pull/27 and I will make a new PR to change the tested string and ask you to review.
Let's see if the changes from @ale-rt, thanks to @terapyon's input, fix it! A truly worldwide effort :wink:
On Jenkins, sometimes the tests pass and sometimes they fail. See for example this Plone 5.2 Python 3.6 job:
Or another example, where the above test apparently passes but the nearly identical test right below it fails, as in this Plone 5.2 Python 3.7 job:
It might be something Python 3 specific, as I don't currently see recent failures in Python 2.7 jobs. Maybe the Python 2.7 jobs were just lucky so far this year. It might also depend on which machine is running the jobs. Locally, Python 3.6 is fine for me.
Hard to debug, as I don't know Japanese, and there is no indication what the expected normalized version would look like. I have a fix ready to make the tests more verbose in case of an error.