zopefoundation / zope.index

Indices for use with a catalog, such as text, field, etc.

Python 3.6 compatibility #8

icemac closed this issue 7 years ago

icemac commented 7 years ago

This package is not yet compatible with Python 3.6; see https://travis-ci.org/zopefoundation/zopetoolkit/jobs/173487024#1333

mgedmin commented 7 years ago

The test log consists of 11 repetitions of this error:

Traceback (most recent call last):
  File "/opt/python/3.6-dev/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/opt/python/3.6-dev/lib/python3.6/unittest/case.py", line 601, in run
    testMethod()
  File "/home/travis/build/zopefoundation/zopetoolkit/eggs/zope.index-4.2.0-py3.6-linux-x86_64.egg/zope/index/text/tests/test_htmlsplitter.py", line 36, in test_class_conforms_to_ISplitter
    verifyClass(ISplitter, self._getTargetClass())
  File "/home/travis/build/zopefoundation/zopetoolkit/eggs/zope.index-4.2.0-py3.6-linux-x86_64.egg/zope/index/text/tests/test_htmlsplitter.py", line 27, in _getTargetClass
    from zope.index.text.htmlsplitter import HTMLWordSplitter
  File "/home/travis/build/zopefoundation/zopetoolkit/eggs/zope.index-4.2.0-py3.6-linux-x86_64.egg/zope/index/text/htmlsplitter.py", line 23, in <module>
    WORDS = re.compile(r"(?L)\w+")
  File "/home/travis/virtualenv/python3.6-dev/lib/python3.6/re.py", line 233, in compile
...
    raise ValueError("cannot use LOCALE flag with a str pattern")
ValueError: cannot use LOCALE flag with a str pattern
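
The failure can be reproduced without zope.index. This is a minimal sketch assuming Python 3.6 or later, and `compile_error` is a hypothetical helper. On 3.6 the inline `(?L)` form raises ValueError as in the traceback above. Newer versions may reject it earlier with `re.error`, so the sketch catches both. Passing `re.LOCALE` as an argument to a str pattern is rejected with ValueError.

```python
import re

def compile_error(pattern, flags=0):
    """Return the exception raised by re.compile, or None if it compiles."""
    try:
        re.compile(pattern, flags)
    except (ValueError, re.error) as exc:
        # ValueError on 3.6 (as in the traceback); possibly re.error on
        # later versions for the inline (?L) form.
        return exc
    return None

# The str pattern from htmlsplitter.py is rejected.
inline_error = compile_error(r"(?L)\w+")
# The flag passed as an argument is rejected for a str pattern too.
flag_error = compile_error(r"\w+", re.LOCALE)
# The same pattern as bytes compiles fine.
bytes_error = compile_error(rb"(?L)\w+")

print(inline_error is not None, type(flag_error).__name__, bytes_error)
```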

mgedmin commented 7 years ago

The offending code lines are:

WORDS = re.compile(r"(?L)\w+")
GLOBS = re.compile(r"(?L)\w+[\w*?]*")

Now, I don't know whether this regex is intended to handle Unicode or byte strings (or whether that changes depending on the Python version). So I don't know whether the correct fix is to remove the (?L) flag, to make the pattern a bytestring, to choose between the two based on the Python version, or something else.
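
The first two options can be compared in a short sketch, assuming Python 3. `WORDS_BYTES` and `WORDS_TEXT` are hypothetical names, not code from zope.index. A bytes pattern may keep `(?L)`, but it then only accepts bytes input. A str pattern without the flag accepts text.

```python
import re

# Option: keep (?L) but make the pattern bytes (input must then be bytes).
WORDS_BYTES = re.compile(rb"(?L)\w+")
# Option: drop (?L) and keep a str pattern (input is text).
WORDS_TEXT = re.compile(r"\w+")

print(WORDS_BYTES.findall(b"hello world"))  # [b'hello', b'world']
print(WORDS_TEXT.findall("hello world"))    # ['hello', 'world']

# A bytes pattern cannot be used on a str, so callers passing text would break.
try:
    WORDS_BYTES.findall("hello world")
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

print(type(mixed_error).__name__)  # TypeError
```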

jamadden commented 7 years ago

The interface documentation uses the word "text", and "text" is also the name of the package. In zope packages, that typically means Unicode strings.

However, the test case simply passes native strings: bytes on Py2, text/unicode on Py3. The tests do change the locale and make assertions about the results. But the docs for re.LOCALE state (emphasis mine):

Make \w, \W, \b, \B, \s and \S dependent on the current locale. The use of this flag is discouraged as the locale mechanism is very unreliable, and it only handles one “culture” at a time anyway; you should use Unicode matching instead, which is the default in Python 3 for Unicode (str) patterns. This flag can be used only with bytes patterns.
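
The "Unicode matching" that the docs recommend is visible with a plain str pattern on Python 3. It involves no locale. `re.ASCII` appears only for contrast. The sample words are illustrative.

```python
import re

text = "na\u00efve caf\u00e9"  # "naïve café"

# str patterns: \w matches Unicode word characters by default.
unicode_words = re.findall(r"\w+", text)
# re.ASCII limits \w to [a-zA-Z0-9_], so the accented letters split words.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # ['naïve', 'café']
print(ascii_words)    # ['na', 've', 'caf']
```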

So I suspect that on Python 3, we don't want the (?L) flag at all.
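
That suggestion could look like the following hypothetical sketch of the two module-level patterns. It is not necessarily the fix that was merged. It keeps the original locale-dependent patterns on Python 2, where native str is bytes, and drops `(?L)` on Python 3.

```python
import re
import sys

if sys.version_info[0] >= 3:
    # Python 3: str patterns are Unicode-aware by default and reject (?L).
    WORDS = re.compile(r"\w+")
    GLOBS = re.compile(r"\w+[\w*?]*")
else:
    # Python 2: native str is bytes, so keep the original locale flag.
    WORDS = re.compile(r"(?L)\w+")
    GLOBS = re.compile(r"(?L)\w+[\w*?]*")

print(WORDS.findall("Hello, world!"))  # ['Hello', 'world']
print(GLOBS.findall("foo* ba?r baz"))  # ['foo*', 'ba?r', 'baz']
```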