qntm / greenery

Regular expression manipulation library
http://qntm.org/greenery
MIT License

Unicode regexes not supported #38

Closed (d33tah closed this issue 6 years ago)

d33tah commented 6 years ago

[21:53:07]>>> parse(u'我')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/d33tah/virtualenv-py2/local/lib/python2.7/site-packages/greenery/lego.py", line 1641, in __repr__
    string += ", ".join(repr(c) for c in self.concs)
  File "/home/d33tah/virtualenv-py2/local/lib/python2.7/site-packages/greenery/lego.py", line 1641, in <genexpr>
    string += ", ".join(repr(c) for c in self.concs)
  File "/home/d33tah/virtualenv-py2/local/lib/python2.7/site-packages/greenery/lego.py", line 1404, in __repr__
    string += ", ".join(repr(m) for m in self.mults)
  File "/home/d33tah/virtualenv-py2/local/lib/python2.7/site-packages/greenery/lego.py", line 1404, in <genexpr>
    string += ", ".join(repr(m) for m in self.mults)
  File "/home/d33tah/virtualenv-py2/local/lib/python2.7/site-packages/greenery/lego.py", line 1169, in __repr__
    string += repr(self.multiplicand)
  File "/home/d33tah/virtualenv-py2/local/lib/python2.7/site-packages/greenery/lego.py", line 636, in __repr__
    string += repr("".join(str(char) for char in sorted(self.chars, key=str)))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u6211' in position 0: ordinal not in range(128)

qntm commented 6 years ago

I've seen (very frustrating) issues like that before, and in my experience they are generally caused by the console's encoding. The following Python program, saved as uni.py,

from greenery.lego import parse
print(parse(u'我'))

works fine for me when run at the command line with python uni.py.
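
One way to test the console-encoding hypothesis is to ask Python which encoding it will use for standard output and whether that encoding can represent the character at all. The following is a minimal Python 3 sketch, not part of greenery; `can_encode` is a hypothetical helper introduced only for illustration.

```python
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The encoding Python reports for standard output. Fall back to "ascii"
# if the stream does not expose one (an assumption made for this sketch).
console_encoding = getattr(sys.stdout, "encoding", None) or "ascii"

# "\u6211" is the character from the report; ascii() keeps the printed
# output safe even when the console cannot display the character itself.
char = "\u6211"
print("console encoding:", console_encoding)
print("can encode %s: %s" % (ascii(char), can_encode(char, console_encoding)))
```

If the second line reports False, printing the character to that console would be expected to fail with a UnicodeEncodeError similar to the one in the traceback above.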

qntm commented 6 years ago

Uh, also, Python 2.7 is not supported. I am sorry that a more sensible error message isn't raised when greenery is used with Python 2.7. I will look into adding that.
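
A more sensible error could come from an explicit interpreter version check at import time. The sketch below shows the general idea only; `require_python3` is a hypothetical name, and this is not necessarily how greenery implements or will implement such a check.

```python
import sys


def require_python3(version_info=None):
    """Raise a clear error on Python older than 3 (hypothetical helper, not greenery's API)."""
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] < 3:
        raise RuntimeError(
            "Python 3 is required; found Python %d.%d"
            % (version_info[0], version_info[1])
        )


# Calling this at the top of a module would fail early with a readable
# message instead of a UnicodeEncodeError deep inside __repr__.
require_python3()
```

The optional `version_info` parameter exists only so the check can be exercised with a fake version tuple such as `(2, 7)`.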