jamadden / mrab-regex-hg

Automatically exported from code.google.com/p/mrab-regex-hg

negated unicode properties in case-insensitive mode #22

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
While testing some of the recently listed properties supported by regex,
I noticed that negated properties don't seem to work in case-insensitive
searches; cf.:

>>> regex.findall(ur"(?i)\P{InBasicLatin}",u"aáb")
[u'a', u'b']
>>> regex.findall(ur"(?i)\p{InBasicLatin}",u"aáb")
[u'a', u'b']
>>> 
>>> regex.findall(ur"\P{InBasicLatin}",u"aáb")
[u'\xe1']
>>> regex.findall(ur"\p{InBasicLatin}",u"aáb")
[u'a', u'b']
>>> 

It looks as if the negated property escape \P were somehow being treated as lowercase \p (?)
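For reference, the expected behaviour can be sketched without regex at all. This is only an illustration under stated assumptions: the helper names `in_basic_latin`, `expected_matches` and `compare_with_regex` are hypothetical, and the block range U+0000..U+007F is the Unicode definition of Basic Latin. The comparison against regex is optional and is skipped when the module is not installed.

```python
# -*- coding: utf-8 -*-
# Sketch of what \p{InBasicLatin} and \P{InBasicLatin} should match,
# with or without (?i). Helper names are hypothetical.

def in_basic_latin(ch):
    """True if ch lies in the Basic Latin block (U+0000..U+007F)."""
    return ord(ch) <= 0x7F

def expected_matches(text, negated):
    """Characters a one-character \\p / \\P{InBasicLatin} search should find."""
    return [ch for ch in text if in_basic_latin(ch) != negated]

text = u"a\xe1b"  # the string from the report, u"aáb"
positive = expected_matches(text, negated=False)
negative = expected_matches(text, negated=True)

def compare_with_regex(text):
    """Return (pattern, got, wanted) for every mismatch, or None without regex."""
    try:
        import regex  # third-party module; may not be installed
    except ImportError:
        return None
    mismatches = []
    for prefix in (u"", u"(?i)"):
        for escape, wanted in ((u"\\p", positive), (u"\\P", negative)):
            pattern = prefix + escape + u"{InBasicLatin}"
            got = regex.findall(pattern, text)
            if got != wanted:
                mismatches.append((pattern, got, wanted))
    return mismatches

if __name__ == "__main__":
    print(positive)
    print(negative)
    print(compare_with_regex(text))
```

The two expected lists are complementary, which is what the non-(?i) calls above return. On an affected version, `compare_with_regex` should report the `(?i)\P{InBasicLatin}` pattern as a mismatch.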

Some other escapes don't seem to be affected, e.g.

>>> regex.findall(ur"\s",u"a b\tcd")
[u' ', u'\t']
>>> regex.findall(ur"\S",u"a b\tcd")
[u'a', u'b', u'c', u'd']
>>> 
>>> regex.findall(ur"(?i)\s",u"a b\tcd")
[u' ', u'\t']
>>> regex.findall(ur"(?i)\S",u"a b\tcd")
[u'a', u'b', u'c', u'd']
>>> 
These work as expected.
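The property that holds for \s and \S here is that the two escapes split the string between them, regardless of (?i). A minimal sketch of that check follows, assuming the standard library `re` module as a stand-in for regex (both accept \s and \S); the helper name `partitions` is hypothetical.

```python
import re

text = u"a b\tcd"

def partitions(pos, neg, text, prefix=u""):
    """True if the matches of pos and neg together cover each character once."""
    hits = re.findall(prefix + pos, text) + re.findall(prefix + neg, text)
    return sorted(hits) == sorted(text)

# Case-sensitive and case-insensitive searches should both partition the text.
plain_ok = partitions(u"\\s", u"\\S", text)
icase_ok = partitions(u"\\s", u"\\S", text, prefix=u"(?i)")
print(plain_ok)
print(icase_ok)
```

The same check applied to \p{InBasicLatin} and \P{InBasicLatin} under (?i) would fail for the results reported above, since both searches return the same characters.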

Regards,
   vbr

Original issue reported on code.google.com by Vlastimil.Brom@gmail.com on 28 Sep 2011 at 8:12

GoogleCodeExporter commented 9 years ago
Ouch! It's a bug.

Fixed in regex 0.1.20110929.

Original comment by re...@mrabarnett.plus.com on 28 Sep 2011 at 11:32