jamadden / mrab-regex-hg

Automatically exported from code.google.com/p/mrab-regex-hg

not all keywords are found by named list with overlapping keywords when full Unicode casefolding is required #50

GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

>>> import regex
>>> p = regex.compile(ur'(?fi)\L<keywords>', keywords=['post','pos'])
>>> p.findall(u'POST, Post, post, poſt, poﬆ, and poﬅ')

What is the expected output? What do you see instead?

Expected:

[u'POST', u'Post', u'post', u'po\u017ft', u'po\ufb06', u'po\ufb05']

Got:

[u'POST', u'Post', u'post', u'po\u017ft']
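
The expected list follows from full Unicode case folding: `ſ` (U+017F) folds to `s`, and the ligatures `ﬆ` (U+FB06) and `ﬅ` (U+FB05) each fold to the two characters `st`, so all six words fold to `post`. A minimal Python 3 sketch, using only the standard library's `str.casefold()` rather than the `regex` module (so it says nothing about how `regex` matches internally), illustrates this:

```python
# Python 3, standard library only.
# Escapes are used so the ligatures survive copy and paste.
words = ["POST", "Post", "post", "po\u017ft", "po\ufb06", "po\ufb05"]

# str.casefold() applies full Unicode case folding, which can change length:
# the single-character ligatures U+FB06 and U+FB05 each fold to "st".
folded = [w.casefold() for w in words]
print(folded)

# Every word folds to the keyword "post", so each one should be reported.
all_match = all(f == "post" for f in folded)
print(all_match)
```

Because the ligatures expand from one character to two, a matcher that compares character by character can miss them, which is why the `(?f)` flag is needed in the pattern above.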

What version of the product are you using? On what operating system?

regex.__version__ == '2.4.0'
sys.version_info == (2, 6, 5, 'final', 0)
platform.platform() == 'Linux-3.0.0-15-generic-x86_64-with-Ubuntu-11.10-oneiric'

Please provide any additional information below.

>>> p = regex.compile(ur'(?fi)pos|post')
>>> p.findall(u'POST, Post, post, poſt, poﬆ, and poﬅ')
[u'POS', u'Pos', u'pos', u'po\u017f']
>>> p = regex.compile(ur'(?fi)post|pos')
>>> p.findall(u'POST, Post, post, poſt, poﬆ, and poﬅ')
[u'POST', u'Post', u'post', u'po\u017ft']
>>> p = regex.compile(ur'(?fi)post|another')
>>> p.findall(u'POST, Post, post, poſt, poﬆ, and poﬅ')
[u'POST', u'Post', u'post', u'po\u017ft', u'po\ufb06', u'po\ufb05']
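
The first two results above are consistent with ordered alternation, where the leftmost alternative that matches wins: `pos|post` stops at `pos`, while `post|pos` can take the longer word. The same ordering effect can be reproduced with the standard library's `re` module. This is a sketch for Python 3 on ASCII-only input, since `re` has no counterpart to the `(?f)` full case-folding flag:

```python
import re

text = "POST, Post, post"

# Alternatives are tried left to right; "pos" succeeds first,
# so the longer "post" alternative is never attempted.
short_first = re.findall(r"pos|post", text, flags=re.IGNORECASE)

# With the longer alternative listed first, the whole word is matched.
long_first = re.findall(r"post|pos", text, flags=re.IGNORECASE)

print(short_first)
print(long_first)
```

This does not explain the missing ligature matches in the second result. It only shows why the matched text differs between the two orderings.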

Original issue reported on code.google.com by 4kir4...@gmail.com on 26 Jan 2012 at 2:50

GoogleCodeExporter commented 9 years ago
Fixed in regex 0.1.20120126.

Original comment by re...@mrabarnett.plus.com on 26 Jan 2012 at 5:56