mrabarnett / mrab-regex

Kernel crash on fuzzy regex search #334

Open mrabarnett opened 5 years ago

mrabarnett commented 5 years ago

Original report by Anonymous.


The following regex fuzzy search will crash the Python kernel and give a message: “Kernel died, restarting”.

import regex as f_re
str_to_search = u'al, '
flags   = f_re.IGNORECASE | f_re.DOTALL | f_re.ENHANCEMATCH 
regex = r'(?e)(?:(?:(?=(?P<if_2_3>expression1\W+)?)(?P=if_2_3))?(?(if_2_3)expression2|expression3)){e<=1}'
regex_comp = f_re.compile(str(regex),flags)
print(str_to_search)
for match in regex_comp.finditer(str_to_search):
    print('does not even get to here, kernel crash before this')
    print(match.group())

It took me a long time to pin down the exact crash conditions. I also found an error in my regex: the ? in expression1\W+)? doesn’t make any sense. Still, the module should not crash the whole kernel.

I am using version '2.5.33' (64-bit) for Python 3.7 on Windows 10 64-bit, installed manually from a wheel file. These are the most important package versions:

P.S. Thanks for the great module.

mrabarnett commented 5 years ago

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


Here’s a script you could try to ‘clean’ the regex installation:

# Script to 'clean' an installation of the regex module.
from glob import glob
from os import remove
from os.path import dirname, isdir, join

import regex
site_folder = dirname(dirname(regex.__file__))
del regex

old_files = glob(join(site_folder, 'regex.*')) + glob(join(site_folder, '_regex.*'))

if isdir(join(site_folder, 'regex')):
    if old_files:
        for path in old_files:
            print(f'Found old file {path}')
            # Uncomment the next line to remove the old file.
            #remove(path)
    else:
        print('No old files')
else:
    print('New regex not found')

mrabarnett commented 5 years ago

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


I reduced it down to:

import regex
print(regex.match(r'(?:(?=(e)?)\1){e<=1}', ' '))

which still crashed.
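
Because a crash like this takes the whole interpreter (or notebook kernel) down with it, one way to try the reduced pattern without losing the current session is to run it in a child process and look at the exit status. This is a minimal sketch, not something taken from the thread: the helper name `run_isolated` and the 60-second timeout are assumptions chosen for illustration.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run `code` in a fresh Python process and return the CompletedProcess.

    Hypothetical helper: a crash in the child cannot take down the caller.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# The reduced pattern from this thread; the child needs the regex module.
snippet = r"""
import regex
print(regex.match(r'(?:(?=(e)?)\1){e<=1}', ' '))
"""

result = run_isolated(snippet)
print("exit status:", result.returncode)
print(result.stdout or result.stderr)
```

A nonzero exit status only says that the child did not finish normally. A missing regex installation also produces one, so the captured stderr is worth reading before concluding that the pattern crashed.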

mrabarnett commented 5 years ago

Original comment by Bruno BC (Bitbucket: [Bruno BC](https://bitbucket.org/Bruno BC)).


Thanks Matthew, I will try that.

Did you get a chance to test any other expressions? This is the only one that crashes on my system; I can do other fuzzy searches without problems.

Regards,

Bruno

mrabarnett commented 5 years ago

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


I think the problem might be that I previously re-organised the files: if you install a new version of regex over an old one, some of the old files get left behind. That somehow works most of the time, but sometimes doesn’t.

I found that problem on the Raspberry Pi and fixed it by uninstalling regex, removing any remaining files related to the regex module from the site-packages subfolder of Python, and then re-installing regex.
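
Once regex has been uninstalled, `import regex` no longer works, so any leftover files have to be located without importing the module. The sketch below assumes the leftovers match the same two glob patterns used by the cleaning script in this thread (`regex.*` and `_regex.*`). The helper name `leftover_regex_files` is hypothetical, and the code only lists files; it deletes nothing.

```python
import sysconfig
from pathlib import Path


def leftover_regex_files(site_folder):
    """List top-level regex files in `site_folder` (hypothetical helper).

    Uses the patterns from the cleaning script in this thread. Directories,
    such as the package folder `regex/`, are skipped.
    """
    site_folder = Path(site_folder)
    found = []
    for pattern in ("regex.*", "_regex.*"):
        found.extend(p for p in site_folder.glob(pattern) if p.is_file())
    return sorted(found)


# Check both site-packages locations reported for this interpreter.
for key in ("purelib", "platlib"):
    folder = sysconfig.get_paths()[key]
    print(folder, [p.name for p in leftover_regex_files(folder)])
```

Any files listed after the uninstall are candidates for manual removal before re-installing.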

mrabarnett commented 5 years ago

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


I was thinking about how to test it when it doesn’t crash on my machine and it occurred to me that I do have another machine I could test it on: a Raspberry Pi. It crashes!

mrabarnett commented 5 years ago

Original comment by Bruno BC (Bitbucket: [Bruno BC](https://bitbucket.org/Bruno BC)).


Hello Matthew,

I cannot get past the print(pattern.search('al, ')) command. As soon as I execute this command, the kernel crashes.

This is a screenshot of my third try (note: “Python dejó de funcionar” means “Python stopped working”).

Thanks,

Bruno

mrabarnett commented 5 years ago

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


Could you try these at the Python prompt to confirm what you’re getting:

import regex
pattern = regex.compile(r'(?e)(?:(?:(?=(?P<if_2_3>expression1\W+)?)(?P=if_2_3))?(?(if_2_3)expression2|expression3)){e<=1}')

# Should print None.
print(pattern.search('al, '))

s = pattern.finditer('al, ')
# Should raise StopIteration.
next(s)

pattern = regex.compile(r'(?:(?:(?=(?P<if_2_3>expression1\W+)?)(?P=if_2_3))?(?(if_2_3)expression2|expression3)){e<=1}')

# Should print None.
print(pattern.search('al, '))

s = pattern.finditer('al, ')
# Should raise StopIteration.
next(s)

mrabarnett commented 5 years ago

Original comment by Bruno BC (Bitbucket: [Bruno BC](https://bitbucket.org/Bruno BC)).


Thanks for the interest Matthew.

I can confirm that I still have the bug. It is a very strange bug. It won’t happen if I change the string to search or if I execute without fuzzy options. It must be something related to my installation.

This is the Windows event log crash report:

1000 2 100 0x80000000000000 37498 Application

python.exe
3.7.1150.1013
5c0f4332
_regex.pyd
0.0.0.0
5a318168
c0000005
0000000000001ec1
2db0
01d530a6801bbc05
C:\Program Files\Anaconda\python.exe
C:\Program Files\Anaconda\lib\site-packages\_regex.pyd
0bdc30c4-b70b-4396-944d-9b61ae0c8031

If you are interested (and tell me how), I can try to generate some additional log from within the execution script.

Regards,

B.
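
For generating an additional log from within the script, one option is the standard library `faulthandler` module, which writes the Python traceback when the interpreter hits a fatal error such as a segmentation fault. Nobody in this thread proposed it, so treat the following as a sketch under assumptions: the helper name `enable_crash_log` and the file name `regex_crash.log` are hypothetical. Writing to a real file rather than stderr is a deliberate choice, since a notebook kernel may not expose a usable stderr file descriptor.

```python
import faulthandler
import os
import tempfile


def enable_crash_log(path):
    """Send fatal-error tracebacks to `path` (hypothetical helper).

    Returns the open file object. It must stay open for as long as the
    handler is enabled, so keep a reference to it.
    """
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


log_path = os.path.join(tempfile.gettempdir(), "regex_crash.log")
crash_log = enable_crash_log(log_path)
print("fatal-error tracebacks will be written to", log_path)

# From here on, run the statements that crash, for example the
# finditer() loop from the original report.
```

If the process dies, the log file should show which Python line was executing at the time, which can be attached to the report alongside the event log entry.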

mrabarnett commented 5 years ago

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


Just tried it (Windows 10 64-bit, Python 3.7). No crash. Tried Python 3.6 too. No crash.
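
Since the crash shows up on some machines and not on others, it may help for each report to include the same environment details. A minimal sketch using only the standard library; the helper name `environment_report` and the choice of fields are assumptions, and `importlib.metadata` requires Python 3.8 or later.

```python
import platform
from importlib import metadata


def environment_report(package="regex"):
    """Collect interpreter, platform and package version (hypothetical helper)."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "bits": platform.architecture()[0],
        "platform": platform.platform(),
        package: version,
    }


for name, value in environment_report().items():
    print(f"{name}: {value}")
```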

mrabarnett commented 4 years ago

Original comment by Talha Moosani (Bitbucket: [Talha Moosani](https://bitbucket.org/Talha Moosani)).


Hi Matthew, I am having the same issue with the below regex.

mypattern = r'(?<=Ves\/Voy\/Dir[\s]*)([\D]+[\s]*[\D]+)[\s]([\w]+.+)'
matchNum  = re.search(mypattern,text,re.IGNORECASE)

mrabarnett commented 4 years ago

Original comment by Matthew Barnett (Bitbucket: mrabarnett, GitHub: mrabarnett).


What is the value of ‘text’? Without that I’m unable to help. I need a complete example that shows the problem, ideally the smallest example that shows it.

dragoncoder047 commented 7 months ago

For the heck of it, I checked all the examples with the latest version of regex (now that #525 is fixed) in CPython 3.12 64-bit. None crash, and all give the expected outputs. Perhaps this problem is related.