sdgathman / pyspf

Other
49 stars 26 forks source link

split() requires a non-empty pattern match #27

Closed niftylettuce closed 4 years ago

niftylettuce commented 4 years ago
15|smtp          | Error: /home/deploy/.local/lib/python3.6/site-packages/spf.py:1844: FutureWarning: split() requires a non-empty pattern match.
15|smtp          |   ln, reverse, delimiters = RE_ARGS.split(str)[1:4]
niftylettuce commented 4 years ago

I believe the issue is here:

https://github.com/sdgathman/pyspf/blob/df09fce7a06be9eec04d404a896b37b0a9de8ad2/spf.py#L181

niftylettuce commented 4 years ago

Perhaps change that to this?

RE_ARGS = re.compile(r'([0-9]+)(r?)([^0-9a-zA-Z]+)')

With + instead of *? Not sure, also - do other regex's need changed too?

niftylettuce commented 4 years ago

Actually, it just needs to be this instead:

if not str or str.strip() == ''
niftylettuce commented 4 years ago

That unfortunately did not work and the error still occurs. @sdgathman @kitterma does anyone know Python and have insight to why this error happens? And how we can write a workaround/patch? My patch didn't work...

sdgathman commented 4 years ago

Still needs a test case.

niftylettuce commented 4 years ago

@sdgathman that wasn't the fix though, that should not have been merged

sdgathman commented 4 years ago

All the more reason for adding a test case. I need to study your example to make one. I'm going to bed now.

sdgathman commented 4 years ago

Since it is a FutureWarning, it probably is a py2->py3 change. Does it work with python2.7?

niftylettuce commented 4 years ago

I don't think so, I think the RegExp is the culprit based off research off StackOverflow and GitHub folks with similar issues.

sdgathman commented 4 years ago

To fix it we need a test case. What domain encountered the error?

niftylettuce commented 4 years ago

I'm trying to figure this out, and it's super hard for me to discover, but I know for certain this happens quite a lot in a normal email environment using pyspf. My service is sending over 200-300K emails per month and it's popping up a few times every hour. I've mitigated it in the meanwhile with some exception catching, but still haven't found the culprit, maybe my exception catching isn't outputting to stdout/stderr properly. If you find something before I do let me know. Thank you @sdgathman for your time and help.

niftylettuce commented 4 years ago

It happens at the expand_one here, and I'm guessing it's matching PAT_CHAR (macro expansions) and encountering some parsing error:

        for i in RE_CHAR.finditer(s):
            result += s[end:i.start()]
            macro = s[i.start():i.end()]
            if macro == '%%':
                result += '%'
            elif macro == '%_':
                result += ' '
            elif macro == '%-':
                result += '%20'
            else:
                letter = macro[2].lower()
#                print letter
                if letter == 'p':
                    self.getp()
                elif letter in 'crt' and stripdot:
                    raise PermError(
                        'c,r,t macros allowed in exp= text only', macro)
                expansion = getattr(self, letter, self)
                if expansion:
                    if expansion == self:
                        raise PermError('Unknown Macro Encountered', macro) 
                    e = expand_one(expansion, macro[3:-1], JOINERS.get(letter))
                    if letter != macro[2]:
                        e = urllibparse.quote(e,'~')
                    result += e
# Regular expression to find macro expansions
PAT_CHAR = r'%(%|_|-|(\{[^\}]*\}))'
RE_CHAR = re.compile(PAT_CHAR)
niftylettuce commented 4 years ago

Until this bug is patched in pyspf, I have added the following on my side:

                if expansion:
                    if expansion == self:
                        raise PermError('Unknown Macro Encountered', macro) 
+                    try:
+                        e = expand_one(expansion, macro[3:-1], JOINERS.get(letter))
+                    except Exception as x:
+                        raise TempError('Macro Expansion error: ' + str(x))
                    if letter != macro[2]:
                        e = urllibparse.quote(e,'~')
                    result += e
sdgathman commented 4 years ago

This is not an error. FutureWarning is just warning that Python is about to change the behavior of split() in a way that will break it. Currently split() ignores 0 length matches. The plan is to have it match '' as well. Or maybe there will be a flag to select that. So we do need to use a different regex. I am not getting that warning, so I need to know what macros are triggering it.

https://docs.python.org/3.6/library/re.html?highlight=split#re.split

Here are some test cases I've tried:

Test SPF RFC macro expansion.

>>> from spf import query, expand_one, RE_ARGS

Examples:
>>> q = query(s='strong-bad@email.example.com',
...           h='mx.example.org', i='192.0.2.3')
>>> q.p = 'mx.example.org'
>>> q.r = 'example.net'

>>> q.expand('%{d}')
'email.example.com'

>>> expand_one('example.com','r',None)
'com.example'
>>> expand_one('strong-bad','-','.')
'strong.bad'
>>> expand_one('strong-bad','r','.')
'strong-bad'
>>> expand_one('strong-bad','r-','.')
'bad.strong'
>>> expand_one('strong-bad','1r-','.')
'strong'
>>> expand_one('strong-bad','','.')
'strong-bad'
>>> expand_one('strong-bad','r0','.')
'strong-bad'
>>> RE_ARGS.split('r0')
''
sdgathman commented 4 years ago

Until this bug is patched in pyspf, I have added the following on my side:

                if expansion:
                    if expansion == self:
                        raise PermError('Unknown Macro Encountered', macro) 
+                    try:
+                        e = expand_one(expansion, macro[3:-1], JOINERS.get(letter))
+                    except Exception as x:
+                        raise TempError('Macro Expansion error: ' + str(x))
                    if letter != macro[2]:
                        e = urllibparse.quote(e,'~')
                    result += e

You need to log the macro. You also need to tell us what version of python. Behaviour of split() changed in 3.5 and again in 3.7. A FutureWarning (or any other warning) is printed only the first time it is encountered by default. Python3.8 does not have this warning, because Python3.7 adds specific support for null matches:

https://docs.python.org/3.8/library/re.html?highlight=split#re.split

sdgathman commented 4 years ago

This is not a bug. I will consider specific suggestions for an alternative parser for the len,reverse,separator macro flags that passes all the test cases. :-)

Note, you can pass -W ignore to the python3.6 command line to stop annoying you with the obsolete warning.

sdgathman commented 4 years ago

You can turn off warnings in code also:

https://stackoverflow.com/questions/29086398/sklearn-turning-off-warnings#32389270