mailgun / flanker

Python email address and Mime parsing library
http://www.mailgun.com
Apache License 2.0

Faster encodedword unfold by 15x #226

Closed. carsonip closed this 4 years ago

carsonip commented 5 years ago

15x faster unfold function:

  1. Use the compiled pattern object's sub method instead of re.sub to skip _compile. See: https://stackoverflow.com/a/47477439/3315725
  2. Match only the line break and replace it with an empty string, rather than also matching the trailing whitespace and putting it back through a backreference (I don't understand why it was done that way in the first place)
  3. Use non-capturing groups for another 10-20% boost
    
    In [22]: _RE_FOLDING_WHITE_SPACES = re.compile(r"(\n\r?|\r\n?)(\s*)")

    In [23]: def unfold(value):
       ....:     """
       ....:     Unfolding is accomplished by simply removing any CRLF
       ....:     that is immediately followed by WSP. Each header field should be
       ....:     treated in its unfolded form for further syntactic and semantic
       ....:     evaluation.
       ....:     """
       ....:     return re.sub(_RE_FOLDING_WHITE_SPACES, r'\2', value)
       ....:

    In [24]: %timeit unfold(x)
    100000 loops, best of 3: 9.59 µs per loop

    In [26]: _RE_FOLDING_WHITE_SPACES = re.compile(r"(?:\n\r?|\r\n?)")

    In [27]: def unfold(value):
       ....:     """
       ....:     Unfolding is accomplished by simply removing any CRLF
       ....:     that is immediately followed by WSP. Each header field should be
       ....:     treated in its unfolded form for further syntactic and semantic
       ....:     evaluation.
       ....:     """
       ....:     return _RE_FOLDING_WHITE_SPACES.sub('', value)
       ....:

    In [29]: %timeit unfold(x)
    1000000 loops, best of 3: 598 ns per loop
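For anyone who wants to reproduce the comparison outside IPython, here is a minimal standalone sketch of the two variants benchmarked above. The names unfold_old and unfold_new and the sample header are made up for illustration and are not part of flanker. Timings will vary by machine and Python version, so the script prints them rather than asserting a ratio. It also checks one input where the two patterns appear to disagree: in the old pattern, the greedy \s* can swallow a second line break and the \2 backreference writes it back, so consecutive line breaks are not all removed.

```python
import re
import timeit

# Old pattern: captures the line break, then any whitespace that follows it.
_RE_OLD = re.compile(r"(\n\r?|\r\n?)(\s*)")
# New pattern: matches only the line break, using a non-capturing group.
_RE_NEW = re.compile(r"(?:\n\r?|\r\n?)")


def unfold_old(value):
    # Module-level re.sub, restoring the captured whitespace via \2.
    return re.sub(_RE_OLD, r"\2", value)


def unfold_new(value):
    # Bound method on the compiled pattern; the line break is simply dropped.
    return _RE_NEW.sub("", value)


# Hypothetical folded header: CRLF followed by a single space.
folded = "Subject: =?utf-8?q?hello?=\r\n =?utf-8?q?world?="

# On an ordinary folded header both versions give the same result.
assert unfold_old(folded) == unfold_new(folded)

# On consecutive line breaks they differ: the old \s* captures the
# second "\n" and the backreference puts it back.
blank_line = "a\n\nb"
assert unfold_old(blank_line) == "a\nb"
assert unfold_new(blank_line) == "ab"

if __name__ == "__main__":
    n = 100000
    t_old = timeit.timeit(lambda: unfold_old(folded), number=n)
    t_new = timeit.timeit(lambda: unfold_new(folded), number=n)
    print("old: %.3f us per call" % (t_old / n * 1e6))
    print("new: %.3f us per call" % (t_new / n * 1e6))
```

Whether the consecutive-line-break case matters for real headers is a separate question, but it may explain why the original kept the backreference.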

mailgun-ci commented 5 years ago

Can one of the admins verify this patch?