mailgun / flanker

Python email address and Mime parsing library
http://www.mailgun.com
Apache License 2.0

Allow Unicode characters in custom grammar checks #195

Closed by b0d0nne11 6 years ago

b0d0nne11 commented 6 years ago

This change allows Unicode characters to be used in custom grammar checks. I tried to limit the changes by keeping the left-to-right TokenStream class and reusing the patterns from the main lexer where possible. This should fix a bug where some addresses from hotmail.fr and similar domains don't parse correctly.
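
For readers unfamiliar with the approach, here is a minimal sketch of a left-to-right token stream whose patterns accept non-ASCII characters. It is not flanker's actual code: the `TokenStream` below, the `ATOM`, `AT`, and `DOT` patterns, and the `looks_like_addr_spec` helper are all hypothetical and far simpler than a real address grammar.

```python
import re

# Hypothetical patterns, for illustration only. ATOM accepts the usual ASCII
# "atext" characters plus any code point at or above U+0080.
ATOM = re.compile(r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~\u0080-\U0010FFFF]+")
AT = re.compile(r"@")
DOT = re.compile(r"\.")


class TokenStream:
    """Consumes a string left to right, one pattern match at a time."""

    def __init__(self, text):
        self.text = text
        self.position = 0

    def get_token(self, pattern):
        # Match only at the current position; advance on success.
        match = pattern.match(self.text, self.position)
        if match is None:
            return None
        self.position = match.end()
        return match.group()

    def end_of_stream(self):
        return self.position >= len(self.text)


def _dot_atom(stream):
    # atom *("." atom)
    if stream.get_token(ATOM) is None:
        return False
    while stream.get_token(DOT) is not None:
        if stream.get_token(ATOM) is None:
            return False
    return True


def looks_like_addr_spec(text):
    # dot-atom "@" dot-atom, with nothing left over.
    stream = TokenStream(text)
    return (
        _dot_atom(stream)
        and stream.get_token(AT) is not None
        and _dot_atom(stream)
        and stream.end_of_stream()
    )
```

With these assumed patterns, an address such as `josé@hotmail.fr` is accepted because `é` falls inside the extended character class, while `a..b@example.com` is still rejected.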

horkhe commented 6 years ago

Any idea why tests fail on our CI?

b0d0nne11 commented 6 years ago

I'm not sure yet. They pass locally and on Travis. It seems to have something to do with the encoding-guessing function.

b0d0nne11 commented 6 years ago

Turns out cchardet isn't available in our CI builds, and the fallback detector isn't as accurate.

horkhe commented 6 years ago

@b0d0nne11 I fixed our CI job to install cchardet.