mailgun / talon

Apache License 2.0

Signature Detection in HTML Emails #14

Open petemichel77 opened 10 years ago

petemichel77 commented 10 years ago

I haven't had a lot of success parsing signatures out of text/html emails. It seems to work pretty well for text/plain emails. Is there a good strategy to parse out the signature for text/html emails?

Thanks, Pete

obukhov-sergey commented 10 years ago

Hi Pete,

Thanks for the question. We haven't researched this yet, but there may be special tags used for signature formatting, much like the <blockquote> tag used for quotations. Another option would be to convert the HTML to text, apply the text-signature parsing algorithms, and convert the result back. We used a combination of these two approaches for parsing HTML quotations.

BR, Sergey @MG
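The HTML-to-text route Sergey mentions can be sketched roughly as follows. This is a minimal, stdlib-only illustration under stated assumptions, not talon's implementation: `html_to_text`, `split_signature`, `extract_html_signature`, the `BLOCK_TAGS` set, and the sign-off list are all hypothetical names and heuristics chosen for the example. The signature rule used here (a `--` delimiter line, or a line starting with a common sign-off near the end of the message) is a deliberately simple stand-in; talon's own text-signature extractor could be swapped in for `split_signature`. The "convert back to HTML" step is left out: the sketch only reports the signature as text.

```python
from html.parser import HTMLParser

# Tags treated as line breaks when flattening HTML (an assumption; real
# mail clients produce many other layouts).
BLOCK_TAGS = {"br", "p", "div", "li", "tr", "blockquote"}

# Hypothetical sign-off words that may open a signature block.
SIGN_OFFS = {"thanks", "regards", "best regards", "best", "cheers", "br"}


class _TextExtractor(HTMLParser):
    """Collects visible text, inserting newlines around block-level tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default decodes &amp; etc.
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Flatten HTML to non-empty, whitespace-stripped text lines."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def split_signature(text, max_sig_lines=5):
    """Hypothetical heuristic: return (body, signature or None).

    Only the last `max_sig_lines` lines are searched, and at least one
    body line is always kept. The first line that is a "--" delimiter or
    starts with a known sign-off ("Thanks, Pete", "BR, Sergey") begins
    the signature.
    """
    lines = text.splitlines()
    start = max(1, len(lines) - max_sig_lines)
    for i in range(start, len(lines)):
        line = lines[i]
        first_clause = line.lower().split(",")[0].strip()
        if line == "--" or first_clause in SIGN_OFFS:
            return "\n".join(lines[:i]), "\n".join(lines[i:])
    return text, None


def extract_html_signature(html, max_sig_lines=5):
    """HTML -> plain text -> text-based signature split."""
    return split_signature(html_to_text(html), max_sig_lines)
```

For example, `extract_html_signature("<div>Hi Sergey,</div><div>Is the code available?</div><div><br></div><div>Thanks, Pete</div>")` splits off `"Thanks, Pete"` as the signature. Flattening first means any text-based detector can be reused unchanged; the cost is that mapping the detected lines back onto the original HTML nodes becomes a separate problem.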

daniel-centore commented 8 years ago

Just wanted to check in and see if anything has happened here. @obukhov-sergey, it sounds like you've already done something similar for quotations. Is the code for this technique available?