snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

Question: Is email signature extraction a valid use case of snorkel? #891

Closed by mdav43 6 years ago

mdav43 commented 6 years ago

Apologies if this is the wrong format to seek advice.

I'd like to know whether Snorkel is a suitable approach for extracting email signatures from a body of emails, or whether it would be overkill or simply the wrong tool for this use case. I know Mailgun has a library for this (talon), but I thought Snorkel might also be applicable.

stephenbach commented 6 years ago

I think it could fit well! It's essentially a unary relation classification problem. Check out the intro tutorial; in your case, instead of classifying pairs of people, you'd be classifying individual spans that correspond to candidate signatures. You'll need to write a custom matcher to generate the candidates, though. Let us know what you find!
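
To make the "candidate spans plus unary classification" idea concrete, here is a rough sketch in plain Python. It does not use Snorkel's API: `candidate_spans`, the three `lf_*` heuristics, and `majority_vote` are all hypothetical names made up for illustration, and the majority vote is only a stand-in for the label model Snorkel would fit over the labeling-function outputs. It assumes signatures sit in the last few lines of an email, which will not hold for every corpus.

```python
import re

# Label values used by the illustrative heuristics below (assumed convention).
ABSTAIN, NOT_SIGNATURE, SIGNATURE = -1, 0, 1


def candidate_spans(email_body, max_lines=5):
    """Yield (start_line, text) for each trailing block of 1..max_lines lines.

    Stand-in for a custom matcher: every suffix of the email is a candidate.
    """
    lines = email_body.rstrip().split("\n")
    for n in range(1, min(max_lines, len(lines)) + 1):
        start = len(lines) - n
        yield start, "\n".join(lines[start:])


SIGN_OFF = re.compile(r"^(best|regards|thanks|cheers|sincerely)\b", re.IGNORECASE)
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def lf_starts_with_sign_off(span):
    # A span that opens with a closing phrase is probably a signature.
    return SIGNATURE if SIGN_OFF.match(span) else ABSTAIN


def lf_contains_phone(span):
    # Phone-number-like strings are common in signatures.
    return SIGNATURE if PHONE.search(span) else ABSTAIN


def lf_has_long_line(span):
    # Lines over 80 characters look more like body prose than a signature.
    return NOT_SIGNATURE if any(len(l) > 80 for l in span.split("\n")) else ABSTAIN


LFS = [lf_starts_with_sign_off, lf_contains_phone, lf_has_long_line]


def majority_vote(span, lfs=LFS):
    """Combine heuristic votes; ties (including all-abstain) return ABSTAIN."""
    votes = [lf(span) for lf in lfs]
    pos, neg = votes.count(SIGNATURE), votes.count(NOT_SIGNATURE)
    if pos == neg:
        return ABSTAIN
    return SIGNATURE if pos > neg else NOT_SIGNATURE


email = (
    "Hi team,\n"
    "Please review the attached report.\n"
    "\n"
    "Best,\n"
    "Jane Doe\n"
    "+1 (555) 123-4567"
)
for start, span in candidate_spans(email):
    print(start, majority_vote(span), repr(span))
```

The noisy labels produced this way would then be used to train a classifier over candidate spans, which is the part of the workflow the tutorial covers.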