snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0

How to digest pdf files #886

Closed: chhenning closed this issue 6 years ago

chhenning commented 6 years ago

I have a lot of PDFs (1000+) that share the same domain (healthcare information) but are structured differently. They all contain tables, which hold the meat of the information. Can I use Snorkel to extract these tables and build a model from them?

ajratner commented 6 years ago

Hi @chhenning, there's a cool project in our lab, built on top of Snorkel, called Fonduer, that's exactly for this!! Right now it's a branch in Snorkel, but it will likely be pushed to its own repo soon. Either way, you should check it out! @SenWu @lukehsiao @bhancock8
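Fonduer's own API isn't shown in this thread, so the following is only a plain-Python illustration of the weak-supervision idea behind Snorkel. It is not Snorkel's or Fonduer's API. It assumes table cells have already been pulled out of the PDFs into hypothetical `Candidate` dicts (row header, column header, cell text). A few hypothetical labeling functions (`lf_dose_column`, `lf_dose_unit`, `lf_empty_cell`) each vote on whether a cell is a dosage, and a simple majority vote combines their votes.

```python
from collections import Counter
from typing import Callable, Dict, List

# Label convention used only in this sketch.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical candidate: one cell from a table already extracted from a PDF,
# together with its row and column headers.
Candidate = Dict[str, str]
LabelingFunction = Callable[[Candidate], int]


def lf_dose_column(c: Candidate) -> int:
    # Heuristic: cells under a column header mentioning "dose" are likely dosages.
    return POSITIVE if "dose" in c["col_header"].lower() else ABSTAIN


def lf_dose_unit(c: Candidate) -> int:
    # Heuristic: a dosage normally ends with a unit such as "mg" or "ml".
    return POSITIVE if c["cell"].strip().lower().endswith(("mg", "ml")) else NEGATIVE


def lf_empty_cell(c: Candidate) -> int:
    # Heuristic: an empty cell is never a valid dosage.
    return NEGATIVE if not c["cell"].strip() else ABSTAIN


LFS: List[LabelingFunction] = [lf_dose_column, lf_dose_unit, lf_empty_cell]


def majority_vote(c: Candidate, lfs: List[LabelingFunction]) -> int:
    """Combine non-abstaining votes; return ABSTAIN if there are none or on a tie."""
    counts = Counter(v for v in (lf(c) for lf in lfs) if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical cells, standing in for rows extracted from healthcare PDFs.
    cells = [
        {"row_header": "Aspirin", "col_header": "Daily dose", "cell": "500 mg"},
        {"row_header": "Aspirin", "col_header": "Manufacturer", "cell": "Acme"},
        {"row_header": "Ibuprofen", "col_header": "Dose", "cell": ""},
    ]
    for c in cells:
        print(c["row_header"], "|", c["col_header"], "->", majority_vote(c, LFS))
```

Snorkel itself goes further than counting votes equally: it learns how much to trust each labeling function from their agreements and disagreements, and the resulting probabilistic labels are then used to train a downstream model.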

chhenning commented 6 years ago

@ajratner Thanks A LOT!