snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0
5.81k stars 857 forks

Is there any documentation for HTMLDocPreProcessor? Can I use it to parse an HTML document? #1618

Closed: riteshbathwal closed this issue 3 years ago

riteshbathwal commented 4 years ago

It would be helpful to understand how I can use Snorkel's HTMLDocPreProcessor class to parse an HTML document and extract insights from the content of a web page based on the parsing results. Any pointers in this direction would be of great help in using Snorkel to build my ML model around HTML documents.

bhancock8 commented 3 years ago

Snorkel doesn't actually have an HTMLDocPreProcessor class. I believe you're thinking of a class from Fonduer, a project in the Snorkel family that lives in a separate repository.