It would be helpful to understand how I can use the HTMLDocPreProcessor class of Snorkel to parse an HTML document and get insights from the content of a web page based on the parsing results. Any pointer in this direction would be of great help in using Snorkel to build my ML model around HTML documents.
Snorkel doesn't actually have an HTMLDocPreProcessor class. I believe you're thinking of a class from Fonduer, a project in the Snorkel family that lives in a separate repository.
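Fonduer's own preprocessing and parsing do much more than this (for example, they persist the parsed documents to a database), so check its documentation for the real API. As a rough, dependency-free sketch of the underlying idea only, the code below walks an HTML page and records, for each piece of text, the path of tags it sits under. This kind of structural context is what lets you write labeling heuristics such as "the text is inside a table cell". `StructuralTextExtractor` and `extract` are hypothetical names for illustration and are not part of Snorkel or Fonduer.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StructuralTextExtractor(HTMLParser):
    """Collect (tag_path, text) pairs from an HTML document.

    Hypothetical helper for illustration only; not part of Snorkel or Fonduer.
    """

    def __init__(self):
        super().__init__()
        self._stack = []   # currently open tags, outermost first
        self._skip = 0     # > 0 while inside <script>/<style>, whose text is ignored
        self.chunks = []   # list of (tag_path, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self._stack.append(tag)
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag not in self._stack:
            return  # stray closing tag with no matching open tag; ignore it
        # Pop up to and including the matching tag, tolerating unclosed children.
        while self._stack:
            popped = self._stack.pop()
            if popped in ("script", "style"):
                self._skip -= 1
            if popped == tag:
                break

    def handle_data(self, data):
        # Collapse runs of whitespace; drop whitespace-only text between tags.
        text = " ".join(data.split())
        if text and not self._skip:
            self.chunks.append(("/".join(self._stack), text))


def extract(html):
    """Return the (tag_path, text) pairs found in an HTML string."""
    parser = StructuralTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    page = """
    <html><body>
      <h1>Product specs</h1>
      <table><tr><td>Voltage</td><td>5 V</td></tr></table>
      <script>var x = 1;</script>
    </body></html>
    """
    for path, text in extract(page):
        print(path, "->", text)
```

Each resulting pair could then feed a labeling function, for instance one that votes positive when a number appears under a path containing `td`. Fonduer builds a much richer version of this context for you, which is the main reason to use it rather than rolling your own parser.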