matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

filter only selected words? #361

Open waptik opened 4 years ago

waptik commented 4 years ago

Subject of the issue

I'm crawling a webpage that has unwanted content within my scope. I want to select only the tags whose text does not contain certain words. jsoup has something like a :contains() selector for this. Can someone help me?

Sample HTML can be found in this gist.
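x-ray parses HTML with cheerio, whose selector engine may accept a jQuery-style `:contains()` pseudo-class, so a selector such as `li:not(:contains("word"))` might work directly. That depends on the installed cheerio version and is worth testing, not assuming. A version-independent workaround is to scrape the element texts into a string array first and drop unwanted entries in plain code afterwards. The sketch below assumes the texts are already in an array; `excludeWords` and the sample strings are hypothetical and not part of x-ray's API.

```typescript
// Hypothetical helper (not an x-ray API): keep only the strings that
// contain none of the blocked words, using a case-insensitive substring match.
function excludeWords(items: string[], blocked: string[]): string[] {
  const lowered = blocked.map((w) => w.toLowerCase());
  return items.filter((item) => {
    const text = item.toLowerCase();
    // Reject the item if any blocked word appears anywhere in its text.
    return !lowered.some((word) => text.indexOf(word) !== -1);
  });
}

// Made-up input standing in for element texts scraped from the page.
const scraped: string[] = [
  "Chapter 1: Introduction",
  "Advertisement: buy now",
  "Chapter 2: Setup",
  "Sponsored link",
];

// Keeps only the two "Chapter" entries.
const kept = excludeWords(scraped, ["advertisement", "sponsored"]);
console.log(kept);
```

Because this is a plain substring match, a short blocked word like "ad" would also reject "read". If that matters, match whole words with a regular expression using `\b` boundaries instead.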