lorey / mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples
https://pypi.org/project/mlscraper/

Enable increasing complexity in rule-based scraper #5

Open lorey opened 4 years ago

lorey commented 4 years ago

Currently, the rule-based scraper tries all potential CSS selectors at once. It would be more performant to increase the selector complexity step by step: first try single-node rules like div.item, and only fall back to something like .menu > div.item.company if the simpler rules don't work.
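A minimal sketch of the idea, not mlscraper's actual code: candidates are generated lazily, one complexity level at a time, so deeper selectors are only built if every simpler one fails. The `Node` class, the function names, and the `works` callback are hypothetical stand-ins for illustration, and the sketch always includes the tag name in each selector part for simplicity.

```python
from dataclasses import dataclass
from itertools import combinations, product
from typing import Callable, Iterator, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a parsed HTML element (not mlscraper's type)."""
    tag: str
    classes: tuple = ()
    parent: Optional["Node"] = None


def simple_selectors(node: Node) -> Iterator[str]:
    """Yield single-node selectors, fewest classes first: div, div.item, ..."""
    for n in range(len(node.classes) + 1):
        for combo in combinations(node.classes, n):
            yield node.tag + "".join("." + c for c in combo)


def selectors_by_complexity(node: Node, max_depth: int) -> Iterator[str]:
    """Yield selectors level by level: depth 1 first, then depth 2, and so on."""
    # Collect the target and up to max_depth - 1 of its ancestors.
    chain = []
    current = node
    while current is not None and len(chain) < max_depth:
        chain.append(current)
        current = current.parent
    for depth in range(1, len(chain) + 1):
        # Outermost ancestor first, target last, joined by the child combinator.
        parts = [list(simple_selectors(n)) for n in reversed(chain[:depth])]
        for combo in product(*parts):
            yield " > ".join(combo)


def first_matching(
    node: Node, works: Callable[[str], bool], max_depth: int = 2
) -> Optional[str]:
    """Return the simplest selector accepted by `works`, or None."""
    for selector in selectors_by_complexity(node, max_depth):
        if works(selector):
            return selector
    return None
```

Because `selectors_by_complexity` is a generator, the two-level combinations are never computed when a single-node rule already works. Raising `max_depth` extends the same scheme to deeper selectors, at the cost of a quickly growing number of candidates per level.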

lorey commented 2 years ago

This is still open. Rules are now generated with increasing complexity, but the increase stops at selectors with two levels, e.g. .test > .value.target.box.