lorey / mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples
https://pypi.org/project/mlscraper/
1.31k stars 89 forks

Fuzzy text matching #15

Open lorey opened 2 years ago

lorey commented 2 years ago

Specifically for text matching, something fuzzy would be great to reduce errors, e.g. comparing long texts by similarity instead of exact equality to avoid whitespace-based mismatches.
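A minimal sketch of what such a fuzzy comparison could look like, using only the standard library's difflib. The function names and the 0.9 threshold are assumptions for illustration and are not part of mlscraper:

```python
from difflib import SequenceMatcher


def normalize_whitespace(text: str) -> str:
    # str.split() without arguments splits on any run of Unicode whitespace,
    # so tabs, newlines and repeated spaces all collapse to single spaces.
    return " ".join(text.split())


def is_fuzzy_match(found: str, expected: str, threshold: float = 0.9) -> bool:
    # Hypothetical helper: the threshold is an arbitrary starting point.
    a = normalize_whitespace(found)
    b = normalize_whitespace(expected)
    if a == b:
        return True
    # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

For short strings a similarity threshold can be too lenient, so it might make sense to apply it only to texts above a certain length and keep exact matching (after whitespace normalization) for everything else.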

Options

Also, this needs to be considered when checking for correctness later, since scraper.get(page) == expected_result could turn out to be false even if the results differ only in whitespace.
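One way the correctness check could tolerate such differences is to compare results recursively instead of using ==. This is only a sketch: results_match is a hypothetical helper, and it assumes results are nested dicts, lists and strings:

```python
def results_match(actual, expected) -> bool:
    # Strings: compare after collapsing whitespace runs to single spaces.
    if isinstance(actual, str) and isinstance(expected, str):
        return " ".join(actual.split()) == " ".join(expected.split())
    # Dicts: same keys, and every value must match recursively.
    if isinstance(actual, dict) and isinstance(expected, dict):
        return actual.keys() == expected.keys() and all(
            results_match(actual[key], expected[key]) for key in expected
        )
    # Lists/tuples: same length, and items must match pairwise in order.
    if isinstance(actual, (list, tuple)) and isinstance(expected, (list, tuple)):
        return len(actual) == len(expected) and all(
            results_match(a, e) for a, e in zip(actual, expected)
        )
    # Anything else (numbers, None, mixed types) falls back to plain equality.
    return actual == expected
```

The check would then read results_match(scraper.get(page), expected_result) rather than a direct equality comparison.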

lorey commented 2 years ago

#19 raised a case where it looks like a match is not found when the text contains non-breaking spaces (&nbsp;) instead of regular spaces.
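To illustrate the failure mode, here is a small sketch with made-up sample strings. normalize_spaces is a hypothetical helper, not something mlscraper provides:

```python
import html
import unicodedata


def normalize_spaces(text: str) -> str:
    # Decode HTML entities first, e.g. "&nbsp;" becomes "\xa0".
    text = html.unescape(text)
    # NFKC maps the non-breaking space U+00A0 to a regular space U+0020.
    text = unicodedata.normalize("NFKC", text)
    # Collapse any remaining whitespace runs.
    return " ".join(text.split())


page_text = "Price:\xa0100\xa0EUR"  # text as it may come out of the HTML
sample = "Price: 100 EUR"           # text as a user would type it

exact_hit = sample in page_text
normalized_hit = normalize_spaces(sample) in normalize_spaces(page_text)
```

Here exact_hit is False while normalized_hit is True. Note that NFKC also rewrites other compatibility characters (ligatures, full-width forms), so replacing only "\xa0" may be the safer choice if the exact text has to be preserved.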