hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.71k stars 523 forks source link

Tests: remove 'ingredient_groups' check for scrapers that don't support/implement ingredient groups. #1181

Closed jayaddison closed 2 months ago

jayaddison commented 2 months ago

The fact that the data-driven JSON tests have an expectation on the ingredient_groups field makes it a bit more challenging to write new scrapers, I think. My sense is that it's not the most initially-intuitive of fields for beginners -- partly because it duplicates another field, and partly because of the way it inter-relates with that other field (ingredients must match, and yet must be grouped -- and although the utility functions make the code for that relatively concise, the details can be a bit complex to understand).

jayaddison commented 2 months ago

Most of this functionality can be achieved simply by moving ingredient_groups from the MANDATORY_TESTS list into the OPTIONAL_TESTS list, and I think that's the most effective way to resolve this bug.

However, it has made me wonder about something during implementation/investigation: perhaps we should require fields -- even in the OPTIONAL_TESTS list -- to have a matching test data entry when they override the default implementation.

Detecting whether a method is overridden can be somewhat complicated due to the presence of plugin-related decorators (ExampleScraper.nutrients == AbstractScraper.nutrients may return False even when there is no override for nutrients on ExampleScraper).

jayaddison commented 2 months ago

A short one-liner script to more-or-less determine scrapers that have not overridden ingredient_groups:

print('\n'.join(sorted({scraper.host() for scraper in recipe_scrapers.SCRAPERS.values() if hasattr(scraper, 'ingredient_groups') and scraper.ingredient_groups == recipe_scrapers.AbstractScraper.ingredient_groups})))

Edit: de-duplicate, and print results per-line.

jayaddison commented 2 months ago

...plus a little bash scripting open the relevant filenames for editing...

vim $(python -c "import recipe_scrapers; print('\n'.join(sorted({scraper.host() for scraper in recipe_scrapers.SCRAPERS.values() if hasattr(scraper, 'ingredient_groups') and scraper.ingredient_groups == recipe_scrapers.AbstractScraper.ingredient_groups})))" | while read line; do grep -l ingredient_groups tests/test_data/$line/*.json; done )
jayaddison commented 2 months ago

...find-and-remove entries with some vim pattern-matching:

/"ingredient_groups"  # find ingredient groups in the current file
d%                    # delete to the end of the current bracketed-group (matches the `[` on each line and deletes up to the corresponding `]`)
dd                    # cleanup the remaining doublequotes and trailing comma that remain

(repeated a lot -- I ended up skipping the dd and doing those in batch later via :bufdo!%s/^ ",\n//g|wn)