hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.61k stars, 505 forks

Reopening #1025: Remove schema calls with no overrides #1065

Closed. jknndy closed this 2 months ago

jknndy commented 2 months ago

Removed most occurrences of direct schema calls (exceptions below) and made small improvements to two scrapers.

Improvements

waitrose.py - added dynamic author retrieval and site_name coverage
vegolosi.py - pulled new source and removed fields now covered by schema
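
To illustrate what "removing direct schema calls with no overrides" means, here is a minimal sketch. It does not use the real recipe-scrapers classes: `SchemaData`, `BaseScraper`, `RedundantScraper`, and `SlimScraper` are hypothetical stand-ins, and the sketch assumes the base class (or a plugin wrapped around it) already falls back to schema.org data for fields a scraper does not override.

```python
# Hypothetical, simplified model; these are NOT the actual recipe-scrapers classes.


class SchemaData:
    """Stands in for schema.org Recipe metadata parsed from a page."""

    def __init__(self, data):
        self._data = data

    def title(self):
        return self._data.get("name")

    def author(self):
        return self._data.get("author")


class BaseScraper:
    """Assumed base class: every field defaults to the schema value."""

    def __init__(self, schema_data):
        self.schema = SchemaData(schema_data)

    def title(self):
        return self.schema.title()

    def author(self):
        return self.schema.author()


class RedundantScraper(BaseScraper):
    """Before: the override only repeats what the base class already does."""

    @classmethod
    def host(cls):
        return "example.com"

    def title(self):
        return self.schema.title()


class SlimScraper(BaseScraper):
    """After: the redundant override is gone; only the host marker remains."""

    @classmethod
    def host(cls):
        return "example.com"


data = {"name": "Lentil Soup", "author": "Jane Doe"}
redundant, slim = RedundantScraper(data), SlimScraper(data)
print(redundant.title() == slim.title())  # both resolve through the schema data
```

Under that assumption, deleting the override changes no output, which is why such a cleanup can be expected to leave existing tests passing.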

Issues

Some fields return an error when removed. Originally I thought this was related to the MANDATORY_TESTS vs OPTIONAL_TESTS but wasn't able to get to the root. Could use some input if anyone has any ideas but this PR should pass all the tests as is. EDIT: Seems directly related to the issue raised in https://github.com/hhursev/recipe-scrapers/issues/1020

With the current code setup the .py file must remain in place for the scraper to be recognized without wild_mode present. This may change in the v15 branch?

jayaddison commented 2 months ago

> With the current code setup the .py file must remain in place for the scraper to be recognized without wild_mode present. This may change in the v15 branch?

Nope, this will not change with v15; each scraper will still require a .py file to exist as an indicator that the website is known/supported.
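
A rough sketch of why the per-site file matters, under stated assumptions: the names below (`SCRAPERS`, `KnownSiteScraper`, `GenericSchemaScraper`, `pick_scraper`) are hypothetical and only model the idea that a registered class marks a host as supported, while a wild-mode-style flag allows a generic schema-only fallback for unknown hosts.

```python
from urllib.parse import urlparse

# Hypothetical model of host-based scraper lookup; not the library's real code.


class KnownSiteScraper:
    """A nearly empty scraper: its existence marks the host as supported."""

    @classmethod
    def host(cls):
        return "known.example"


class GenericSchemaScraper:
    """Fallback that would rely purely on schema.org data."""


# Registry built from the scraper classes that exist (one per site file).
SCRAPERS = {KnownSiteScraper.host(): KnownSiteScraper}


def pick_scraper(url, wild_mode=False):
    """Return the scraper class for a URL, or the generic one in wild mode."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    if host in SCRAPERS:
        return SCRAPERS[host]
    if wild_mode:
        return GenericSchemaScraper
    raise LookupError(f"No scraper registered for {host!r}")


print(pick_scraper("https://www.known.example/recipe/1").__name__)
print(pick_scraper("https://other.example/recipe/2", wild_mode=True).__name__)
```

In this model, deleting the class for a site would push its URLs onto the error path unless the caller opts into the fallback, which matches the behaviour described above.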

jayaddison commented 2 months ago

> Some fields return an error when removed. Originally I thought this was related to the MANDATORY_TESTS vs OPTIONAL_TESTS but wasn't able to get to the root. Could use some input if anyone has any ideas but this PR should pass all the tests as is. EDIT: Seems directly related to the issue raised in https://github.com/hhursev/recipe-scrapers/issues/1020

Did you manage to figure out what was going on with this?

jknndy commented 2 months ago

> Did you manage to figure out what was going on with this?

Not yet, I haven't come back to it, but I'll look into it soon.

jknndy commented 2 months ago

> Did you manage to figure out what was going on with this?

Bringing the branch up to date resolved whatever the issue was. There are a few bugs to get to here; I'll open an issue to track them, but there's no reason to hold this up while it's still caught up.

Few other changes in the most recent commit:

  1. lekkerensimpel - updated author function to grab the info elsewhere so all tests would pass rather than 1 with schema & 2 with null
  2. owenhan - updated author to return the correct value.
  3. saltpepperskillet - updated author to use .title() instead of .capitalize() to simplify and align with other code.
  4. *test cases
  5. lecker - pulled new test data for _2 which increased the scraper coverage
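
For reference on the `.title()` vs `.capitalize()` change in item 3, a small illustration of how the two built-in string methods differ on a multi-word name. The helper names and sample strings are made up for this example and are not taken from any scraper's test data.

```python
def author_capitalize(raw):
    """Uppercase only the first character; lowercase everything else."""
    return raw.strip().capitalize()


def author_title(raw):
    """Uppercase the first letter of every word; lowercase the rest."""
    return raw.strip().title()


name = "  JANE DOE "
print(author_capitalize(name))  # Jane doe
print(author_title(name))       # Jane Doe

# Caveat: str.title() treats an apostrophe as a word boundary.
print(author_title("chef's table"))  # Chef'S Table
```

`.title()` suits typical "First Last" author names, but the apostrophe behaviour is worth keeping in mind for sites whose author field can hold possessives or brand names.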