Remove schema calls with no overrides

jknndy commented 3 months ago

Credit to @bfcarpio for the original idea for this clean up

Removed most occurrences of direct schema calls, exceptions below, and made small improvements to two scrapers .

Improvements

Waitrose.py - added dynamic author retrieval and site_name coverage ~vegolosi.py - pulled new source and removed fields now covered by schema~

Issues

Some fields return an error when removed. Originally I thought this was related to the MANDATORY_TESTS vs OPTIONAL_TESTS but wasn't able to get to the root. Could use some input if anyone has any ideas but this PR should pass all the tests as is. EDIT: Seems directly related to the issue raised in #1020

With the current code setup the .py file must remain in place for the scraper to be regonized without wild_mode present. This may change in the v15 branch?

keys that return errors when removed

cuisine
cook_time
nutrients
prep_time
description

jayaddison commented 3 months ago

With the current code setup the .py file must remain in place for the scraper to be regonized without wild_mode present. This may change in the v15 branch?

At the moment I think that'll continue to be true for v15 at least. It's useful (maybe necessary!) to have some kind of indicator in the library-code (without relying on the test cases, because they may not / should not be loaded at runtime) to indicate whether the library thinks the site is supported (meaning: has been through the discovery, test authoring, review, acceptance, release process; can have bugs reported against it, etc).

jayaddison commented 3 months ago

I'd sorta suggest moving the Waitrose changes out of this PR too - it would make it more focused (for example in the commit history) if this changeset only affects the schema.org fields.

In any case: creat cleanup!

jknndy commented 3 months ago

At the moment I think that'll continue to be true for v15 at least. It's useful (maybe necessary!) to have some kind of indicator in the library-code (without relying on the test cases, because they may not / should not be loaded at runtime) to indicate whether the library thinks the site is supported (meaning: has been through the discovery, test authoring, review, acceptance, release process; can have bugs reported against it, etc).

I agree, I'll look into this a bit but i don't have any ideas off the bat.

I'd sorta suggest moving the Waitrose changes out of this PR too - it would make it more focused (for example in the commit history) if this changeset only affects the schema.org fields.

Done, I'm going to open another PR once this one is merged to add the author field back to waitrose

I think we're good merge here!

jayaddison commented 3 months ago

Oop, looks like you maybe forgot to revert a .testhtml and .json -- apart from that, looks good to me :+1:

jknndy commented 3 months ago

Hey @jayaddison , Spent a little more time on this expanding to cover some of the missed fields. This required a few changes that will definetly need a review.

recipe_scrapers/plugins/schemaorg_fill.py
- added cuisine,description, cook_time, prep_time to the run_on_methods dict in schemaorg. This resolves the issue (in addition to the change below) of the scraper throwing errors for certain fields when not listed in the scraper but covered in the test.
`tests/init.py
- Removed else: to raise an error if the key does not raise an exception. (This code can be shortened by combining it with the test for optional test but wanted to see what the general opinon on this adjustment)

jayaddison commented 2 months ago

@jknndy did you decide not to continue on with this? (I'm wondering whether to update any thoughts about it in the developer experience improvements issue thread)

jknndy commented 2 months ago

@jknndy did you decide not to continue on with this? (I'm wondering whether to update any thoughts about it in the developer experience improvements issue thread)

Unfortunately this was an unintended closure. I reset my main repo for the library & lost all the attached branches. I've reinstated 2/3 but didn't realize this one was lost too.

I should have time to pull it together and open a new PR this afternoon.

jayaddison commented 2 months ago

Ah yep, that happens. In case it might be helpful: there is some GitHub magic to retrieve the commits from a pull request branch, even for closed / deleted cases like this:

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

jknndy commented 2 months ago

Ah yep, that happens. In case it might be helpful: there is some GitHub magic to retrieve the commits from a pull request branch, even for closed / deleted cases like this:

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

Hugely helpful! Thanks very much

hhursev / recipe-scrapers

Remove schema calls with no overrides #1025

Improvements

Issues