hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.61k stars 505 forks source link

Remove schema calls with no overrides #1025

Closed jknndy closed 2 months ago

jknndy commented 3 months ago

Credit to @bfcarpio for the original idea for this clean up

Removed most occurrences of direct schema calls, exceptions below, and made small improvements to two scrapers .

Improvements

Issues

Some fields return an error when removed. Originally I thought this was related to the MANDATORY_TESTS vs OPTIONAL_TESTS but wasn't able to get to the root. Could use some input if anyone has any ideas but this PR should pass all the tests as is. EDIT: Seems directly related to the issue raised in #1020

With the current code setup the .py file must remain in place for the scraper to be regonized without wild_mode present. This may change in the v15 branch?

keys that return errors when removed

jayaddison commented 3 months ago

With the current code setup the .py file must remain in place for the scraper to be regonized without wild_mode present. This may change in the v15 branch?

At the moment I think that'll continue to be true for v15 at least. It's useful (maybe necessary!) to have some kind of indicator in the library-code (without relying on the test cases, because they may not / should not be loaded at runtime) to indicate whether the library thinks the site is supported (meaning: has been through the discovery, test authoring, review, acceptance, release process; can have bugs reported against it, etc).

jayaddison commented 3 months ago

I'd sorta suggest moving the Waitrose changes out of this PR too - it would make it more focused (for example in the commit history) if this changeset only affects the schema.org fields.

In any case: creat cleanup!

jknndy commented 3 months ago

At the moment I think that'll continue to be true for v15 at least. It's useful (maybe necessary!) to have some kind of indicator in the library-code (without relying on the test cases, because they may not / should not be loaded at runtime) to indicate whether the library thinks the site is supported (meaning: has been through the discovery, test authoring, review, acceptance, release process; can have bugs reported against it, etc).

I agree, I'll look into this a bit but i don't have any ideas off the bat.

I'd sorta suggest moving the Waitrose changes out of this PR too - it would make it more focused (for example in the commit history) if this changeset only affects the schema.org fields.

Done, I'm going to open another PR once this one is merged to add the author field back to waitrose

I think we're good merge here!

jayaddison commented 3 months ago

Oop, looks like you maybe forgot to revert a .testhtml and .json -- apart from that, looks good to me :+1:

jknndy commented 3 months ago

Hey @jayaddison , Spent a little more time on this expanding to cover some of the missed fields. This required a few changes that will definetly need a review.

jayaddison commented 2 months ago

@jknndy did you decide not to continue on with this? (I'm wondering whether to update any thoughts about it in the developer experience improvements issue thread)

jknndy commented 2 months ago

@jknndy did you decide not to continue on with this? (I'm wondering whether to update any thoughts about it in the developer experience improvements issue thread)

Unfortunately this was an unintended closure. I reset my main repo for the library & lost all the attached branches. I've reinstated 2/3 but didn't realize this one was lost too.

I should have time to pull it together and open a new PR this afternoon.

jayaddison commented 2 months ago

Ah yep, that happens. In case it might be helpful: there is some GitHub magic to retrieve the commits from a pull request branch, even for closed / deleted cases like this:

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

jknndy commented 2 months ago

Ah yep, that happens. In case it might be helpful: there is some GitHub magic to retrieve the commits from a pull request branch, even for closed / deleted cases like this:

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

Hugely helpful! Thanks very much