Closed jknndy closed 2 months ago
With the current code setup the .py file must remain in place for the scraper to be regonized without wild_mode present. This may change in the v15 branch?
At the moment I think that'll continue to be true for v15
at least. It's useful (maybe necessary!) to have some kind of indicator in the library-code (without relying on the test cases, because they may not / should not be loaded at runtime) to indicate whether the library thinks the site is supported (meaning: has been through the discovery, test authoring, review, acceptance, release process; can have bugs reported against it, etc).
I'd sorta suggest moving the Waitrose changes out of this PR too - it would make it more focused (for example in the commit history) if this changeset only affects the schema.org
fields.
In any case: creat cleanup!
At the moment I think that'll continue to be true for
v15
at least. It's useful (maybe necessary!) to have some kind of indicator in the library-code (without relying on the test cases, because they may not / should not be loaded at runtime) to indicate whether the library thinks the site is supported (meaning: has been through the discovery, test authoring, review, acceptance, release process; can have bugs reported against it, etc).
I agree, I'll look into this a bit but i don't have any ideas off the bat.
I'd sorta suggest moving the Waitrose changes out of this PR too - it would make it more focused (for example in the commit history) if this changeset only affects the schema.org fields.
Done, I'm going to open another PR once this one is merged to add the author field back to waitrose
I think we're good merge here!
Oop, looks like you maybe forgot to revert a .testhtml
and .json
-- apart from that, looks good to me :+1:
Hey @jayaddison , Spent a little more time on this expanding to cover some of the missed fields. This required a few changes that will definetly need a review.
recipe_scrapers/plugins/schemaorg_fill.py
cuisine
,description
, cook_time
, prep_time
to the run_on_methods
dict in schemaorg. This resolves the issue (in addition to the change below) of the scraper throwing errors for certain fields when not listed in the scraper but covered in the test.@jknndy did you decide not to continue on with this? (I'm wondering whether to update any thoughts about it in the developer experience improvements issue thread)
@jknndy did you decide not to continue on with this? (I'm wondering whether to update any thoughts about it in the developer experience improvements issue thread)
Unfortunately this was an unintended closure. I reset my main repo for the library & lost all the attached branches. I've reinstated 2/3 but didn't realize this one was lost too.
I should have time to pull it together and open a new PR this afternoon.
Ah yep, that happens. In case it might be helpful: there is some GitHub magic to retrieve the commits from a pull request branch, even for closed / deleted cases like this:
Ah yep, that happens. In case it might be helpful: there is some GitHub magic to retrieve the commits from a pull request branch, even for closed / deleted cases like this:
Hugely helpful! Thanks very much
Credit to @bfcarpio for the original idea for this clean up
Removed most occurrences of direct schema calls, exceptions below, and made small improvements to two scrapers .
Improvements
Issues
Some fields return an error when removed. Originally I thought this was related to the
MANDATORY_TESTS
vsOPTIONAL_TESTS
but wasn't able to get to the root. Could use some input if anyone has any ideas but this PR should pass all the tests as is. EDIT: Seems directly related to the issue raised in #1020With the current code setup the .py file must remain in place for the scraper to be regonized without wild_mode present. This may change in the v15 branch?
keys that return errors when removed