Closed jknndy closed 10 months ago
Thoughts on adding canonical_url to the list of required test coverage?
It's a good idea. Could the generate.py
script attempt to add it automatically there too? (two problems: the HTML might not contain the canonical URL, and the URL provided to generate.py
on the commandline might end up being redirected to a different URL.. but initially we could simply write that URL that the user provided)
Re: the encoding issue for Waitrose -- it looks like another case where the HTML you received from the site had some slightly unusual encoding. I've downloaded an updated copy with wget2
and your scraper seems to work fine with that; I'll add the updated HTML here soon.
Could the
generate.py
script attempt to add it automatically there too?
Hadn't even thought of being able to add it automatically, I'll take a look at that once this is merged, I can also add some info to the developer guide to explain how to handle it with the unittest skip if it won't pass.
Could the
generate.py
script attempt to add it automatically there too?Hadn't even thought of being able to add it automatically, I'll take a look at that once this is merged, I can also add some info to the developer guide to explain how to handle it with the unittest skip if it won't pass.
Sounds great, yep. The reason I thought about adding it automatically is to try to avoid the process of writing a new scraper becoming more difficult.
We could place # @unittest.skip("canonical_url is not available from this webpage")
(commented-out) above the autogenerated test_canonical_url
method -- it'd be easy to spot and suggest removing it during code review, or for anyone who has problems with gathering the canonical_url
they could uncomment it.
Is that all of them @jknndy? :)
Is that all of them @jknndy? :)
I'd say 95% there are a few that i skipped throughout the process that were taking up to much time to complete or required some work for encoding issues. I'll take a shot at them this week and will try the wget2 for my encoding issues!
Then it's back to ingredient_grouping additions!
Brilliant. Nice work!
Didn't realize i had already started this one a few weeks ago!
Dropped support
thenutritiouskitchen- Site no longer exists
Adjusted scrapers
recipe_scrapers/southernliving.py - yields html fallback, pages that contain ranges of yields don't return via schema recipe_scrapers/sweetpeasandsaffron.py - Added get_minutes & get_yields util recipe_scrapers/zeitwochenmarkt.py - Ingredients tag changed
Notes
tests/test_waitrose.py - Returning a text encoding issue
Thoughts on adding canonical_url to the list of required test coverage?