Closed smilerz closed 2 years ago
Thank you @smilerz for notifying us. I did some quick testing and came up with some additional notes:
instructions()
) also return empty valuestesthtml
file leads to a CSS-less based page.It seems like this website might have updated.
ok it appears that the above link is an "article" not a recipe (see url schema).
articles dont have many fields recipes have, i will add some try blocks so that articles provide as much data as possible.
PR will follow tomorrow.
Nice, thank you for looking into this @vabene1111. Pages that have type Article
seem to be a fairly frequent issue.
I have one or two suggestions for the code based on f883dcc25a1d189ee2ed1deec1ab4e8c8cc4e6fa - let me know if you'd appreciate any early feedback, or I can wait and provide commentary when the pull request is ready.
hey @jayaddison, thanks for the feedback. I would love to get feedback on the code, I made many assumptions because I wasn't sure on how you like to handle things.
For example I dont really know if you prefer to have functions raise exceptions or if issues should be handled gracefully for example by returning 0 for a yield or servings. Let me know whatever needs to be changed or how you prefer things.
Cool - yep, we should document these things as developer guidance.
The items I'm thinking about from a first-pass are:
Scrapers shouldn't expose public methods that aren't available on other scrapers, so the is_article
method should be considered private/internal (suggestion: rename to _is_article
)
Returning 0
can be a bit of a code smell - so in the yields and servings cases you mention, I reckon that None
is probably what we want (so the scraper responds with "empty timing/servings" instead of "recipe requires zero time / produces zero servings")
We provide settings for callers to decide how they want exceptions to be handled, so -- unless there's a good reason -- we probably shouldn't catch and return a different value (as in this lecker.de
example)
(also a tiny nitpick: typo here)
awesome, thanks for the feedback. All of that makes perfect sense and i will change everything accordingly!
Hey @smilerz - @vabene1111 took a look at this, and unfortunately my response after learning more about the issue is that I'm not convinced that this category of articles on finedininglovers is practical to support. Maybe that's misguided, I'm not sure - feel free to disagree; #612 includes some discussion and the conclusion so far.
At the moment I feel like there's a bunch of work to do to simplify and streamline the scraper development process (and maintenance and review process). It's not bad at the moment but it'd be nice to support a more rapid stream of changes and contributions with higher quality and testing -- and my sense is that doing that is going to require restructuring the internals of the library a bit.
Which means I'm trying harder (but not always managing) to catch changes that might make it harder to do that restructuring.
Closing for now, but further feedback and help appreciated!
Pre-filing checks
The URL of the recipe(s) that are not being scraped correctly
The version of Python you're using
3.9.1
The operating system of your environment
Windows 10
The results you expect to see
non-empty results
The results (including any Python error messages) that you are seeing scrape.schema.data returns {}
Can you write Python and would you like to help fix the scraper yourself? We'd be glad for your assistance! We can provide you with guidance and code review in return. If so, tick any of the relevant boxes below:
recipe-scrapers
team try to fix this