www.finedininglovers.com returns empty strings

smilerz commented 3 years ago

Pre-filing checks

[x] I have searched for open issues that report the same problem
[x] I have checked that the bug affects the latest version of the library

The URL of the recipe(s) that are not being scraped correctly

https://www.finedininglovers.com/article/black-forest-cake-recipe

The version of Python you're using

3.9.1

The operating system of your environment

Windows 10

The results you expect to see

non-empty results

The results (including any Python error messages) that you are seeing scrape.schema.data returns {}

Can you write Python and would you like to help fix the scraper yourself? We'd be glad for your assistance! We can provide you with guidance and code review in return. If so, tick any of the relevant boxes below:

[ ] I'd like to try fixing this scraper myself
[ ] I'd like guidance to help me develop a fix
[x] I'd prefer if the recipe-scrapers team try to fix this

bfcarpio commented 3 years ago

Thank you @smilerz for notifying us. I did some quick testing and came up with some additional notes:

public functions (instructions()) also return empty values
Opening the testhtml file leads to a CSS-less based page.

It seems like this website might have updated.

vabene1111 commented 2 years ago

ok it appears that the above link is an "article" not a recipe (see url schema).

articles dont have many fields recipes have, i will add some try blocks so that articles provide as much data as possible.

PR will follow tomorrow.

jayaddison commented 2 years ago

Nice, thank you for looking into this @vabene1111. Pages that have type Article seem to be a fairly frequent issue.

I have one or two suggestions for the code based on f883dcc25a1d189ee2ed1deec1ab4e8c8cc4e6fa - let me know if you'd appreciate any early feedback, or I can wait and provide commentary when the pull request is ready.

vabene1111 commented 2 years ago

hey @jayaddison, thanks for the feedback. I would love to get feedback on the code, I made many assumptions because I wasn't sure on how you like to handle things.

For example I dont really know if you prefer to have functions raise exceptions or if issues should be handled gracefully for example by returning 0 for a yield or servings. Let me know whatever needs to be changed or how you prefer things.

jayaddison commented 2 years ago

Cool - yep, we should document these things as developer guidance.

The items I'm thinking about from a first-pass are:

Scrapers shouldn't expose public methods that aren't available on other scrapers, so the is_article method should be considered private/internal (suggestion: rename to _is_article)
Returning 0 can be a bit of a code smell - so in the yields and servings cases you mention, I reckon that None is probably what we want (so the scraper responds with "empty timing/servings" instead of "recipe requires zero time / produces zero servings")
We provide settings for callers to decide how they want exceptions to be handled, so -- unless there's a good reason -- we probably shouldn't catch and return a different value (as in this lecker.de example)

(also a tiny nitpick: typo here)

vabene1111 commented 2 years ago

awesome, thanks for the feedback. All of that makes perfect sense and i will change everything accordingly!

jayaddison commented 2 years ago

Hey @smilerz - @vabene1111 took a look at this, and unfortunately my response after learning more about the issue is that I'm not convinced that this category of articles on finedininglovers is practical to support. Maybe that's misguided, I'm not sure - feel free to disagree; #612 includes some discussion and the conclusion so far.

At the moment I feel like there's a bunch of work to do to simplify and streamline the scraper development process (and maintenance and review process). It's not bad at the moment but it'd be nice to support a more rapid stream of changes and contributions with higher quality and testing -- and my sense is that doing that is going to require restructuring the internals of the library a bit.

Which means I'm trying harder (but not always managing) to catch changes that might make it harder to do that restructuring.

Closing for now, but further feedback and help appreciated!

hhursev / recipe-scrapers

www.finedininglovers.com returns empty strings #366