hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License

Kitchenstories scraper not detected #1261

Open · hhopke opened 2 months ago

hhopke commented 2 months ago

Pre-filing checks

The URL of the recipe(s) that are not being scraped correctly

"https://www.kitchenstories.com/de/rezepte/susskartoffel-curry"

The results you expect to see

The recipe scraped successfully

The results (including any Python error messages) that you are seeing

import requests
from recipe_scrapers import scrape_html

url = "https://www.kitchenstories.com/de/rezepte/susskartoffel-curry"
name = input('What is your name, risotto sampler?\n')
html = requests.get(url, headers={"User-Agent": f"Risotto Sampler {name}"}).content
scraper = scrape_html(html, org_url=url, wild_mode=False)
scraper.host()
scraper.title()
scraper.total_time()
scraper.image()
scraper.ingredients()
scraper.ingredient_groups()
scraper.instructions()
scraper.instructions_list()
scraper.yields()
scraper.to_json()
scraper.links()
scraper.nutrients()  # not always available
scraper.canonical_url()  # not always available
scraper.equipment()  # not always available
scraper.cooking_method()  # not always available
scraper.keywords()  # not always available
scraper.dietary_restrictions()  # not always available

Traceback (most recent call last):
  File "...\scratches\scratch_7.py", line 11, in <module>
    scraper.title()
  File "~\recipe_scrapers\plugins\exception_handling.py", line 63, in decorated_method_wrapper
    return decorated(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\recipe_scrapers\plugins\html_tags_stripper.py", line 74, in decorated_method_wrapper
    decorated_func_result = decorated(self, *args, **kwargs)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\recipe_scrapers\plugins\normalize_string.py", line 33, in decorated_method_wrapper
    return normalize_string(decorated(self, *args, **kwargs))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\recipe_scrapers\plugins\schemaorg_fill.py", line 66, in decorated_method_wrapper
    raise e
  File "~\recipe_scrapers\plugins\schemaorg_fill.py", line 57, in decorated_method_wrapper
    return decorated(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~\recipe_scrapers\_abstract.py", line 95, in title
    raise NotImplementedError("This should be implemented.")
NotImplementedError: This should be implemented.
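The script above calls several methods marked "not always available", and any of them can raise instead of returning a value. As a minimal sketch, assuming you would rather collect failures than stop at the first one, the helper below calls each method by name and records either its result or the exception it raised. `call_optional` and `FakeScraper` are hypothetical names for illustration, not part of recipe-scrapers. This would not fix the problem reported here, though: when even `title()` raises, that suggests the fetched HTML itself may be the issue rather than a missing optional field.

```python
from typing import Any, Dict, List


def call_optional(scraper: Any, method_names: List[str]) -> Dict[str, Any]:
    """Call each named method on `scraper`; store its return value or the raised exception."""
    results: Dict[str, Any] = {}
    for name in method_names:
        method = getattr(scraper, name)
        try:
            results[name] = method()
        except Exception as exc:  # deliberately broad: record the failure and keep going
            results[name] = exc
    return results


class FakeScraper:
    """Hypothetical stand-in so the sketch runs without network access."""

    def title(self) -> str:
        return "Süßkartoffel-Curry"

    def nutrients(self) -> dict:
        # Mimics the error shown in the traceback above.
        raise NotImplementedError("This should be implemented.")


results = call_optional(FakeScraper(), ["title", "nutrients"])
for field, value in results.items():
    status = "FAILED" if isinstance(value, Exception) else "ok"
    print(f"{field}: {status} ({value!r})")
```

With a real scraper you would pass it in place of `FakeScraper()`, e.g. `call_optional(scraper, ["title", "nutrients", "keywords"])`.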

jayaddison commented 1 month ago

Hi @hhopke, thank you for the bug report! I haven't been able to reproduce this problem locally; could you check whether any of the code I used below differs from yours?

>>> import requests
>>> from recipe_scrapers import scrape_html
>>> url = "https://www.kitchenstories.com/de/rezepte/susskartoffel-curry"
>>> name = input('What is your name, risotto sampler?\n')
What is your name, risotto sampler?
James
>>> html = requests.get(url, headers={"User-Agent": f"Risotto Sampler {name}"}).content
>>> scraper = scrape_html(html, org_url=url, wild_mode=False)
>>> scraper.title()
'Süßkartoffel-Curry'

hhopke commented 1 month ago

Hi @jayaddison, sorry for the late reply; I was on vacation. I used exactly the same code, and I just tried copying and pasting yours and got the same output. Interestingly, I am getting this error for multiple sites, as if the pages were blocking me.

This page, on the other hand, worked: https://fitmencook.com/recipes/mexican-tortilla-soup/
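One way to test the "page is blocking me" theory is to look at what the script actually received before handing it to `scrape_html`. `requests` exposes the HTTP status as `response.status_code` (a 403 or 429 might hint at filtering), and the body can be searched for structured recipe data. Below is a minimal, standard-library-only sketch that reports whether an HTML string contains a schema.org `Recipe` inside a JSON-LD `<script>` tag. It assumes the site publishes its recipe as JSON-LD, which is common on recipe sites but not confirmed here for these particular pages, and it ignores microdata and other formats; `has_schema_recipe` is a hypothetical helper, not a recipe-scrapers function.

```python
import json
from html.parser import HTMLParser
from typing import Any, Iterator, List


class _JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def _types(node: Any) -> Iterator[Any]:
    """Yield every "@type" value found anywhere in parsed JSON-LD (handles @graph nesting)."""
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, list):
            yield from declared
        elif declared is not None:
            yield declared
        for value in node.values():
            yield from _types(value)
    elif isinstance(node, list):
        for item in node:
            yield from _types(item)


def has_schema_recipe(html: str) -> bool:
    """Return True if any JSON-LD block in `html` declares a schema.org Recipe."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD: skip it rather than fail
        if "Recipe" in _types(data):
            return True
    return False


blocked = "<html><body><h1>Access denied</h1></body></html>"
recipe = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Recipe", "name": "Süßkartoffel-Curry"}'
    "</script></head></html>"
)
print(has_schema_recipe(blocked), has_schema_recipe(recipe))
```

Since the script uses `.content` (bytes), decode it first, e.g. `has_schema_recipe(html.decode("utf-8", errors="replace"))`. A `False` result for a page that shows the recipe in a browser would lend some support to the blocking theory.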

jayaddison commented 3 weeks ago

@hhopke no problem at all, thanks for responding. I have one idea, although you may already have considered it: do the relevant pages display as expected when opened in a popular web browser? That would give us a useful data point, and perhaps a workaround:

Unfortunately, there's often not much we can do about transient network errors or network/server-side filtering, so I can't guarantee a successful result. But if the page does load in a web browser, then, at least in theory, we have more options.
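For the transient-network-error case (not deliberate filtering, where repeating the same request will usually just fail again), a common mitigation is to retry with exponential backoff. A minimal sketch follows; `with_retries` and `flaky_fetch` are hypothetical helpers, not part of recipe-scrapers or `requests`. By default it retries on `OSError`, which covers built-in errors such as `ConnectionError` and `TimeoutError`, and also `requests.exceptions.RequestException`, which subclasses `IOError` (an alias of `OSError`).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> T:
    """Call `fetch()`, retrying on `retry_on` errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo with a hypothetical flaky fetcher that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "<html>...</html>"


page = with_retries(flaky_fetch, attempts=3, base_delay=0.01)
print(calls["count"], page)
```

With `requests`, the fetch function could be a small function that calls `requests.get(url, headers=headers, timeout=10)` and then `response.raise_for_status()` before returning the response, so that HTTP error codes also become retryable exceptions.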