hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.61k stars, 505 forks

BettyBossi is not working #1028

Open Danit2 opened 3 months ago

Danit2 commented 3 months ago

Pre-filing checks

The URL of the recipe(s) that are not being scraped correctly

The results you expect to see

I use Mealie on Home Assistant, and importing recipes from the Betty Bossi website does not work there. According to the Mealie repository, this is a problem in recipe-scrapers.

The results (including any Python error messages) that you are seeing

I get an error message from Mealie.

jayaddison commented 3 months ago

Hi @Danit2 - thank you for the bug report, we should be able to investigate this soon.

There are two details that would be helpful to narrow this down, if available:

Thanks!

Danit2 commented 3 months ago

Hi @jayaddison

Thanks for your answer.

My version of Mealie uses recipe-scrapers version 14.55.0.

I don't see anything useful in the logs. I'm sorry.

INFO: 17-Mar-24 14:57:48    HTTP Request: GET https://www.bettybossi.ch/de/Rezept/ShowRezept/BB_BBZI201015_0003A-40-de?title=Steinpilz-Risotto "HTTP/1.1 200 OK"
INFO: 17-Mar-24 14:57:48    HTTP Request: GET https://www.bettybossi.ch/de/Rezept/ShowRezept/BB_BBZI201015_0003A-40-de?title=Steinpilz-Risotto "HTTP/1.1 200 OK"
127.0.0.1:38902 - "POST /api/recipes/create-url HTTP/1.1" 400
[17/Mar/2024:14:57:48 +0100] 400 164.14.140.15, 172.30.33.17(172.30.32.1) POST /api/recipes/create-url HTTP/1.1 (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36)

I only get this error message.

I hope you can help.

Thanks.

jayaddison commented 3 months ago

Extremely helpful, thank you @Danit2 - I hope to investigate this within the next day or so.

jayaddison commented 3 months ago

Ok, this is an interesting bug. I think what is happening here is that:

My guess is that if a user agent followed the redirect to the recipe URL and downloaded the HTML from that second page, then recipe-scrapers would be able to extract the recipe metadata.

I'll have to spend a bit of time thinking about this. It would be good to double-check this theory, too, if anyone out there has time to help.
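One way to double-check the theory is sketched below, using only the Python standard library. It is a rough check under stated assumptions, not how recipe-scrapers itself works: the helper names (`fetch_following_redirects`, `has_recipe_metadata`) are made up for illustration, it only looks for schema.org `Recipe` data in JSON-LD (a missing JSON-LD block does not prove the page has no recipe metadata in another format), and `urllib` only follows HTTP-level redirects, so a JavaScript or meta-refresh redirect would not show up here.

```python
import json
import urllib.request
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def _is_recipe(node):
    """True if a JSON-LD node declares the schema.org type Recipe."""
    if not isinstance(node, dict):
        return False
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return "Recipe" in types


def has_recipe_metadata(html):
    """Return True if any JSON-LD block in the HTML describes a Recipe."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # ignore malformed blocks
        nodes = data if isinstance(data, list) else [data]
        for node in list(nodes):
            # Some sites nest their items inside an "@graph" list.
            if isinstance(node, dict) and isinstance(node.get("@graph"), list):
                nodes = nodes + node["@graph"]
        if any(_is_recipe(node) for node in nodes):
            return True
    return False


def fetch_following_redirects(url, user_agent="Mozilla/5.0"):
    """Fetch a page (HTTP redirects are followed) and return (final_url, html)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.geturl(), response.read().decode(charset, errors="replace")
```

Calling `fetch_following_redirects(url)` on the recipe URL from the logs and passing the returned HTML to `has_recipe_metadata` would show whether the final URL differs from the requested one and whether the downloaded page carries any JSON-LD recipe data.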

Zwirbel1 commented 3 months ago

I would be willing to help solve the problem with Betty Bossi, though I am not a developer.

jayaddison commented 3 months ago

@Zwirbel1 if you have time, it would be useful to check whether any open source recipe management / import utilities are able to handle BettyBossi. That would give us an idea of whether the same problem has been solved elsewhere (and perhaps how).

Zwirbel1 commented 3 months ago

I tested it last week with Tandoor, which was able to import a recipe from Betty Bossi in the online demo version: https://docs.tandoor.dev/. Here's the menu I imported into the Tandoor demo: https://app.tandoor.dev/view/recipe/53071.

SwissOS commented 2 months ago

It seems like bettybossi.ch uses anti-scraping techniques. As already mentioned in another issue (https://github.com/hhursev/recipe-scrapers/issues/531), you need to load the page twice in order to get the correct HTML.
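A minimal sketch of what "loading the page twice" could look like in a script, using only the Python standard library. It assumes that the site hands out a cookie on the first request and serves the real page once that cookie is sent back, which is not confirmed in this thread. The names `make_session_fetcher` and `fetch_twice` are hypothetical helpers, not part of recipe-scrapers.

```python
import urllib.request
from http.cookiejar import CookieJar


def make_session_fetcher(user_agent="Mozilla/5.0"):
    """Return a fetch(url) function whose requests all share one cookie jar."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    opener.addheaders = [("User-Agent", user_agent)]

    def fetch(url):
        with opener.open(url, timeout=30) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")

    return fetch


def fetch_twice(url, fetch=None):
    """Request the same URL twice in one session and return both HTML bodies."""
    fetch = fetch or make_session_fetcher()
    first = fetch(url)   # may only set cookies and return a placeholder page
    second = fetch(url)  # sent with any cookies received from the first request
    return first, second
```

If the two returned bodies differ, the second one is the candidate to hand to a recipe parser. If they are identical, the cookie assumption is probably wrong for this site.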

Zwirbel1 commented 1 month ago

@SwissOS: I tried reloading the page several times, but I always get the same URL and HTML, and I still cannot import the recipe. Is there anything else I can change to get the correct HTML / URL?