Open Danit2 opened 8 months ago
Hi @Danit2 - thank you for the bug report, we should be able to investigate this soon.
There are two details that would be helpful to narrow this down, if available:
For your mealie installation, could you indicate the version of recipe-scrapers that is in use? (I would guess it will look something like v14.50.1 or similar)
Are there any log messages around the "Failed to retrieve recipe title" error?
Thanks!
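One way to look up the installed version is sketched below. It assumes you can run Python inside the same environment (for example the container) that Mealie runs in; the helper name `installed_version` is made up for this example, and `recipe-scrapers` is the package's distribution name on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if recipe-scrapers is installed, otherwise None.
    print(installed_version("recipe-scrapers"))
```

Running `pip show recipe-scrapers` in that environment should report the same information.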
Hi @jayaddison
Thanks for your answer.
My version of Mealie uses recipe-scrapers version 14.55.0.
I don't see anything in the logs, I'm sorry.
INFO: 17-Mar-24 14:57:48 HTTP Request: GET https://www.bettybossi.ch/de/Rezept/ShowRezept/BB_BBZI201015_0003A-40-de?title=Steinpilz-Risotto "HTTP/1.1 200 OK"
INFO: 17-Mar-24 14:57:48 HTTP Request: GET https://www.bettybossi.ch/de/Rezept/ShowRezept/BB_BBZI201015_0003A-40-de?title=Steinpilz-Risotto "HTTP/1.1 200 OK"
127.0.0.1:38902 - "POST /api/recipes/create-url HTTP/1.1" 400
[17/Mar/2024:14:57:48 +0100] 400 164.14.140.15, 172.30.33.17(172.30.32.1) POST /api/recipes/create-url HTTP/1.1 (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36)
I only get this error message.
I hope you can help.
Thanks.
Extremely helpful, thank you @Danit2 - I hope to investigate this within the next day or so.
Ok, this is an interesting bug. I think what is happening here is that:
The HTTP client in use (httpx) - like many/most Python HTTP clients - does not evaluate the JavaScript code, so the tiny HTML (with no recipe content) is returned.
recipe-scrapers receives the tiny HTML page and doesn't find the recipe information in there.
My guess is that if a user-agent followed the redirect to get to the recipe URL, and downloaded the HTML from that second page, then recipe-scrapers would be able to extract the recipe metadata.
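To help double-check this theory, here is a rough heuristic for telling a tiny JavaScript-only page apart from a real recipe page. It is only a sketch: the function name, the 2048-byte threshold, and the assumption that a real recipe page carries schema.org Recipe metadata are all illustrative guesses, not confirmed behaviour of bettybossi.ch or of recipe-scrapers.

```python
import re

# Assumption: a real recipe page embeds schema.org metadata with "@type": "Recipe".
RECIPE_MARKER = re.compile(r'"@type"\s*:\s*\[?\s*"Recipe"', re.IGNORECASE)
SCRIPT_TAG = re.compile(r"<script\b", re.IGNORECASE)


def looks_like_js_stub(html: str, max_stub_bytes: int = 2048) -> bool:
    """Guess whether `html` is a small script-only page with no recipe data."""
    is_tiny = len(html.encode("utf-8")) <= max_stub_bytes
    has_script = bool(SCRIPT_TAG.search(html))
    has_recipe = bool(RECIPE_MARKER.search(html))
    return is_tiny and has_script and not has_recipe
```

Saving the HTML that the HTTP client receives and passing it to `looks_like_js_stub` would show whether the response matches the "tiny HTML" description above.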
I'll have to spend a bit of time to think about this. It could be good to double-check this theory, too, if anyone out there has time to help.
I would be willing to help solve the problem with Betty Bossi, though I am not a developer.
@Zwirbel1 if you have time, it would be useful to check whether any open source recipe management / import utilities are able to handle BettyBossi, to get an idea of whether the same problem has been solved elsewhere (and perhaps how).
I tested it last week with Tandoor, which was able to import a recipe from Betty Bossi in the online demo version: https://docs.tandoor.dev/. Here's the menu I imported into the demo version of Tandoor: https://app.tandoor.dev/view/recipe/53071.
It seems like bettybossi.ch uses anti-scraping techniques and, as already mentioned in this issue (https://github.com/hhursev/recipe-scrapers/issues/531), you need to reload the page twice in order to get the correct HTML.
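The "reload twice" idea could be tried with something like the sketch below. Every name in it is hypothetical: `fetch` stands for whatever function downloads a URL and returns its HTML, and `is_stub` is a caller-supplied check for the placeholder page. Whether a second request actually returns the full page for this site is unconfirmed.

```python
from typing import Callable


def fetch_until_content(
    fetch: Callable[[str], str],
    url: str,
    is_stub: Callable[[str], bool],
    max_attempts: int = 2,
) -> str:
    """Request `url` up to `max_attempts` times, stopping at the first non-stub page.

    Returns the last HTML received, which may still be a stub if every
    attempt returned one.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    html = ""
    for _ in range(max_attempts):
        html = fetch(url)
        if not is_stub(html):
            break
    return html
```

If the site relies on a cookie set by the first response, the `fetch` callable would need to keep cookies between calls, for instance by wrapping a single `httpx.Client` instance; that too is an assumption about how the site behaves.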
@SwissOS : I tried reloading the page various times, but I get the same URL and HTML, which does not allow me to import the recipe. Anything else I can change to get the correct HTML / URL?
Pre-filing checks
The URL of the recipe(s) that are not being scraped correctly
The results you expect to see
I use "Mealie" on Home Assistant, and the Betty Bossi website is not working there. In the Mealie repository they say this is a problem with recipe-scrapers.
The results (including any Python error messages) that you are seeing
I get an error message from Mealie.