mealie-recipes / mealie

Mealie is a self-hosted recipe manager and meal planner with a REST API backend and a reactive frontend application built in Vue for a pleasant user experience for the whole family. Easily add recipes to your database by providing the URL, and Mealie will automatically import the relevant data, or add a family recipe with the UI editor.
https://docs.mealie.io
GNU Affero General Public License v3.0

Recipes are not importing from cookidoo.co.uk (Thermomix) #792

Closed: aramis-source closed this issue 2 years ago

aramis-source commented 2 years ago


What is the issue you are experiencing?

Hello, when I try to import any recipe from https://cookidoo.co.uk/, none of the ingredients, nutrition information, or preparation steps (instructions) are imported into Mealie, and I get the error: "Failed to extract rdfa, raises 'str' object has no attribute 'decode'"

Deployment

Docker (Linux)

Deployment Details

No response

Mealie Version

v0.5.3

hay-kot commented 2 years ago

Please provide your logs and a corresponding URL where the problem occurs

cadamswaite commented 2 years ago

The site requires a sign-up of some sort, and it does not include the instructions in its JSON-LD (though most of the other info is available):

{
  "@context": "http://schema.org/",
  "@type": "Recipe",
  "name": "Black and White Cookies",
  "image": "https://assets.tmecosys.com/image/upload/t_web767x639/img/recipe/ras/Assets/47dab5bf-510b-4901-ba2a-9205fb152032/Derivates/ff516033-8665-4f0b-bb9e-3f2b63c0bc1b.jpg",
  "totalTime": "PT2H",
  "cookTime": "PT2H",
  "prepTime": "PT25M",
  "recipeYield": "12 pieces",
  "recipeCategory": [
    "Baking - sweet"
  ],
  "recipeIngredient": [
    "¼ organic lemon",
    "3 ½ oz sugar",
    "3 ½ oz unsalted butter",
    "2 oz homemade vanilla sugar",
    "2 large eggs",
    "2 oz whole milk",
    "7 oz pastry flour",
    "2 oz corn starch",
    "2 tsp baking powder",
    "14 oz confectioners sugar",
    "2 oz water",
    "1 oz freshly squeezed lemon juice",
    "½ tsp natural vanilla extract",
    "1 oz dutch-processed cocoa powder"
  ],
  "nutrition": {
    "@type": "NutritionInformation",
    "calories": "340 kcal",
    "carbohydrateContent": "65 g",
    "fatContent": "8 g",
    "proteinContent": "4 g"
  },
  "inLanguage": "en-GB",
  "author": {
    "@type": "Organization",
    "name": "Vorwerk International & Co. KmG",
    "address": "Wolleraustrasse 11a\n8807 Freienbach\nSwitzerland",
    "url": "https://cookidoo.co.uk"
  },
  "aggregateRating": {
    "@id": "AggregatedRating"
  }
}

This causes the following check in scrapers.py to fail, so Mealie falls back on the OpenGraph data:

    if instruct and ingredients:
        return scraped_schema

I'm not sure how common this case is, but it might be worth changing the condition to `if instruct or ingredients:`. Or maybe even time/nutrition info alone could be enough, given that ingredients and instructions can be bulk-added afterwards.
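A minimal sketch of the two conditions, applied to a trimmed copy of the cookidoo.co.uk JSON-LD above (ingredients present, no `recipeInstructions`). The function names and the plain-`dict` input are assumptions for illustration only; Mealie's real `scrapers.py` is structured differently.

```python
from typing import Any, Dict

# Trimmed copy of the cookidoo.co.uk JSON-LD above: ingredients, no recipeInstructions.
cookidoo_uk: Dict[str, Any] = {
    "@type": "Recipe",
    "name": "Black and White Cookies",
    "recipeIngredient": ["¼ organic lemon", "3 ½ oz sugar"],
    "nutrition": {"calories": "340 kcal"},
}


def strict_check(schema: Dict[str, Any]) -> bool:
    """Hypothetical mirror of the original `if instruct and ingredients:` check."""
    instruct = schema.get("recipeInstructions")
    ingredients = schema.get("recipeIngredient")
    return bool(instruct and ingredients)


def relaxed_check(schema: Dict[str, Any]) -> bool:
    """Hypothetical version of the proposed `if instruct or ingredients:` check."""
    instruct = schema.get("recipeInstructions")
    ingredients = schema.get("recipeIngredient")
    return bool(instruct or ingredients)


print(strict_check(cookidoo_uk))   # False: the scraper would fall back to OpenGraph
print(relaxed_check(cookidoo_uk))  # True: the schema data would be kept
```

With the relaxed check, a page that exposes only ingredients still yields a partial recipe, and the missing steps can be filled in by hand.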

hay-kot commented 2 years ago

> I'm not sure how common this case is, but it might be worth changing the condition to `if instruct or ingredients:`. Or maybe even time/nutrition info alone could be enough, given that ingredients and instructions can be bulk-added afterwards.

This is the route I took in v1, and it looks to be working just fine now.

aramis-source commented 2 years ago

> Please provide your logs and a corresponding URL where the problem occurs

ERROR: 02-Dec-21 18:44:18   Failed to extract rdfa, raises 'str' object has no attribute 'decode'
Traceback (most recent call last):
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/extruct/_extruct.py", line 108, in extract
    output[syntax] = list(extract(document, base_url=base_url))
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/extruct/rdfa.py", line 154, in extract_items
    jsonld_string = g.serialize(format='json-ld', auto_compact=not expanded).decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'
INFO: 02-Dec-21 18:44:18    Image https://assets.tmecosys.com/image/upload/t_web767x639/img/recipe/ras/Assets/f21f41cd91c8988d5d5e082fd182cc2c/Derivates/f1bb22833e0a1fdfc0e5f63af3ef27ef37f91636.jpg
INFO: 02-Dec-21 18:44:18    Image URL: https://assets.tmecosys.com/image/upload/t_web767x639/img/recipe/ras/Assets/f21f41cd91c8988d5d5e082fd182cc2c/Derivates/f1bb22833e0a1fdfc0e5f63af3ef27ef37f91636.jpg
INFO: 02-Dec-21 18:44:18    File Name Suffix .jpg
/app/data/recipes/caramel-pecan-macarons/images/original.jpg
INFO: 02-Dec-21 18:44:18    original.jpg Minified: 43.97 kB -> 22.78 kB -> 6.92 kB
192.168.100.15:0 - "POST /api/recipes/create-url HTTP/1.1" 201
/app/data/recipes/caramel-pecan-macarons/images/min-original.webp
192.168.100.15:0 - "GET /api/recipes/summary HTTP/1.1" 200
TrueTrue True
 True
192.168.100.15:0 - "GET /api/recipes/caramel-pecan-macarons HTTP/1.1" 200
192.168.100.15:0 - "GET /api/recipes/caramel-pecan-macarons HTTP/1.1" 200

hay-kot commented 2 years ago

This looks to be related to a dependency issue.

This should be resolved in v0.5.4 with the dependency bump. Try updating to the new version of Mealie and see if the issue persists.
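The traceback shows extruct calling `.decode('utf-8')` on the return value of rdflib's `Graph.serialize()`. Since rdflib 6.0, `serialize()` returns `str` rather than `bytes` when no destination is given, so the `.decode` call fails; the dependency bump presumably pulls in an extruct release that accounts for this. A version-agnostic way to handle either return type is sketched below; the helper name is hypothetical and not part of extruct or Mealie.

```python
from typing import Union


def serialized_to_text(serialized: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Return text whether the serializer produced bytes (older rdflib) or str (rdflib 6+)."""
    if isinstance(serialized, bytes):
        return serialized.decode(encoding)
    # Already text: calling .decode() here is what raised the AttributeError above.
    return serialized


print(serialized_to_text(b'{"@id": "x"}'))
print(serialized_to_text('{"@id": "x"}'))
```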

Goeste commented 2 years ago

Hi, I think this issue is still open, right? I tried adding recipes from the German Cookidoo site, but it doesn't work.

<script type="application/ld+json">{"@context":"http://schema.org/","@type":"Recipe","name":"Hähnchen-Patties","image":"https://assets.tmecosys.com/image/upload/t_web767x639/img/recipe/ras/Assets/E736EFDF-A5D8-49C4-BF08-001221929B92/Derivates/C94DF307-5498-4BD0-9CC4-611DEBCACF7F.jpg","totalTime":"PT30M","cookTime":"PT30M","prepTime":"PT30M","recipeYield":"6 Stück","recipeCategory":["Hauptgerichte mit Fleisch"],"recipeIngredient":["150 g Cornflakes","30 g Sesam","600 g Hähnchenbrustfilets","6 Scheiben Toastbrot","100 g Frischkäse","100 g Milch","1 TL Salz","&frac12; TL Pfeffer","&frac14; TL Paprika edelsüß","2 Eier"," Öl zum Braten"],"nutrition":{"@type":"NutritionInformation","calories":"395 kcal","carbohydrateContent":"28 g","fatContent":"17 g","proteinContent":"31 g"},"recipeInstructions":[{"@type":"HowToStep","text":"Cornflakes und Sesam in den Mixtopf geben, <nobr>10 Sek./Stufe 5</nobr> vermischen und in eine breite Schale umfüllen."},{"@type":"HowToStep","text":"Hähnchenbrustfilets, Toast, Frischkäse, Milch, Salz, Pfeffer und Paprika in den Mixtopf geben und <nobr>15 Sek./Stufe 7</nobr> zerkleinern. Fleischmischung aus dem Mixtopf nehmen, mit angefeuchteten Händen 6 flache Patties formen, jedes Patty in verquirltem Ei und in der Cornflakes-Sesam-Mischung wenden. Patties in einer Pfanne mit Öl goldbraun braten, auf Küchenkrepp abtropfen lassen und nach Wunsch mit Salat und Sauce in einem Burgerbrötchen servieren."}],"keywords":"Hauptgericht, Mittagessen, Braten, Abendessen, Frühling, Sommer, Herbst, amerikanisch, Winter, Studentenküche, Kinder in der Küche, Alltag, alkoholfrei","inLanguage":"de-DE","author":{"@type":"Organization","name":"Vorwerk International & Co. KmG","address":"Wolleraustrasse 11a\n8807 Freienbach\nSchweiz","url":"https://cookidoo.de"},"aggregateRating":{"@id":"AggregatedRating"}}</script>
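To check that the instructions really are present in the page, here is a minimal standard-library sketch that pulls every `application/ld+json` block out of the HTML and parses it. `JsonLdExtractor` is a hypothetical helper (not how Mealie or recipe_scrapers work), and the sample page is a trimmed copy of the script tag above. Script content is not entity-decoded by the HTML parser, so `&frac12;` and the `<nobr>` tags survive inside the JSON strings and would still need cleaning afterwards.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List


class JsonLdExtractor(HTMLParser):
    """Collects and parses the contents of every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.blocks: List[Dict[str, Any]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script content may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


page = (
    '<html><head><script type="application/ld+json">'
    '{"@type":"Recipe","name":"Hähnchen-Patties",'
    '"recipeIngredient":["150 g Cornflakes","&frac12; TL Pfeffer"],'
    '"recipeInstructions":[{"@type":"HowToStep",'
    '"text":"Cornflakes und Sesam in den Mixtopf geben, <nobr>10 Sek./Stufe 5</nobr> vermischen."}]}'
    "</script></head><body></body></html>"
)

jsonld_parser = JsonLdExtractor()
jsonld_parser.feed(page)
jsonld_parser.close()
recipe = jsonld_parser.blocks[0]
print(len(recipe["recipeInstructions"]))  # 1
```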

This JSON-LD is available, but Mealie does not pick it up and reports: "recipe_scrapers was unable to scrape this URL"

I installed the latest version just today.

The URL is https://cookidoo.de/recipes/recipe/de-DE/r581413 (it is in a members-only area).

Best,
goeste

Goeste commented 2 years ago

Sorry @hay-kot, but this is actually not fixed, neither in the nightlies nor in v1. The HowTo steps are not being scraped... or am I missing something?! Can the scraper be updated separately?

Goeste commented 2 years ago

> Sorry @hay-kot, but this is actually not fixed, neither in the nightlies nor in v1. The HowTo steps are not being scraped... or am I missing something?! Can the scraper be updated separately?

Oh okay, I see... they might have used the "HowToStep" attribute incorrectly...
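Whatever the exact cause, a scraper has to cope with `recipeInstructions` in several shapes (a plain string, a list of strings, `HowToStep` objects, or `HowToSection` objects nesting steps under `itemListElement`) and with inline markup such as the `<nobr>` tags in the Cookidoo steps. A minimal normalization sketch follows, assuming the JSON-LD is already parsed into Python objects; `normalize_instructions` is a hypothetical helper, not recipe_scrapers' API.

```python
import html
import re
from typing import Any, List

TAG_RE = re.compile(r"<[^>]+>")


def clean_text(text: str) -> str:
    """Strip inline tags such as <nobr> and decode entities such as &frac12;."""
    return html.unescape(TAG_RE.sub("", text)).strip()


def normalize_instructions(value: Any) -> List[str]:
    """Flatten schema.org recipeInstructions into a list of plain-text steps."""
    if value is None:
        return []
    if isinstance(value, str):
        cleaned = clean_text(value)
        return [cleaned] if cleaned else []
    if isinstance(value, dict):
        if "itemListElement" in value:  # e.g. a HowToSection wrapping steps
            return normalize_instructions(value["itemListElement"])
        return normalize_instructions(value.get("text"))  # e.g. a HowToStep
    if isinstance(value, list):
        steps: List[str] = []
        for item in value:
            steps.extend(normalize_instructions(item))
        return steps
    return []


steps = normalize_instructions([
    {"@type": "HowToStep",
     "text": "Cornflakes und Sesam in den Mixtopf geben, <nobr>10 Sek./Stufe 5</nobr> vermischen."},
])
print(steps)
# ['Cornflakes und Sesam in den Mixtopf geben, 10 Sek./Stufe 5 vermischen.']
```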

Goeste commented 1 year ago

Hey, pulling this up again.

Today I found the following in the page's HTML:

<div id="preparation-steps" class="preparation-steps">
            <core-list-section>
            <h3 id="preparation-steps-title">Zubereitung</h3>
              <ol>
                  <li id="preparation-step--0-0">700 g Wasser und Salz in den Mixtopf geben, Varoma-Behälter aufsetzen, Kartoffeln einwiegen, Varoma verschließen, <nobr>33 Min./Varoma/Stufe 1</nobr> (siehe Tipp) garen. Varoma absetzen und die Kartoffeln und etwas abkühlen lassen. In dieser Zeit Mixtopf leeren, kalt ausspülen und mit dem Rezept fortfahren. </li>
                  <li id="preparation-step--0-1">Zwiebel in den Mixtopf geben, <nobr>4 Sek./Stufe 5.5</nobr> zerkleinern und mit dem Spatel nach unten schieben. </li>
                  <li id="preparation-step--0-2">Katenschinken und Butter zugeben und <nobr>3 Min./100°C//Stufe 1</nobr> dünsten.</li>
                  <li id="preparation-step--0-3">Mehl zugeben und <nobr>1 Min./100°C//Stufe 1</nobr> dünsten.</li>
                  <li id="preparation-step--0-4">200 g Wasser, Milch, Gewürzpaste und Pfeffer zugeben und <nobr>6 Min./90°C//Stufe 3</nobr> erhitzen. In dieser Zeit die Kartoffeln pellen und in Scheiben schneiden. </li>
                  <li id="preparation-step--0-5">Kartoffeln in den Mixtopf geben, mit dem Spatel unterrühren und abschmecken. Béchamelkartoffeln in eine Servierschüssel füllen, 1 Minute durchziehen lassen, nach Wunsch mit Schnittlauch bestreuen und servieren. </li>
              </ol>
            </core-list-section>
          </div>

Could the `id="preparation-` prefix (e.g. `preparation-steps`, `preparation-step--0-0`) be used to find the starting point and the individual steps when they are not listed correctly in the JSON?
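A minimal sketch of that fallback using only the standard library's `html.parser`, assuming the markup keeps the `preparation-step-` id prefix shown above (site-specific markup that could change at any time). `PreparationStepParser` is hypothetical and not part of Mealie or recipe_scrapers; it would only be worth running when the JSON-LD has no usable `recipeInstructions`.

```python
from html.parser import HTMLParser
from typing import List, Optional


class PreparationStepParser(HTMLParser):
    """Collects the text of <li> elements whose id starts with "preparation-step-"."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True (default) decodes entities in text
        self.steps: List[str] = []
        self._current: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        step_id = dict(attrs).get("id") or ""
        # "preparation-steps" (the container div) does not match the "step-" prefix.
        if tag == "li" and step_id.startswith("preparation-step-"):
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)  # text inside nested <nobr> tags is kept

    def handle_endtag(self, tag):
        if tag == "li" and self._current is not None:
            # Collapse whitespace left over from the markup.
            self.steps.append(" ".join("".join(self._current).split()))
            self._current = None


snippet = """
<div id="preparation-steps" class="preparation-steps">
  <ol>
    <li id="preparation-step--0-1">Zwiebel in den Mixtopf geben, <nobr>4 Sek./Stufe 5.5</nobr> zerkleinern und mit dem Spatel nach unten schieben. </li>
    <li id="preparation-step--0-3">Mehl zugeben und <nobr>1 Min./100°C//Stufe 1</nobr> dünsten.</li>
  </ol>
</div>
"""

step_parser = PreparationStepParser()
step_parser.feed(snippet)
step_parser.close()
print(step_parser.steps)
```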