hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.61k stars, 506 forks

Ohsheglows scraping issue #417

Closed (2fst4u closed 2 years ago)

2fst4u commented 2 years ago

Pre-filing checks

The URL of the recipe(s) that are not being scraped correctly

The version of Python you're using

3.9.6

The operating system of your environment

Ubuntu 20.04.2 LTS (docker)

The results you expect to see

recipe scraped successfully

The results (including any Python error messages) that you are seeing

Looks Like We Couldn't Find Anything
Only websites containing ld+json or microdata can be imported by Mealie. Most major recipe websites support this data structure. If your site cannot be imported but there is json data in the log, please submit a github issue with the URL and data.

Google ld+json Info
GitHub Issues
Recipe Markup Specification

However the output data shows the following:

[
  {
    "@context": "http://schema.org/",
    "@type": "Recipe",
    "@id": "#recipe_2656",
    "mainEntityOfPage": "true",
    "name": "Energizing Broccoli Dal",
    "image": [
      "https://ohsheglows.com/gs_images/2016/04/20160211-App-Entrees-01837-1.jpg"
    ],
    "author": {
      "@type": "Person",
      "name": "Angela Liddon"
    },
    "datePublished": "2016-02-28 16:03:09",
    "description": "This unique spin on dal packs in a ton of green broccoli power! You'll feel energized by this light, and incredibly flavourful dish. This recipe is inspired by the Vegan Yum Yum Cookbook by Lauren Ulm.",
    "prepTime": "PT20M",
    "cookTime": "PT25M",
    "totalTime": "PT45M",
    "keywords": "Vegan, Gluten-Free, Grain-Free, Nut-Free, Refined Sugar-Free, Soy-Free, Budget Friendly, Freezer Friendly, Quick & Easy",
    "recipeYield": "5 cups (1250 mL)",
    "recipeCategory": [
      "Vegan",
      "Curry"
    ],
    "recipeCuisine": [
      "Indian"
    ],
    "nutrition": {
      "@type": "NutritionInformation",
      "servingSize": "1 cup (250 mL)",
      "calories": "300 calorie",
      "fatContent": "11 grams",
      "saturatedFatContent": "4 grams",
      "sodiumContent": "500 milligrams",
      "carbohydrateContent": "39 grams",
      "fiberContent": "15 grams",
      "sugarContent": "5 grams",
      "proteinContent": "13 grams"
    },
    "suitableForDiet": [
      "https://schema.org/GlutenFreeDiet",
      "https://schema.org/VeganDiet",
      "https://schema.org/VegetarianDiet"
    ],
    "recipeIngredient": [
      "2 tablespoons (30 mL) extra-virgin olive oil",
      "4 to 5 cups chopped fresh broccoli florets (from 1 large head)",
      "1 teaspoon ground cumin",
      "2 teaspoons mustard seeds",
      "1 medium sweet onion, chopped",
      "2 cups (500 mL) low-sodium vegetable broth",
      "1 (14-ounce/398 mL) can light coconut milk",
      "1 cup uncooked red lentils",
      "Fine sea salt, to taste (I used just over 1 teaspoon)",
      "1 teaspoon garam masala spice mix (I love Arvinda's brand)",
      "1 teaspoon ground turmeric",
      "1 teaspoon red pepper flakes",
      "1 1/2 to 2 tablespoons (22 to 30 mL) fresh lemon juice, to taste",
      "Paprika, to garnish",
      "Naan bread, for serving",
      "Lemon wedges, for serving"
    ],
    "recipeInstructions": [
      {
        "@type": "HowToStep",
        "text": "Add the oil into a large saucepan and increase the heat to low-medium so the oil can preheat."
      },
      {
        "@type": "HowToStep",
        "text": "Add the broccoli into a food processor and process until very finely chopped (about the size of rice grains)."
      },
      {
        "@type": "HowToStep",
        "text": "Add the cumin and mustard seeds into the saucepan with the oil. The seeds should sizzle and begin popping in the oil right away. As soon as the seeds start to pop, cover the saucepan immediately with a lid, and remove the saucepan from the heat source. Let sit until the seeds stop popping, then remove the lid and stir."
      },
      {
        "@type": "HowToStep",
        "text": "Stir in the onion, and place the saucepan back over medium heat. Cook for 3 to 4 more minutes until the onion softens."
      },
      {
        "@type": "HowToStep",
        "text": "Now, add the finely chopped broccoli, broth, entire can of coconut milk, lentils, 1/2 teaspoon salt, garam masala, turmeric, and red pepper flakes. Stir to combine."
      },
      {
        "@type": "HowToStep",
        "text": "Increase heat to high and bring to a low bowl. Reduce heat to medium, cover with lid, and simmer for about 10 minutes. Now remove the lid and stir, and continue simmering, uncovered, for another 10 minutes, until the lentils soften and the mixture thickens. Stir every now and then to make sure it doesn't stick."
      },
      {
        "@type": "HowToStep",
        "text": "Stir in the lemon juice to taste."
      },
      {
        "@type": "HowToStep",
        "text": "Ladle the dal into bowls and serve with naan bread or pita bread. Garnish with paprika and a lemon wedge, if desired."
      }
    ],
    "aggregateRating": {
      "@type": "AggregateRating",
      "ratingValue": "5.0000",
      "ratingCount": "6"
    },
    "interactionStatistic": {
      "@type": "InteractionCounter",
      "interactionType": "http://schema.org/Comment",
      "userInteractionCount": "176"
    },
    "publisher": {
      "@type": "Organization",
      "name": "Oh She Glows",
      "logo": {
        "@type": "ImageObject",
        "url": "https://ohsheglows.com/osg_logo-512x512.png"
      },
      "url": "https://ohsheglows.com"
    },
    "url": "https://ohsheglows.com/2011/01/06/energizing-spicy-broccoli-dal/"
  }
]

I'm not a whizz at all but it looks like it's picking up everything properly. Any help appreciated!
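
One way to confirm that logged data like the above really contains a usable recipe is to parse it and look for an object whose `@type` is `Recipe`. The sketch below uses only the standard library; `find_recipe` is a hypothetical helper written for illustration, not part of Mealie or recipe-scrapers, and it does not reflect how either project actually parses a page. The sample string is a shortened stand-in for the full log output.

```python
import json
from typing import Optional


def find_recipe(ld_json_text: str) -> Optional[dict]:
    """Return the first schema.org Recipe object in an ld+json payload, or None.

    Hypothetical helper for illustration only.
    """
    data = json.loads(ld_json_text)
    # The payload may be a single object or a list of objects.
    candidates = data if isinstance(data, list) else [data]
    for item in candidates:
        if not isinstance(item, dict):
            continue
        types = item.get("@type")
        # "@type" may be a single string or a list of strings.
        if types == "Recipe" or (isinstance(types, list) and "Recipe" in types):
            return item
    return None


# Shortened stand-in for the logged data above.
sample = (
    '[{"@context": "http://schema.org/", "@type": "Recipe", '
    '"name": "Energizing Broccoli Dal"}]'
)
recipe = find_recipe(sample)
print(recipe["name"] if recipe else "no Recipe object found")
# prints: Energizing Broccoli Dal
```

If this finds a `Recipe` object in the logged data, the page markup itself is probably fine and the failure is more likely on the importing side.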

micahcochran commented 2 years ago

I'm using version 13.3.5 and it seemed to parse this site ok.

Can you provide the version number for recipe-scrapers?

Put this into the Python console.

>>> import pkg_resources
>>> pkg_resources.require("recipe_scrapers")[0].version

OR on a Linux/Mac command line put this in (you might need to use pip3):

$ pip list | grep recipe-scrapers

What is the answer that comes out?

As an aside: There should be a better way to get this information from a package.
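
On Python 3.8 and later, the standard library's `importlib.metadata` is one such way to read an installed package's version without `pkg_resources`. A minimal sketch, assuming the distribution is installed under its PyPI name `recipe-scrapers`; `installed_version` is a hypothetical wrapper added here so that a missing package yields `None` instead of an exception.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if recipe-scrapers is not installed.
print(installed_version("recipe-scrapers"))
```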

2fst4u commented 2 years ago

I'm not too sure how to get this from a docker container.

micahcochran commented 2 years ago

What version of Mealie do you have installed?

It may just be that Mealie needs to pin a newer version of recipe-scrapers in its pyproject.toml (that file pins a minimum package version). Has this issue been reported to the Mealie project?

2fst4u commented 2 years ago

I haven't posted an issue to the main mealie repo, no. I thought this would be the more appropriate place.

Unfortunately I can't upgrade because of a known issue (since closed) where upgrading a container breaks it, and it can't be used again without a clean install and a restore from config.

If it works on your end then I guess it's fine. It only failed this one time, but I haven't tried other recipes from the same site, so if it crops up again I'll look into it further.

Thanks!