hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.6k stars 505 forks

AllRecipes fails to scrape ingredients #1154

Open michael-genson opened 1 week ago

michael-genson commented 1 week ago

Pre-filing checks

The URL of the recipe(s) that are not being scraped correctly

The results you expect to see

...
"recipeIngredient": [
    "1 pound ground beef",
    "1 (15 ounce) can black beans, partially drained",
    "1 cup salsa",
    "0.66666668653488 cup water",
    "1 (1 ounce) package taco seasoning",
    "4 (8 inch) flour tortillas, cut into strips",
    "1 cup shredded Cheddar cheese",
    "1 (8 ounce) carton sour cream, or to taste"
],
...

The results (including any Python error messages) that you are seeing

Nothing (empty array).


The problem seems to be the workaround from this PR: https://github.com/hhursev/recipe-scrapers/pull/964

Specifically: https://github.com/hhursev/recipe-scrapers/commit/ac101d7fad8b6cebe78b9588662761f2661f1101

Removing the manual HTML scraping fixes the issue. Based on that PR, it looks like the manual scraping was added to work around a problem with AllRecipes' JSON schema. To fix the workaround, I'm pretty sure all we need to do is change the class name mntl-structured-ingredients__list-item to mm-recipes-structured-ingredients__list-item (or make the selector more tolerant of changes like this).
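
For illustration, here is a minimal sketch of what a more change-tolerant match could look like. It is not the code from the PR or from recipe-scrapers: the names INGREDIENT_CLASS, IngredientParser, and extract_ingredients are hypothetical, it uses only the standard library rather than the library's own parsing stack, and it assumes that only the class prefix (mntl-, mm-recipes-) changes while the structured-ingredients__list-item suffix stays stable.

```python
import re
from html.parser import HTMLParser

# Assumption: the prefix (mntl-, mm-recipes-) may change again, but the
# "structured-ingredients__list-item" suffix stays the same.
INGREDIENT_CLASS = re.compile(r"(^|-)structured-ingredients__list-item$")

# Elements without a closing tag; skipped so depth tracking stays balanced.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class IngredientParser(HTMLParser):
    """Collect the text of every element whose class matches INGREDIENT_CLASS."""

    def __init__(self):
        super().__init__()
        self.ingredients = []
        self._depth = 0   # nesting depth inside a matching element (0 = outside)
        self._parts = []  # text fragments of the current matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            # Already inside a matching element: just track nesting.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if any(INGREDIENT_CLASS.search(c) for c in classes):
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace into single spaces.
            self.ingredients.append(" ".join("".join(self._parts).split()))

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_ingredients(html):
    """Return the text of each ingredient list item found in the HTML string."""
    parser = IngredientParser()
    parser.feed(html)
    return parser.ingredients


# Hypothetical markup; the real AllRecipes page may be structured differently.
SAMPLE_HTML = """
<ul>
  <li class="mm-recipes-structured-ingredients__list-item">
    <p><span>1</span> <span>pound</span> <span>ground beef</span></p>
  </li>
  <li class="mntl-structured-ingredients__list-item">
    <p><span>1</span> <span>cup</span> <span>salsa</span></p>
  </li>
  <li class="unrelated">not an ingredient</li>
</ul>
"""

if __name__ == "__main__":
    print(extract_ingredients(SAMPLE_HTML))
```

Matching on the suffix means both the old and the new class name are picked up, so another prefix rename would not empty the ingredient list. The trade-off is that a looser pattern could also match unrelated elements if the site reuses the suffix elsewhere.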

If I have time I'll open a PR, but I wanted to post my findings in case I don't get to it.

anguswg-ucsb commented 4 days ago

Thanks for your work @michael-genson! Any idea approximately when this PR will get merged in?

michael-genson commented 4 days ago

No clue, that would be up to the maintainers of recipe-scrapers.