ssnepenthe / recipe-scraper

A library for scraping recipes from popular recipe sites.
GNU General Public License v2.0

Consider extracting ingredients and instructions in groups #35

Open ssnepenthe opened 5 years ago

ssnepenthe commented 5 years ago

For recipes like this: https://www.epicurious.com/recipes/food/views/vanilla-buttermilk-sheet-cake-with-raspberries-and-orange-cream-cheese-frosting

Current behavior is to extract the group headers as standalone ingredients/instructions, i.e.

'ingredients' => [
    'For the Buttermilk Cake:',
    '2 cups cake flour, plus more for pan',
    // ...
    'For the Vanilla Syrup:',
    '1/4 cup granulated sugar',
    // ...
    'For the Orange Cream-Cheese Frosting:',
    '1 (8-ounce) package cream cheese, chilled',
    // ...
    'For the assembly:',
    '4 ounces raspberries',
    // ...
]

But maybe something like this would be preferable:

'ingredients' => [
    [
        'title' => 'For the Buttermilk Cake:',
        'values' => [
            '2 cups cake flour, plus more for pan',
            // ...
        ],
    ],
    [
        'title' => 'For the Vanilla Syrup:',
        'values' => [
            '1/4 cup granulated sugar',
            // ...
        ],
    ],
    // ...
],

And when a site doesn't use groups:

'ingredients' => [
    [
        'title' => null,
        'values' => [
            '2 cups cake flour, plus more for pan',
            // ...
        ],
    ],
],
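
For illustration, here is a rough sketch of how the flat list could be folded into that shape. `groupLines()` is a hypothetical helper, not something the library has today, and it assumes a header is any line ending in a colon. A real scraper would more likely tell headers apart using the site's markup.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: folds a flat list with inline headers into groups.
 * Assumption (illustration only): a header is any line that ends with ':'.
 */
function groupLines(array $lines): array
{
    $groups = [];
    $current = null;

    foreach ($lines as $line) {
        $line = trim($line);

        if ($line === '') {
            continue; // Skip empty lines.
        }

        if (substr($line, -1) === ':') {
            // A header closes the group in progress and starts a new one.
            if ($current !== null) {
                $groups[] = $current;
            }
            $current = ['title' => $line, 'values' => []];
            continue;
        }

        // Lines seen before any header go into an untitled group.
        if ($current === null) {
            $current = ['title' => null, 'values' => []];
        }
        $current['values'][] = $line;
    }

    if ($current !== null) {
        $groups[] = $current;
    }

    return $groups;
}
```

With this sketch, a list that contains no headers comes back as a single group whose 'title' is null, which matches the no-groups case above.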

Mark-Howe commented 3 years ago

Any further thoughts? I'd be happy to push forward with your new format. Since it's a breaking change, optional 'groupedIngredients' / 'groupedInstructions' keys could be added instead.

Since the group headers aren't ingredients, I don't think the headings should be in the flat ingredients list as they are now, which leads back to just changing the current format as you've stated.
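
To make the two options concrete, here is a minimal sketch. `flattenGroups()` and its `$includeTitles` flag are hypothetical names, not part of the library. The idea is that the flat list is derived from the groups, so it can either keep the inline headings (no breaking change) or drop them, while the grouped data sits under the optional key.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: derives the flat list from grouped data.
 * With $includeTitles = true the output keeps the inline headings, as the
 * flat list does today; with false the headings are left out.
 */
function flattenGroups(array $groups, bool $includeTitles = true): array
{
    $flat = [];

    foreach ($groups as $group) {
        if ($includeTitles && $group['title'] !== null) {
            $flat[] = $group['title'];
        }
        foreach ($group['values'] as $value) {
            $flat[] = $value;
        }
    }

    return $flat;
}

// Sample data taken from the recipe discussed in this issue.
$groups = [
    [
        'title' => 'For the Buttermilk Cake:',
        'values' => ['2 cups cake flour, plus more for pan'],
    ],
    [
        'title' => 'For the Vanilla Syrup:',
        'values' => ['1/4 cup granulated sugar'],
    ],
];

$recipe = [
    // Existing key keeps its flat shape, so current consumers keep working.
    'ingredients' => flattenGroups($groups),
    // Optional new key (assumed name from this thread) carries the groups.
    'groupedIngredients' => $groups,
];
```

Passing `false` as the second argument gives the header-free flat list, which would be the breaking variant.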

If a recipe has both ingredients and instructions grouped (not that I've seen it anywhere), it'd be nice to share a value between the two so they're linked. We'd expect them to have the same title, but that may not be the case at the source.
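
One way the link could look, as a sketch only: each ingredient group gets an 'id', and an instruction group receives the same 'id' when its title matches after light normalization. The 'id' key, `normalizeTitle()` and `linkGroups()` are all hypothetical. Matching by title is just a fallback assumption, since titles may differ at the source, and unmatched groups simply get a null id.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: lowercases a title and strips surrounding
 * whitespace and trailing colons so near-identical titles compare equal.
 */
function normalizeTitle(?string $title): ?string
{
    if ($title === null) {
        return null;
    }

    return rtrim(strtolower(trim($title)), ': ');
}

/**
 * Hypothetical helper: adds an 'id' key to every group. Instruction groups
 * reuse the id of the ingredient group with the same normalized title,
 * or get null when there is no match.
 */
function linkGroups(array $ingredientGroups, array $instructionGroups): array
{
    $idsByTitle = [];
    $counter = 0;

    foreach ($ingredientGroups as $key => $group) {
        $counter++;
        $id = 'group-' . $counter;
        $ingredientGroups[$key]['id'] = $id;

        $normalized = normalizeTitle($group['title']);
        if ($normalized !== null) {
            $idsByTitle[$normalized] = $id;
        }
    }

    foreach ($instructionGroups as $key => $group) {
        $normalized = normalizeTitle($group['title']);
        $matched = $normalized !== null && isset($idsByTitle[$normalized]);
        $instructionGroups[$key]['id'] = $matched ? $idsByTitle[$normalized] : null;
    }

    return [$ingredientGroups, $instructionGroups];
}
```

If a site exposes an explicit relationship in its markup, that would be a better source for the shared value than the title comparison used here.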

BBCGoodFood (#47) splits most ingredients into groups. I'm not a fan of the current inline headings.