vjaykoogu closed this issue 5 years ago.
@vjaykoogu thanks for the suggestion!
I am open to being convinced otherwise, but I think this goes beyond the scope of a simple scraper.
The problem is that few, if any, of these sites provide structured markup for the individual parts of an ingredient line (quantity, unit, name, and so on).
If we can't reason about the structure of an ingredient line from the markup provided, we end up having to write a complete ingredient parser.
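To make this concrete, here is a minimal sketch, assuming a site that embeds schema.org `Recipe` data as JSON-LD (the markup below is made up for illustration). In schema.org, `recipeIngredient` is plain text, so even a well-marked-up page hands the scraper whole lines. Any split into quantity, unit, and name would have to come from parsing those strings. The helper `ingredient_lines` is hypothetical.

```python
import json

# Made-up JSON-LD, as a site might embed it in a <script type="application/ld+json"> tag.
JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "Recipe",
  "name": "Garlic Shrimp",
  "recipeIngredient": [
    "1 lb. shrimp, peeled and deveined",
    "3 cloves garlic, minced",
    "salt to taste"
  ]
}
"""


def ingredient_lines(json_ld: str) -> list[str]:
    """Return the raw ingredient strings from a schema.org Recipe JSON-LD blob (hypothetical helper)."""
    data = json.loads(json_ld)
    # recipeIngredient holds free-text strings; there are no
    # quantity/unit/name sub-fields to read from the markup.
    return list(data.get("recipeIngredient", []))


for line in ingredient_lines(JSON_LD):
    print(repr(line))
```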
Unfortunately, there is no real standard for writing ingredient lines; this SO answer does a pretty good job of covering some of the complexity you might find.
With that in mind, I think this would be better suited for an entirely separate library.
In fact, the New York Times has put out a pretty solid-looking Python tool for just this purpose:
https://github.com/NYTimes/ingredient-phrase-tagger
And a corresponding blog post with a little background information:
How about breaking the ingredients section into separate fields? For example:
```
array(4) {
  ["quantity"]=>
  string(1) "1"
  ["unit"]=>
  string(3) "lb."
  ["info"]=>
  string(19) "peeled and deveined"
  ["name"]=>
  string(6) "shrimp"
}
```
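As a rough illustration of what this proposal would involve, here is a minimal rule-based sketch in Python. `parse_ingredient`, `UNITS`, and `QUANTITY_RE` are hypothetical (not part of this library or the NYT tagger), and the code assumes lines shaped like `<quantity> <unit> <name>, <info>`.

```python
import re
from typing import Optional

# Hypothetical, deliberately small unit list; real recipes use many more spellings.
UNITS = {"lb", "lbs", "oz", "cup", "cups", "tbsp", "tsp", "clove", "cloves", "g", "kg"}

# Leading quantity: mixed number ("1 1/2"), fraction ("1/2"), or integer/decimal.
QUANTITY_RE = re.compile(r"^\s*(\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s*")


def parse_ingredient(line: str) -> dict[str, Optional[str]]:
    """Naively split an ingredient line into quantity, unit, info, and name.

    Assumes "<quantity> <unit> <name>, <info>"; other shapes come back
    with fields missing or misassigned.
    """
    result: dict[str, Optional[str]] = {"quantity": None, "unit": None, "info": None, "name": None}
    rest = line.strip()

    # Peel off a leading number, if any.
    match = QUANTITY_RE.match(rest)
    if match:
        result["quantity"] = match.group(1)
        rest = rest[match.end():]

    # Treat the next word as a unit only if it is in the known list (ignoring a trailing ".").
    parts = rest.split(maxsplit=1)
    if parts and parts[0].rstrip(".").lower() in UNITS:
        result["unit"] = parts[0]
        rest = parts[1] if len(parts) > 1 else ""

    # Everything after the first comma is treated as preparation info.
    name, _, info = rest.partition(",")
    result["name"] = name.strip() or None
    result["info"] = info.strip() or None
    return result


print(parse_ingredient("1 lb. shrimp, peeled and deveined"))
# {'quantity': '1', 'unit': 'lb.', 'info': 'peeled and deveined', 'name': 'shrimp'}
```

The example line above parses as hoped, but a line such as `"1 (14 oz) can diced tomatoes"` already comes back with no unit and the name `"(14 oz) can diced tomatoes"`. Each new phrasing needs another hand-written rule, which is the complexity the reply describes and the reason a trained tagger like the NYT project is a better fit than rules inside a scraper.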