onetsp / RecipeParser

A PHP library for parsing structured recipe data from HTML files.
https://onetsp.com/
MIT License

Parsing issues with thekitchn.com #13

Closed by chiplay 7 years ago

chiplay commented 9 years ago

The parser doesn't return the ingredient list for thekitchn.com recipes. We'll work on a parser for this soon.

See: http://www.thekitchn.com/how-to-make-steel-cut-oatmeal-in-jars-one-week-of-breakfast-in-5-minutes-cooking-lessons-from-the-kitchn-143623

onetsp commented 9 years ago

The markup for thekitchn is very temperamental. It's probably not one of the parsers that would be straightforward to fix, either. I'm happy to have you look at it. You'll need to add a few extra test cases (i.e. HTML pages to parse against) as the test cases I have at the moment all seem to be passing with current markup.

Thekitchn is amongst the top sites that fail to clip accurately, and I think this is because their markup is so non-standard, and it may vary from page to page as well. I'm anxious to improve this, and might invest time in it myself if you don't. But just know up front that this probably isn't one of the easier fixes out there.

onetsp commented 7 years ago

TheKitchn parser was updated this week.