onetsp / RecipeParser

A PHP library for parsing structured recipe data from HTML files.
https://onetsp.com/
MIT License

Use with Composer / PSR-4 #10

Closed: chiplay closed 9 years ago

chiplay commented 9 years ago

Hey Mike!

First off - thanks so much for open sourcing this awesome library! I'm a dev at relayfoods.com and we are attempting to use it for parsing recipes via a bookmarklet (or pasting the url) to convert a recipe into a shopping list from our product catalog. You can see the service in action here: http://recipe-parser.herokuapp.com/?url=http://allrecipes.com/Recipe/Juicy-Roasted-Chicken/Detail.aspx?evt19=1&referringHubId=662

To get the library running as a service, we took your library and forked it in an attempt to add composer support and PSR-4 formatting for easier consumption by Laravel. The one big downside is that keeping everything in sync is going to be prohibitively difficult.

I wanted to reach out and see whether a) you have any thoughts on a better way to use your library as-is as a microservice that converts recipe URLs into JSON output, or b) you have any desire to add Composer / PSR-4 support to your library?

Thanks again!

onetsp commented 9 years ago

Hi Chip,

Thanks for reaching out about this. Really great to see someone making use of this.

I'm not totally clear on what it would take to pull in your changes. Would these make the library not work with standard (non-Laravel) PHP? PSR-4 stuff looks mostly like a new autoloader and namespaces. Any reasons this wouldn't be a relatively easy drop-in for an existing PHP 5.3 app?
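For context on the "new autoloader and namespaces" point, here is a minimal sketch of what a PSR-4 style autoloader typically looks like in plain PHP 5.3, with no framework involved. The `RecipeParser\` namespace prefix, the `src/` directory, and the helper name `psr4_path` are assumptions for illustration only; they are not taken from this repository or from the fork.

```php
<?php
// Map a fully qualified class name to a file path, PSR-4 style.
// Returns null when the class is outside the given namespace prefix.
// NOTE: psr4_path, the prefix, and the base directory are hypothetical.
function psr4_path($class, $prefix, $base_dir)
{
    $len = strlen($prefix);
    if (strncmp($prefix, $class, $len) !== 0) {
        return null; // not our namespace; let other autoloaders handle it
    }
    // Strip the prefix and turn namespace separators into directory separators.
    $relative = substr($class, $len);
    return rtrim($base_dir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

// Register the autoloader (closures are available from PHP 5.3).
spl_autoload_register(function ($class) {
    $file = psr4_path($class, 'RecipeParser\\', __DIR__ . '/src');
    if ($file !== null && is_file($file)) {
        require $file;
    }
});
```

Under these assumptions, a class named `RecipeParser\Parser\Allrecipes` would be looked up at `src/Parser/Allrecipes.php`. Nothing here depends on Laravel or Composer, although Composer can generate an equivalent autoloader.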

To be honest, I may not have much free time to look through this in the next couple of weeks. But I certainly don't want to get to a point where we're both supporting parser updates in forked projects. Seems like a waste of effort.

To answer your first question about running as a microservice, I imagine it'd be pretty easy to write a front script that mimics the functionality in the parse_recipe script, but with the simple addition of encoding the output as json. E.g.:

./parse_recipe http://www.realsimple.com/food-recipes/browse-all-recipes/cottage-pies-recipe

(Which you can also see in the readme.)

Given the work you've already done on this, I can understand that you might not want to go this route due to the investment already made. But that could be pretty simple to set up and support. (Admittedly, I have little context on your environment!)

Mike

chiplay commented 9 years ago

Hey Mike!

Thanks for getting back to me. I went with your advice and switched to a really simple script to load the parser and return json:

header('Content-Type: application/json');
header('Access-Control-Allow-Origin: *');
header('Cache-Control: max-age=86400'); // 1 day

include dirname(dirname(__FILE__)) . '/bootstrap.php';

$recipe_url = $_GET["url"];

// Fetch and cleanup the HTML
$html = FileUtil::downloadPage($recipe_url);
$html = RecipeParser_Text::forceUTF8($html);
$html = RecipeParser_Text::cleanupClippedRecipeHtml($html);

// Parse recipe into a struct
try {
    $recipe = RecipeParser::parse($html, $recipe_url);
} catch (NoMatchingParserException $e) {
    // "\n" must be double-quoted; in single quotes PHP prints a literal backslash-n.
    echo 'Error: No matching parser (' . $e->getMessage() . ")\n";
    exit(1);
}

echo json_encode($recipe);

Heroku has issues with the tempnam functions, so I'm not saving / caching the fetched files for now, but I'm hoping I can get that working again soon to keep from hitting the recipe sites too often. Hopefully this will allow us to easily contribute back to your lib and not have to worry about keeping a fork in sync!
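One possible way to get caching back without `tempnam` is to derive the cache filename from a hash of the URL. This is only a rough sketch: `cache_path_for` and `fetch_cached` are hypothetical helpers, not part of RecipeParser, and the sketch assumes the directory returned by `sys_get_temp_dir()` is writable. On Heroku the filesystem is ephemeral, so such a cache would only last as long as the dyno does.

```php
<?php
// Build a deterministic cache file path for a URL (hypothetical helper).
function cache_path_for($url, $cache_dir)
{
    return rtrim($cache_dir, '/') . '/recipe_' . sha1($url) . '.html';
}

// Return cached HTML if it is younger than $ttl seconds; otherwise call
// $download($url), store the result, and return it (hypothetical helper).
function fetch_cached($url, $cache_dir, $ttl, $download)
{
    $path = cache_path_for($url, $cache_dir);
    clearstatcache();
    if (is_file($path) && (time() - filemtime($path)) < $ttl) {
        return file_get_contents($path);
    }
    $html = call_user_func($download, $url);
    // A failed write is ignored on purpose: the page is still returned,
    // it just will not be cached for the next request.
    @file_put_contents($path, $html, LOCK_EX);
    return $html;
}
```

In the script above, the download line could then become something like `$html = fetch_cached($recipe_url, sys_get_temp_dir(), 86400, array('FileUtil', 'downloadPage'));`, which matches the one-day `Cache-Control` lifetime already being sent.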

Thanks again for sharing this awesome project - hopefully our team can help contribute and make your life a bit easier keeping it up to date.

Best, Chip

mikebrittain commented 9 years ago

Hey, Chip. I want to follow up on this. I know you put a decent amount of effort into cleaning up the RecipeParser code to make it easier to use with other frameworks, and I don't want to see that go to waste. I would certainly appreciate any ongoing efforts to keep the parsers up to date, rather than having them forked off into separate repos.

Can you shoot me an email at mike@onetsp.com? Want to talk through some details...

Mike