nytimes / ingredient-phrase-tagger

Extract structured data from ingredient phrases using conditional random fields
http://open.blogs.nytimes.com/2016/04/27/structured-ingredients-data-tagging/

reusable/installable package #4

wrboyce closed 8 years ago

wrboyce commented 8 years ago

This PR aims to allow this library to be installed via pip/easy_install and exposes the functionality of parse-ingredients.py and convert-to-json.py as part of said library.
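
For readers unfamiliar with the layout being proposed, a pip-installable package of this kind usually hinges on a setup.py at the repo root. The sketch below is illustrative only and is not the contents of this PR: the package name `ingredient_phrase_tagger`, the `cli` module, and its two functions are assumed names. Only the command names parse-ingredients and convert-to-json come from the description above.

```python
# setup.py (illustrative sketch; module and function names are assumptions)

SETUP_KWARGS = {
    "name": "ingredient-phrase-tagger",
    "version": "0.0.0",  # placeholder version, not a real release number
    "packages": ["ingredient_phrase_tagger"],  # assumed package directory
    "entry_points": {
        # Each entry maps a command name to a hypothetical "module:function".
        "console_scripts": [
            "parse-ingredients = ingredient_phrase_tagger.cli:parse_ingredients",
            "convert-to-json = ingredient_phrase_tagger.cli:convert_to_json",
        ],
    },
}

if __name__ == "__main__":
    # Imported here so the metadata above can be inspected without setuptools.
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

With a file like this in place, `pip install .` would install the package and put both commands on the PATH, while the same functions stay importable from other Python code.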

adammck commented 8 years ago

Thanks for the patch! This layout is definitely more conventional. But I don't understand why anyone would want to pip install this thing. Can you give me some context? What's your use-case?

wrboyce commented 8 years ago

I decided to download and index BBC Food after the recent noises that we might lose it (which now seem to be unsubstantiated). This tagger seemed the obvious choice to enhance the data I'd scraped. The recipes are very much UK-centric, so some new training was required, but I still ended up using a lot of the functionality in this repo, most notably the import/export functions I refactored out of the scripts, and parseNumbers/cleanUnicodeFractions.
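
To give a sense of the kind of helper mentioned above, here is a minimal sketch of fraction cleaning and quantity parsing. It is not the repo's implementation of cleanUnicodeFractions or parseNumbers: the function names `clean_unicode_fractions` and `parse_mixed_number`, and the small lookup table, are assumptions made for illustration.

```python
from fractions import Fraction

# A few common single-character fractions; a real table would likely be longer.
FRACTIONS = {
    "\u00bc": "1/4",
    "\u00bd": "1/2",
    "\u00be": "3/4",
    "\u2153": "1/3",
    "\u2154": "2/3",
    "\u215b": "1/8",
}


def clean_unicode_fractions(phrase):
    """Replace glyphs such as '½' with an ASCII 'n/d' form."""
    for glyph, ascii_form in FRACTIONS.items():
        # Pad with a space so "1½" becomes "1 1/2" rather than "11/2".
        phrase = phrase.replace(glyph, " " + ascii_form)
    # Collapse the extra whitespace the padding may have introduced.
    return " ".join(phrase.split())


def parse_mixed_number(text):
    """Turn '1 1/2' or '3/4' into an exact Fraction."""
    return sum(Fraction(part) for part in text.split())


cleaned = clean_unicode_fractions("1\u00bd cups flour")
quantity = parse_mixed_number("1 1/2")
```

Here `cleaned` is `"1 1/2 cups flour"` and `quantity` is `Fraction(3, 2)`. Normalising to plain ASCII first keeps the later tokenising and tagging steps free of special cases for each glyph.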

wrboyce commented 8 years ago

Oh, and this also allowed me to easily run the code through 2to3 via setup.py, since the project I was using it from is on Python 3.

adammck commented 8 years ago

Sorry about the delay getting back to you on this. Works for me, and looks much better. Thank you!