nytimes / ingredient-phrase-tagger

Extract structured data from ingredient phrases using conditional random fields
http://open.blogs.nytimes.com/2016/04/27/structured-ingredients-data-tagging/

refactored singularize() to use SnowballStemmer; fixed the smartJoin() #11

Open ramji-c opened 7 years ago

ramji-c commented 7 years ago

Summary of changes:

  1. refactored the singularize() function in utils.py to use SnowballStemmer to convert plural words to their singular form
  2. fixed the smartJoin() function to use a regex for whitespace removal
  3. fixed a minor bug in convert_to_json.py, which was not compatible with the output of the roundtrip.sh script. import_data() now checks whether a confidence score is available during the conversion process
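
For reviewers, here is a rough sketch of the three ideas above. It is not the code in this PR: the names `singularize`, `smart_join`, and `split_tag` are hypothetical stand-ins, the `TAG/score` column layout is an assumption about what the tagger output looks like when a confidence score is present, and `singularize` assumes NLTK is installed.

```python
import re


def singularize(word):
    """Reduce a word to its stem (hypothetical sketch, needs NLTK)."""
    # Imported lazily so the rest of this sketch runs without NLTK.
    from nltk.stem.snowball import SnowballStemmer

    # A stemmer returns stems, which are not always dictionary singulars.
    return SnowballStemmer("english").stem(word)


def smart_join(words):
    """Join tokens, then drop the spaces a plain join leaves around punctuation."""
    text = " ".join(words)
    text = re.sub(r"\s+([,)])", r"\1", text)  # no space before "," or ")"
    text = re.sub(r"\(\s+", "(", text)        # no space after "("
    return text


def split_tag(column):
    """Return (tag, confidence); confidence is None when no score is attached.

    Assumes a scored column looks like "TAG/score" (an assumption, see above).
    """
    tag, sep, score = column.partition("/")
    return tag, (float(score) if sep else None)
```

With these definitions, `smart_join(["1", "cup", "flour", ",", "sifted"])` gives `"1 cup flour, sifted"`, and `split_tag` accepts both scored and unscored tag columns, which is the kind of check item 3 describes.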