nytimes / ingredient-phrase-tagger

Extract structured data from ingredient phrases using conditional random fields
http://open.blogs.nytimes.com/2016/04/27/structured-ingredients-data-tagging/

Alternative Units #10

Closed: jasonvarga closed this issue 7 years ago

jasonvarga commented 7 years ago

Hi, you guys have created an awesome tool here.

I was wondering if it's possible to recognize (or 'teach' it to recognize) alternative units.

For example:

It doesn't have any idea that oz is the same as ounce.

Thanks!

jasonvarga commented 7 years ago

I think you can disregard this. When I build the model from the full dataset instead of only a sample, it handles this better.

The output changes from qty: 2, name: oz milk to qty: 2, other: oz, name: milk.
It now separates oz from milk, but tags it as other instead of unit. Still, that's better.

I think it just needs more data to work with.
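
Until the model has seen enough data, a possible stopgap is to post-process the tagger output and promote a token tagged as other to unit when it matches a known abbreviation. The sketch below is only an illustration, not part of ingredient-phrase-tagger: it assumes the parsed phrase is available as a plain dict with the keys shown above, and UNIT_ALIASES and promote_unit are hypothetical names.

```python
# Hypothetical alias table (an assumption for illustration);
# extend it with whatever abbreviations show up in your data.
UNIT_ALIASES = {
    "oz": "ounce",
    "lb": "pound",
    "tsp": "teaspoon",
    "tbsp": "tablespoon",
    "g": "gram",
}


def promote_unit(parsed):
    """Return a copy of `parsed` where a known abbreviation found under
    'other' is moved to 'unit' and expanded to its full name."""
    result = dict(parsed)
    # Normalize: trim whitespace, drop a trailing period, lowercase.
    key = result.get("other", "").strip().rstrip(".").lower()
    # Only act when no unit was tagged and the token is a known alias.
    if "unit" not in result and key in UNIT_ALIASES:
        result["unit"] = UNIT_ALIASES[key]
        del result["other"]
    return result


print(promote_unit({"qty": "2", "other": "oz", "name": "milk"}))
```

This only helps once oz has already been split from milk, as the full-dataset model does. It does nothing for the sample-model output, where oz is still fused into the name.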