nytimes / ingredient-phrase-tagger

Extract structured data from ingredient phrases using conditional random fields
http://open.blogs.nytimes.com/2016/04/27/structured-ingredients-data-tagging/

Possibly Improved Sequence Tagger #9

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi, this is a pretty awesome project, thanks for posting it! I've been experimenting with structured prediction methods, and decided to use this project to compare CRFs with Learning to Search (L2S) methods. Pending a more rigorous evaluation (fingers crossed), I'm seeing roughly 96% per-token accuracy and 95.55% sentence-level accuracy with L2S in vw, taking 22 minutes total for reading, training, and testing. This is an 80/20 split on the full dataset, using the output of bin/generate. Once I've cleaned up the source, I'll be happy to send over a pull request.

- Arthur
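For readers comparing numbers, here is a minimal sketch of how the two metrics quoted above are usually computed. It is not the evaluation script used for the figures in this thread; the function name `tagging_accuracy` and the tag names in the example are illustrative assumptions.

```python
from typing import List, Tuple


def tagging_accuracy(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float]:
    """Return (per-token accuracy, sentence-level accuracy).

    Hypothetical helper: each inner list holds the tags for one phrase.
    A phrase counts as correct only if every one of its tags matches.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of phrases")
    if not gold:
        raise ValueError("need at least one phrase to score")

    token_total = 0
    token_correct = 0
    sentence_correct = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each phrase must have one predicted tag per token")
        matches = sum(g == p for g, p in zip(gold_tags, pred_tags))
        token_total += len(gold_tags)
        token_correct += matches
        sentence_correct += int(matches == len(gold_tags))

    if token_total == 0:
        raise ValueError("need at least one token to score")
    return token_correct / token_total, sentence_correct / len(gold)


# Illustrative tags only; the second phrase has one wrong tag.
gold = [["QTY", "UNIT", "NAME"], ["QTY", "NAME"]]
pred = [["QTY", "UNIT", "NAME"], ["QTY", "COMMENT"]]
token_acc, sentence_acc = tagging_accuracy(gold, pred)
```

Sentence-level accuracy is the stricter of the two, since a single wrong tag makes the whole phrase count as wrong.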

tettoffensive commented 6 years ago

@Zintinio did you ever clean up the source? I'm curious to see your improvements. I'm seeing a few cases where I get "name": "Salt and pepper" instead of two separate ingredients. Wondering if your improvements would help with this sort of problem?

Also, there are ingredients like "Basil" and "Basil leaves". I think it would be better if they were both recognized as the same ingredient. But that might be much more challenging ;)
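Both problems could also be approached as a post-processing step on the tagger's "name" output, independent of which sequence model is used. The following is only a rough heuristic sketch under that assumption; `split_compound_name`, `normalize_name`, and the suffix list are hypothetical and not part of this project.

```python
import re
from typing import List

# Hypothetical list of trailing words to drop when comparing names.
_DROPPABLE_SUFFIXES = ("leaves", "leaf")


def split_compound_name(name: str) -> List[str]:
    """Split a tagged name on "and" or commas, e.g. "Salt and pepper"."""
    parts = re.split(r"\s+and\s+|\s*,\s*", name.strip(), flags=re.IGNORECASE)
    return [part for part in parts if part]


def normalize_name(name: str) -> str:
    """Lowercase a name and drop trailing words from the suffix list."""
    words = name.lower().split()
    while len(words) > 1 and words[-1] in _DROPPABLE_SUFFIXES:
        words.pop()
    return " ".join(words)


print(split_compound_name("Salt and pepper"))
print(normalize_name("Basil leaves") == normalize_name("Basil"))
```

Rules like these misfire easily: "macaroni and cheese" would be split in two, and "bay leaves" would collapse to "bay". A lookup table of known ingredients would probably be needed on top.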

ghost commented 6 years ago

Yeah, I'll dig it up. It's actually just using Vowpal Wabbit + DAgger, with the same features.
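For anyone wanting to try something similar before the source is posted, a minimal sketch of preparing data for Vowpal Wabbit's sequence search task follows. It assumes VW's documented sequence input format (one `label | features` line per token, positive integer labels, a blank line between sequences). The helper `to_vw_sequences`, the label mapping, and the single word-identity feature are illustrative assumptions, not the feature set referred to above.

```python
from typing import Dict, List, Tuple


def to_vw_sequences(
    sentences: List[List[Tuple[str, str]]], label_ids: Dict[str, int]
) -> str:
    """Render (token, tag) phrases as VW sequence-task text.

    Hypothetical helper: label_ids maps each tag to a positive integer.
    """
    blocks = []
    for sentence in sentences:
        lines = []
        for token, tag in sentence:
            # ":" , "|" and spaces are special in VW's input format.
            safe = token.replace(":", "_").replace("|", "_").replace(" ", "_")
            lines.append(f"{label_ids[tag]} | w_{safe}")
        blocks.append("\n".join(lines))
    # A blank line separates one sequence from the next.
    return "\n\n".join(blocks) + "\n"


example = to_vw_sequences(
    [[("1", "QTY"), ("cup", "UNIT")], [("salt", "NAME")]],
    {"QTY": 1, "UNIT": 2, "NAME": 3},
)
print(example)
```

The resulting file would then be passed to `vw` with its sequence search task enabled; the exact training options used for the numbers in this thread are not stated here.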
