torralba-lab / im2recipe

Code supporting the CVPR 2017 paper "Learning Cross-modal Embeddings for Cooking Recipes and Food Images"
MIT License

How did you do ingredient detection? #24

Closed: tomoki-oke closed this issue 5 years ago

tomoki-oke commented 5 years ago

Hello. In the paper that proposed this model, ingredients were detected with an LSTM model. How did you train that LSTM, or which model did you choose?

nhynes commented 5 years ago

A few of the data sources included (somewhat rough) ingredient annotations. These weren't released as part of the full dataset because they were too inconsistently present.

The model is dead simple: roll up the words using an LSTM and classify the ingredient as one of the top N (where N = O(100), IIRC).
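For readers trying to picture this, here is a minimal sketch of one possible reading of that description, not the code used for the paper. It assumes PyTorch, and every name and size in it (`IngredientClassifier`, the toy `vocab`, the embedding and hidden dimensions, N = 100, the label indices) is hypothetical. Each ingredient line is embedded word by word, an LSTM consumes the words in order, and its final hidden state is classified into one of N ingredient classes.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_sequence


class IngredientClassifier(nn.Module):
    """Hypothetical sketch: embed words, run an LSTM, classify the final state."""

    def __init__(self, vocab_size, num_ingredients, embed_dim=64, hidden_dim=128):
        super().__init__()
        # Index 0 is reserved for padding so padded positions embed to zeros.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim, num_ingredients)

    def forward(self, word_ids, lengths):
        embedded = self.embed(word_ids)  # (batch, time, embed_dim)
        # Packing makes the LSTM stop at each line's true length.
        packed = pack_padded_sequence(
            embedded, lengths, batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)
        # h_n[-1] is the last layer's final hidden state: the "rolled up" line.
        return self.classify(h_n[-1])  # (batch, num_ingredients)


# Toy vocabulary (assumption); a real one would be built from the training text.
vocab = {"<pad>": 0, "<unk>": 1, "2": 2, "cups": 3, "flour": 4,
         "tbsp": 5, "olive": 6, "oil": 7}


def encode(line):
    """Map a whitespace-tokenized ingredient line to word indices."""
    return [vocab.get(word, vocab["<unk>"]) for word in line.lower().split()]


lines = ["2 cups flour", "2 tbsp olive oil"]
encoded = [torch.tensor(encode(line)) for line in lines]
word_ids = pad_sequence(encoded, batch_first=True, padding_value=0)
lengths = torch.tensor([len(ids) for ids in encoded])

model = IngredientClassifier(vocab_size=len(vocab), num_ingredients=100)
logits = model(word_ids, lengths)  # one score per ingredient class

# Hypothetical class indices standing in for "flour" and "olive oil".
labels = torch.tensor([4, 17])
loss = nn.functional.cross_entropy(logits, labels)
```

Training would then be the usual loop of `loss.backward()` and an optimizer step over annotated lines. A bidirectional variant would pass `bidirectional=True` to `nn.LSTM` and concatenate the two final states before the linear layer.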

If you want to train your own, you might have good luck with a recipe site that follows the Recipe schema or applies some special markup to their ingredients.
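As a rough illustration of the Recipe-schema route, the sketch below pulls ingredient lines out of a page that embeds schema.org `Recipe` data as JSON-LD, using only the Python standard library. The `recipeIngredient` property comes from schema.org; the helper names (`JsonLdCollector`, `extract_ingredients`) and the sample page are made up for this example, and real sites may instead nest recipes under `@graph` or use microdata, which this sketch does not handle.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def extract_ingredients(html):
    """Return the recipeIngredient strings of all top-level Recipe objects."""
    collector = JsonLdCollector()
    collector.feed(html)
    ingredients = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Recipe" in types:
                ingredients.extend(item.get("recipeIngredient", []))
    return ingredients


# Made-up sample page for illustration only.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Pancakes",
 "recipeIngredient": ["2 cups flour", "1 egg", "1 cup milk"]}
</script>
</head></html>
"""

found = extract_ingredients(sample_html)
```

This only yields raw ingredient lines such as "2 cups flour". Mapping each line to a canonical ingredient label for training would still be a separate step.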

devanshbatra04 commented 5 years ago

Hi, I am trying to reproduce this approach for a similar dataset.

I am really sorry but I don't understand what's meant by rolling up the words using the LSTM. What was the input like? I am guessing there was an embedding layer too.

The paper mentions a bidirectional LSTM that classifies the ingredient text word by word, but that wasn't too clear to me. :/

Would really appreciate any help at all. Thanks!