openculinary / knowledge-graph

The RecipeRadar knowledge graph stores and provides access to recipe and ingredient relationship information.
GNU Affero General Public License v3.0

Incorporate ingredient nutritional information from ingreedy-data #48

jayaddison closed 4 years ago

jayaddison commented 4 years ago

Describe the reason for these changes and the problem that they solve

This changeset adds a reference to the ingreedy-data dataset, which contains consolidated nutritional information from multiple sources including the UK CoFID (aka 'McCance') and the US FoodData Central databases.

In order to perform matching between the RecipeRadar dataset and ingreedy-data, we use the normalized_name field from the latter's consolidated JSON format and build a search index using the names as documents. For each named ingredient in RecipeRadar, a query is performed on the search index and the best-matching result is selected.
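
As a minimal sketch of the matching step described above, the following builds a small inverted index over `normalized_name` values and picks the best-scoring entry for a query. The token-overlap (Jaccard) score, the helper names `tokenize`, `build_index` and `best_match`, and the `id` field in the example records are assumptions made for illustration; the search library and scoring actually used by this changeset may differ.

```python
from collections import defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real index would likely stem and normalize.
    return text.lower().split()


def build_index(records):
    """Map each token to the positions of records whose normalized_name contains it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for token in tokenize(record["normalized_name"]):
            index[token].add(position)
    return index


def best_match(index, records, query):
    """Return the record with the highest token overlap, or None if nothing matches."""
    query_tokens = set(tokenize(query))
    candidates = set()
    for token in query_tokens:
        candidates |= index.get(token, set())

    best, best_score = None, 0.0
    for position in sorted(candidates):
        name_tokens = set(tokenize(records[position]["normalized_name"]))
        # Jaccard similarity: shared tokens divided by all distinct tokens.
        score = len(query_tokens & name_tokens) / len(query_tokens | name_tokens)
        if score > best_score:
            best, best_score = records[position], score
    return best


# Hypothetical records; only the normalized_name field is taken from the description.
records = [
    {"id": "a", "normalized_name": "tofu"},
    {"id": "b", "normalized_name": "olive oil"},
    {"id": "c", "normalized_name": "brown rice"},
]
index = build_index(records)
match = best_match(index, records, "firm tofu")
```

Here `match` is the `tofu` record, since it is the only entry sharing a token with the query. A query with no shared tokens returns `None`, which is the case where a fallback would be needed.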

If no match is found and the RecipeRadar ingredient has a 'parent' (e.g. tofu is the parent of firm tofu), then nutritional information from the parent is used as a fallback, where present.
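
The parent fallback can be sketched roughly as follows. The `resolve_nutrition` helper, the `name` and `parent` keys, and the placeholder lookup table are hypothetical and stand in for whatever structures the changeset uses; the sketch only checks the immediate parent, as the description states.

```python
def resolve_nutrition(ingredient, lookup):
    """Return nutrition for the ingredient, else for its parent, else None.

    `lookup` is any callable that maps an ingredient name to nutritional
    information, returning None when there is no match.
    """
    nutrition = lookup(ingredient["name"])
    if nutrition is None and ingredient.get("parent"):
        # No direct match: fall back to the parent's information, where present.
        nutrition = lookup(ingredient["parent"])
    return nutrition


# Placeholder data for illustration only; not real nutritional values.
matched = {"tofu": {"source": "placeholder"}}

firm_tofu = {"name": "firm tofu", "parent": "tofu"}
orphan = {"name": "mystery ingredient", "parent": None}

fallback_result = resolve_nutrition(firm_tofu, matched.get)
missing_result = resolve_nutrition(orphan, matched.get)
```

In this example `fallback_result` is the entry stored for `tofu`, while `missing_result` is `None` because the ingredient has neither a match nor a parent.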

Matching accuracy hasn't yet been scrutinized or quantified and this algorithm will likely require further development and improvements.

Briefly summarize the changes

  1. Add a git submodule reference to ingreedy-data
  2. Weave data together from each of the root, McCance and FDC ingreedy-data JSON files
  3. Build a search index and perform query-based ingredient nutrition matching
  4. Update the hierarchy.json output document to include nutritional information

How have the changes been tested?

  1. Manual testing and inspection

List any issues that this change relates to

Relates to the RecipeRadar Q3 2020 roadmap.

cc @tomwhite