openculinary / knowledge-graph

The RecipeRadar knowledge graph stores and provides access to recipe and ingredient relationship information.
GNU Affero General Public License v3.0

'Tuna steak' incorrectly categorized underneath 'steak' in ingredient hierarchy #50

Closed: jayaddison closed this issue 4 years ago

jayaddison commented 4 years ago

Describe the bug: The logic that determines the ingredient hierarchy is currently naive (it searches based solely on the name of the parent ingredient), so tuna steak is categorized as a sub-ingredient of steak.

This means that a query for steak will return recipes containing tuna steak, and that queries for tuna will not return recipes containing tuna steak.
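The effect described above can be reproduced with a minimal sketch. The matching rule here (parent is the longest known product whose name ends the child's name) is an assumption made for illustration, and `naive_parent` and `descendants` are hypothetical helpers, not the project's actual code.

```python
def naive_parent(product, known_products):
    # Hypothetical heuristic: the parent is the longest known product
    # whose name forms the trailing words of this product's name.
    candidates = [
        other for other in known_products
        if other != product and product.endswith(" " + other)
    ]
    return max(candidates, key=len, default=None)


def descendants(root, parents):
    # Collect the root plus every product whose parent chain reaches it.
    found = {root}
    changed = True
    while changed:
        changed = False
        for child, parent in parents.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


known = ["steak", "tuna", "tuna steak"]
parents = {p: naive_parent(p, known) for p in known}

print(parents["tuna steak"])                   # steak
print(sorted(descendants("steak", parents)))   # ['steak', 'tuna steak']
print(sorted(descendants("tuna", parents)))    # ['tuna']
```

With name-only matching, a search expanded through the 'steak' subtree picks up tuna steak, while the 'tuna' subtree never includes it.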

To Reproduce:

  1. Inspect the ingredient hierarchy JSON
  2. Discover the following entry, and note the parent_id field
  {"product": "tuna steak", "recipe_count": 186, "id": "steak_tuna", "domain": null, "parent_id": "steak", "depth": 1, "nutrition": {"protein": 22.0, "fat": 4.1, "carbohydrates": 0.0, "energy": 125.0, "fibre": null, "product": "steak"}}

Expected behavior: The parent_id of steak_tuna should be tuna.
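One way to capture this expectation is a small check over hierarchy entries. This is a sketch only: it assumes each entry is available as one JSON object per line, and `find_mismatches` and the `expected_parents` mapping are hypothetical names introduced here.

```python
import json

# The entry quoted in the reproduction steps (nutrition omitted for brevity).
line = (
    '{"product": "tuna steak", "recipe_count": 186, "id": "steak_tuna", '
    '"domain": null, "parent_id": "steak", "depth": 1}'
)


def find_mismatches(lines, expected_parents):
    # Yield (id, actual parent, expected parent) for entries that disagree.
    for raw in lines:
        entry = json.loads(raw)
        if entry["id"] not in expected_parents:
            continue
        expected = expected_parents[entry["id"]]
        if entry["parent_id"] != expected:
            yield entry["id"], entry["parent_id"], expected


print(list(find_mismatches([line], {"steak_tuna": "tuna"})))
# [('steak_tuna', 'steak', 'tuna')]
```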

Relates to https://github.com/openculinary/backend/issues/24

jayaddison commented 4 years ago

One possible idea here is to create 'disambiguation expansions'.

For example, 'steak' would map to 'beef' in a disambiguation expansion.

We could then ensure that each set of expansions is anchored to a parent that contains some of the same disambiguation-expansion tokens, with a preference for 'original' tokens over ones added at expansion time.

Therefore, in the third example listed above, 'tuna steak' would prefer to be anchored to a root with 'tuna' as a token.
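A rough sketch of how that preference could work follows. It is one interpretation of the idea, not an implemented design: the `EXPANSIONS` table, the `expand` and `choose_root` helpers, and the candidate roots 'beef' and 'tuna' are all assumptions for illustration.

```python
# Hypothetical expansion table: ambiguous token -> tokens it implies by default.
EXPANSIONS = {"steak": ["beef"]}


def expand(product):
    # Return (original tokens, tokens added only by expansion).
    original = set(product.split())
    added = set()
    for token in original:
        added.update(EXPANSIONS.get(token, []))
    return original, added - original


def choose_root(product, roots):
    original, added = expand(product)
    best, best_score = None, (0, 0)
    for root in roots:
        root_tokens = set(root.split())
        # Tuples compare element by element, so any shared original token
        # outranks any number of shared expansion-only tokens.
        score = (len(root_tokens & original), len(root_tokens & added))
        if score > best_score:
            best, best_score = root, score
    return best


roots = ["beef", "tuna"]
print(choose_root("tuna steak", roots))  # tuna
print(choose_root("steak", roots))       # beef
```

Plain 'steak' still falls under 'beef' via its expansion, while 'tuna steak' is pulled towards 'tuna' because that token appears in the original product name.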

jayaddison commented 4 years ago

Perhaps the existing contents token generation could be re-used for this case?

jayaddison commented 4 years ago

While the suggested expansions approach may work, one drawback it has is that it may become difficult to reason about and debug.

An alternative approach would be to create a 'remappings' file that contains manual overrides of the parent element for specific product IDs. For example, we might want to override product A to have no parent element (i.e. it would become a root product), or we might want to override product C so that it has product B as its parent.
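As a minimal sketch of the remappings idea, assume the file is a JSON object mapping a product ID to its overriding parent ID, with null meaning "no parent". The file format, the `product_a`/`product_b`/`product_c` IDs, and the `apply_remappings` helper are hypothetical.

```python
import json

# Hypothetical remappings content: product ID -> overriding parent ID.
# null means the product has no parent, i.e. it becomes a root product.
REMAPPINGS_JSON = """
{
  "steak_tuna": "tuna",
  "product_a": null,
  "product_c": "product_b"
}
"""


def apply_remappings(hierarchy, remappings):
    # Return a copy of the hierarchy with manual parent overrides applied.
    remapped = []
    for entry in hierarchy:
        entry = dict(entry)  # avoid mutating the caller's entries
        if entry["id"] in remappings:
            entry["parent_id"] = remappings[entry["id"]]
        remapped.append(entry)
    return remapped


hierarchy = [
    {"id": "steak_tuna", "parent_id": "steak"},
    {"id": "product_a", "parent_id": "product_b"},
    {"id": "product_c", "parent_id": None},
    {"id": "steak", "parent_id": None},
]
remappings = json.loads(REMAPPINGS_JSON)
for entry in apply_remappings(hierarchy, remappings):
    print(entry["id"], "->", entry["parent_id"])
```

Because each override is an explicit ID-to-ID entry, an unexpected parent can be traced to a single line in the file. Derived fields such as depth are not recomputed in this sketch and would need updating after an override.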

jayaddison commented 4 years ago

Selectively re-indexed the affected recipes using the following crawler command:

crawler/reciperadar $ pipenv run python recipes.py --where "exists (select * from recipe_ingredients as ri where ri.recipe_id = recipes.id and (ri.description ilike '%tuna steak%' or ri.description ilike '%halibut steak%' or ri.description ilike '%root beer%'))" --recrawl