openculinary / knowledge-graph

The RecipeRadar knowledge graph stores and provides access to recipe and ingredient relationship information.
GNU Affero General Public License v3.0

Ingredients containing 'butter' are incorrectly marked up as plural (product becomes 'butters') #74

Closed: jayaddison closed this issue 1 year ago

jayaddison commented 1 year ago

Describe the bug

Currently, an ingredient line such as 50g unsalted butter identifies butter as the product (correct in this case) but sets the is_plural flag to True (incorrect in this case).

This currently appears to affect every instance of butter in parsed ingredients, and so the logic in the ingredient autosuggest on the homepage chooses to display the plural form of the product name.

This is a bug; in the vast majority of cases, recipe ingredients use the word butter (singular), so is_plural should be False, and the autosuggest should display the singular form, butter.

I think that the relevant section of code that sets the flag is here: https://github.com/openculinary/knowledge-graph/blob/da40346ccecb7348aac519419b52c12597eb7afe/web/models/product.py#L91

To Reproduce

Steps to reproduce the behavior:

  1. Query the backend database:
SELECT ri.description, ri.markup, ri.product_is_plural, pn.singular, pn.plural
FROM recipe_ingredients AS ri
JOIN product_names AS pn ON pn.id = ri.product_name_id
WHERE pn.singular = 'butter'
LIMIT 10
  2. Observe that ingredient lines containing singular 'butter' have the product_is_plural value true (may be abbreviated as t in the PostgreSQL query output)

Expected behavior

When an ingredient line such as 50g unsalted butter containing singular-form butter is parsed, the is_plural flag in the results should be False, and this should be reflected in the entries stored in the database.

Screenshots image

(note that search does continue to work as expected; this is a display issue but not a search functionality issue)

jayaddison commented 1 year ago

Recreating this issue for test/fix development purposes is currently blocked by openculinary/backend#65.

Update: that blocking issue has since been resolved.

jayaddison commented 1 year ago

Note: this also relates to https://github.com/jaraco/inflect/pull/124 (I guess most/all affected recipes haven't been reindexed since then, so part of the fix here is to reindex them using the reindexing scripts from the crawler repository)

jayaddison commented 1 year ago

Hmm. Some findings:

# contact the 'crawler' microservice via the kubernetes ingress and POST a URL to crawl
$ curl -XPOST -H "Host: crawler" "http://192.168.100.1:30080/crawl" --data "url=https://www.recipetineats.com/creamy-garlic-prawn-pasta/"
...
    "product": {
      "id": ...,
      "is_plural": true,  # unusual but acceptable; the noun 'butter' is considered uncountable here
      "plural": "butter",
      "product": "butter",
      "product_parser": "knowledge-graph",
      "singular": "butter"
    },
...
SELECT ri.product_is_plural, pn.singular, pn.plural, count(*)
FROM recipe_ingredients AS ri
JOIN product_names AS pn ON pn.id = ri.product_name_id
WHERE pn.singular = 'butter'
GROUP BY ri.product_is_plural, pn.singular, pn.plural;
...
 product_is_plural | singular | plural  | count
-------------------+----------+---------+-------
 t                 | butter   | butters | 18462
(1 row)
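
A minimal sketch of how these two findings could combine into the 'butters' display, assuming the autosuggest picks the stored plural form whenever the flag is set. The ProductName class and display_name function are hypothetical illustrations, not taken from the RecipeRadar codebase:

```python
from dataclasses import dataclass


@dataclass
class ProductName:
    # Hypothetical stand-in for a row in the product_names table
    singular: str
    plural: str


def display_name(name: ProductName, is_plural: bool) -> str:
    # Assumed display rule: show the plural form when the flag is set
    return name.plural if is_plural else name.singular


# Row as currently stored in the database (see the query output above)
stored = ProductName(singular="butter", plural="butters")
# Row matching the forms that the parser returns for 'butter'
corrected = ProductName(singular="butter", plural="butter")

print(display_name(stored, is_plural=True))     # butters
print(display_name(corrected, is_plural=True))  # butter
```

Under that assumption, the is_plural flag can stay true for an uncountable noun without harm, provided the stored plural form matches the singular.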

jayaddison commented 1 year ago

D'oh: this is all probably a result of the fact that the products table was split into separate products and product_names tables (see #57), with the latter table editable from the admin management UI to adjust product naming.

Reindexing will be required (it's a lighter operation than recrawling: it retrieves the recipe from the database, formats the results, and writes them to the search engine index), but in this case the fix is to first correct the plural form in product_names, and only then reindex.

In other words:

We could also consider adding a feature to the product admin interface code to help identify cases where the inflect library doesn't agree with the RecipeRadar singular/plural forms. That could help fix problems on both sides.
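
A rough sketch of what such a check might look like, assuming the stored forms are available as (singular, plural) pairs. The function name, the sample rows, and the toy pluralizer below are all hypothetical; the pluralizer is passed in as a callable so that the comparison logic does not depend on any particular library:

```python
from typing import Callable, Iterable, List, Tuple


def find_plural_disagreements(
    names: Iterable[Tuple[str, str]],
    pluralize: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (singular, stored_plural, suggested_plural) where the two differ."""
    disagreements = []
    for singular, stored_plural in names:
        suggested = pluralize(singular)
        if suggested != stored_plural:
            disagreements.append((singular, stored_plural, suggested))
    return disagreements


def toy_pluralize(word: str) -> str:
    # Toy stand-in for a real pluralizer; treats a few nouns as uncountable
    uncountable = {"butter", "rice"}
    return word if word in uncountable else word + "s"


# Hypothetical sample of product_names rows: (singular, plural)
rows = [("butter", "butters"), ("carrot", "carrots"), ("rice", "rice")]
for singular, stored, suggested in find_plural_disagreements(rows, toy_pluralize):
    print(f"{singular}: stored={stored!r}, suggested={suggested!r}")
```

In practice the pluralize argument could presumably be inflect's engine().plural_noun method, and each disagreement would be surfaced in the admin interface for a human to decide which side is wrong.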

jayaddison commented 1 year ago

Ok, recipes are reindexing at the moment, at a rate of approximately 100 recipes per second across four pods (seems reasonable).

The reindexing command was:

# gather recipes that have an ingredient line containing the substring 'butter' and reindex them
$ python recipes.py --where "exists (select * FROM recipe_ingredients AS ri WHERE ri.recipe_id = recipes.id AND ri.description ILIKE '%butter%')" --reindex 

jayaddison commented 1 year ago

Reindexing is complete, and the problem is resolved:

image

Time for some food.