Closed jayaddison closed 1 year ago
Recreating this issue for test/fix development purposes is currently blocked by openculinary/backend#65.
Issue resolved.
Note: this also relates to https://github.com/jaraco/inflect/pull/124. I guess most/all affected recipes haven't been re-indexed since then, so a fix requirement here is to re-index them using the reindexing scripts from the crawler repository.
Hmm. Some findings:
- When crawling a recipe that has `butter` as an ingredient, the relevant `product` field in the JSON response contains more-or-less correct results (bearing in mind that `butter` is an uncountable noun and looks the same in singular and plural form):
# contact the 'crawler' microservice via the kubernetes ingress and POST a URL to crawl
$ curl -XPOST -H "Host: crawler" "http://192.168.100.1:30080/crawl" --data "url=https://www.recipetineats.com/creamy-garlic-prawn-pasta/"
...
"product": {
"id": ...,
"is_plural": true, # unusual but acceptable; the noun 'butter' is considered uncountable here
"plural": "butter",
"product": "butter",
"product_parser": "knowledge-graph",
"singular": "butter"
},
...
- After the crawl has been processed (as observed with `kubectl logs -f deployments/backend-deployment-worker`), the `recipe_ingredients` database table still contains purely pluralized data for `butter`, and with the unexpected `butters` plural form:
SELECT ri.product_is_plural, pn.singular, pn.plural, count(*)
FROM recipe_ingredients AS ri
JOIN product_names AS pn ON pn.id = ri.product_name_id
WHERE pn.singular = 'butter'
GROUP BY ri.product_is_plural, pn.singular, pn.plural;
...
 product_is_plural | singular | plural  | count
-------------------+----------+---------+-------
 t                 | butter   | butters | 18462
(1 row)
D'oh: this is all probably a result of the fact that the `products` table was split into separate `products` and `product_names` tables (see #57), with the latter table available from the admin management UI to edit product naming.
Reindexing (a lighter operation than recrawling: it retrieves the recipe from the database, formats the results and writes them to the search engine index) will be required, but in this case the fix is first to correct the plural form in the `product_names` table, and only then to reindex.
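The effect of correcting the plural form in one place can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database rather than the real PostgreSQL backend, a hypothetical reduced schema containing only the columns that appear in the query above, and it assumes that display code picks `plural` whenever `product_is_plural` is set. It is an illustration of the idea, not the actual migration.

```python
import sqlite3

# Hypothetical, reduced schema: only the columns seen in the query in this
# thread. The real backend database is PostgreSQL, not SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_names (
        id INTEGER PRIMARY KEY,
        singular TEXT,
        plural TEXT
    );
    CREATE TABLE recipe_ingredients (
        id INTEGER PRIMARY KEY,
        product_name_id INTEGER REFERENCES product_names (id),
        product_is_plural BOOLEAN
    );
    INSERT INTO product_names (id, singular, plural)
    VALUES (1, 'butter', 'butters');
    INSERT INTO recipe_ingredients (product_name_id, product_is_plural)
    VALUES (1, 1), (1, 1), (1, 1);
    """
)


def displayed_names(connection):
    """Return the distinct names shown for ingredient rows (assumed logic)."""
    rows = connection.execute(
        """
        SELECT DISTINCT
            CASE WHEN ri.product_is_plural THEN pn.plural ELSE pn.singular END
        FROM recipe_ingredients AS ri
        JOIN product_names AS pn ON pn.id = ri.product_name_id
        """
    ).fetchall()
    return [name for (name,) in rows]


before = displayed_names(conn)

# Correct the plural form once, in product_names; every referencing
# recipe_ingredients row picks up the change through the join.
conn.execute("UPDATE product_names SET plural = 'butter' WHERE singular = 'butter'")
conn.commit()

after = displayed_names(conn)
```

Because every ingredient row references the same product name ID, a single-row update changes what is displayed for all of them; the search index still needs a reindex to reflect it.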
In other words:
- There are many ingredient entries (`recipe_ingredients` table), and each of those references a single 'product name ID' (`product_names` table).
- The plural form `butters` is being loaded from the `product_names.plural` column.
- The fix is to update the `product_names.plural` value to `butter`.
We could also consider adding a feature to the product admin interface code to help identify cases where the `inflect` library doesn't agree with the RecipeRadar singular/plural forms. That could help fix problems on both sides.
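A minimal sketch of such a mismatch check follows. The `ProductName` type, the `find_plural_mismatches` helper and the `naive_pluralize` stand-in are all hypothetical names introduced for illustration; in practice the `pluralize` argument could be something like `inflect.engine().plural_noun`, which is not exercised here so that the example has no third-party dependencies.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class ProductName(NamedTuple):
    """Hypothetical stand-in for a row of the product_names table."""

    singular: str
    plural: str


def find_plural_mismatches(
    names: Iterable[ProductName],
    pluralize: Callable[[str], str],
) -> List[Tuple[ProductName, str]]:
    """Return (stored name, suggested plural) pairs where the two disagree."""
    mismatches = []
    for name in names:
        suggested = pluralize(name.singular)
        if suggested != name.plural:
            mismatches.append((name, suggested))
    return mismatches


def naive_pluralize(word: str) -> str:
    """Deliberately simplistic pluralizer, used only for this illustration."""
    return word + "s"


stored = [ProductName("butter", "butter"), ProductName("carrot", "carrots")]
mismatches = find_plural_mismatches(stored, naive_pluralize)
```

A mismatch does not say which side is wrong: here the stored `butter` plural is the desired one, so a report like this would be a prompt for human review in the admin UI (or an upstream bug report) rather than an automatic fix.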
Ok, recipes are reindexing at the moment, and at a rate of approximately 100 recipes per second (seems reasonable) across four pods.
The reindexing command was:
# gather recipes that have an ingredient line containing the substring 'butter' and reindex them
$ python recipes.py --where "exists (select * FROM recipe_ingredients AS ri WHERE ri.recipe_id = recipes.id AND ri.description ILIKE '%butter%')" --reindex
Reindexing is complete, and the problem is resolved.
Time for some food.
Describe the bug
Currently, an ingredient line such as `50g unsalted butter` identifies `butter` as the product (correct in this case) but sets the `is_plural` flag to `True` (incorrect in this case).
This currently appears to affect every instance of `butter` in parsed ingredients, and so the logic in the ingredient autosuggest on the homepage chooses to display the plural form of the product name.
This is a bug; in the vast majority of cases, recipe ingredients use the word `butter` (singular), and so `is_plural` should be `False`, and we should display the singular form, `butter`, in the autosuggest.
I think that the relevant section of code that sets the flag is here: https://github.com/openculinary/knowledge-graph/blob/da40346ccecb7348aac519419b52c12597eb7afe/web/models/product.py#L91
To Reproduce
Steps to reproduce the behavior:
1. Query the `backend` database.
2. Observe the `product_is_plural` value `true` (may be abbreviated as `t` in the PostgreSQL query output).
Expected behavior
When an ingredient line such as `50g unsalted butter` containing singular-form butter is parsed, the `is_plural` flag in the results should be `False`, and this should be reflected in the entries stored in the database.
Screenshots
(note that search does continue to work as expected; this is a display issue but not a search functionality issue)
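The expected behavior can be expressed as a small rule. This is a sketch under assumptions, not the code at the `product.py` link above: the `is_plural` function and its parameters are hypothetical, and it assumes the parser knows the text it matched along with the product's stored singular and plural forms.

```python
def is_plural(matched_text: str, singular: str, plural: str) -> bool:
    """Decide whether the matched ingredient text is the plural form.

    The singular form is checked first, so an uncountable noun whose
    singular and plural forms are identical is reported as not plural.
    """
    text = matched_text.strip().lower()
    if text == singular.lower():
        return False
    return text == plural.lower()


# Uncountable noun: singular and plural forms are the same.
butter_uncountable = is_plural("butter", singular="butter", plural="butter")
# Stored plural is 'butters', but the recipe text used the singular form.
butter_singular_text = is_plural("butter", singular="butter", plural="butters")
# Ordinary countable noun, plural form in the recipe text.
carrots_plural_text = is_plural("carrots", singular="carrot", plural="carrots")
```

Checking the singular form before the plural form is the design choice that matters here: it makes `50g unsalted butter` produce `False` regardless of which plural form is stored for the product.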