Describe the bug
When generating markup for ingredients whose names contain more than one word (i.e. n-grams with n > 1), the markup engine discards words that appear towards the end of the input.
For three-word ingredient names, two words are dropped. For two-word ingredient names, one word is dropped. Single-word ingredient names are not affected.
Duplicate words from the ingredient name appear in place of the dropped words (in the example below, the trailing "for burritos" is replaced by a second "bell pepper").
For example:
$ curl -H 'Host: knowledge-graph' -XPOST 192.168.100.1:30080/ingredients/query --data 'descriptions[]=large red bell pepper for burritos' | jq
{
  "results": {
    "large red bell pepper for burritos": {
      "ancestors": [
        "bell pepper",
        "pepper"
      ],
      "category": null,
      "contents": [
        "red bell pepper"
      ],
      "is_plural": false,
      "markup": "large <mark>red bell pepper</mark> bell pepper",
      "plural": "red bell peppers",
      "product": "red bell pepper",
      "singular": "red bell pepper"
    }
  }
}
To Reproduce
Steps to reproduce the behavior:
1. Query the knowledge-graph using an ingredient line that contains a multi-word ingredient name.
2. Observe that the end of the markup response field contains incorrect words.
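The second step can be checked mechanically. The following is a minimal sketch, not part of the service: the helper names `strip_mark_tags` and `markup_preserves_input` are hypothetical, and it assumes the only tags the engine emits are `<mark>` and `</mark>`. It strips the tags from the markup field and compares the remaining words with the original description.

```python
import re


def strip_mark_tags(markup: str) -> str:
    """Remove <mark> and </mark> tags, leaving only the text (hypothetical helper)."""
    return re.sub(r"</?mark>", "", markup)


def markup_preserves_input(description: str, markup: str) -> bool:
    """Return True if the markup contains exactly the words of the description, in order."""
    return strip_mark_tags(markup).split() == description.split()


description = "large red bell pepper for burritos"
observed = "large <mark>red bell pepper</mark> bell pepper"

# The response from the example above fails the check.
print(markup_preserves_input(description, observed))
```

Running this against the response shown above reports a mismatch, because "for burritos" is missing from the markup.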
Expected behavior
All of the words from the original ingredient description should appear, and the ingredient name should be marked.
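To make the expected output concrete, here is a rough sketch of the intended behavior. It is not the engine's actual implementation: the function name `mark_ingredient` is hypothetical, and it assumes exact whitespace-delimited token matching, whereas the real engine presumably also handles plurals and other variations.

```python
def mark_ingredient(description: str, product: str) -> str:
    """Wrap the first occurrence of `product` in <mark> tags, keeping every other word.

    Hypothetical illustration of the expected result, using exact token matching.
    """
    tokens = description.split()
    target = product.split()
    n = len(target)
    if n == 0:
        return description
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == target:
            marked = "<mark>" + " ".join(target) + "</mark>"
            # Words before and after the match are carried over unchanged.
            return " ".join(tokens[:start] + [marked] + tokens[start + n:])
    # No match: return the description untouched.
    return description


print(mark_ingredient("large red bell pepper for burritos", "red bell pepper"))
# large <mark>red bell pepper</mark> for burritos
```

For the example query, the expected markup is therefore "large <mark>red bell pepper</mark> for burritos" rather than "large <mark>red bell pepper</mark> bell pepper".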