thinkle / gourmet

Gourmet Recipe Manager
GNU General Public License v2.0

problem parsing two-word ingredients that begin with lower-case 'a' #931

Open Stuckyville opened 5 years ago

Stuckyville commented 5 years ago

When entering a two-word ingredient where the first word begins with a lower-case 'a', the parser strips the leading 'a' and treats it as a quantity. For example, 'apple juice' becomes 'pple juice' with a quantity of '1'. Further details are discussed at https://answers.launchpad.net/gourmet/+question/678095

saxon-s commented 5 years ago

Environment: Gourmet 0.17.4 and master branch on Ubuntu and Windows.

Steps to reproduce:

  1. Click "New" button for new recipe
  2. Click "Ingredients" tab
  3. Add each of the following ingredients individually to the "Add ingredient" text field: "apple juice", "Apple juice", "apricot", "an avocado", "a beet", "a dozen eggs", "a pair of Yubari King melons"

Expected Results:

Actual Results:

Analysis: When an ingredient string of more than one word starts with a lower-case "a", the leading "a" is stripped from the first word and replaced with a quantity of "1". Likewise, "a dozen" is replaced with a quantity of "12" and "a pair" with a quantity of "2".
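
The behaviour described in the analysis can be reproduced with a small standalone sketch. This is not Gourmet's code: the reduced NUMBER_WORDS table, the two patterns, and the split_quantity helper are all hypothetical and only illustrate how a number-word pattern with no trailing word boundary can consume the first letter of "apple". The sketch does not model the report's detail that only ingredients of more than one word are affected.

```python
import re

# Hypothetical, heavily reduced number-word table (not Gourmet's real one).
NUMBER_WORDS = {"a dozen": 12, "a pair": 2, "an": 1, "a": 1}

# Longest phrases first, so "a dozen" is tried before "a".
_alternation = "|".join(
    w.replace(" ", r"\s+") for w in sorted(NUMBER_WORDS, key=len, reverse=True)
)

# Without a trailing boundary, "a" also matches the first letter of "apple".
UNANCHORED = re.compile("(?:%s)" % _alternation)
# With \b, a number word only matches when it ends at a word boundary.
ANCHORED = re.compile(r"(?:%s)\b" % _alternation)


def split_quantity(text, finder):
    """Return (quantity, rest), or (None, text) if no leading number word."""
    m = finder.match(text)
    if m is None:
        return None, text
    key = " ".join(m.group(0).split())  # normalise internal whitespace
    return NUMBER_WORDS[key], text[m.end():].strip()


for item in ("apple juice", "Apple juice", "a beet", "a dozen eggs"):
    print(item, "->", split_quantity(item, UNANCHORED), split_quantity(item, ANCHORED))
```

In this sketch the unanchored pattern turns "apple juice" into a quantity of 1 plus "pple juice", while the anchored pattern leaves it alone and still recognises "a beet" and "a dozen eggs". Whether a word-boundary check is the right fix for Gourmet itself is an open question.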

Conclusion:

martinp26 commented 4 years ago

There are multiple problems here:

A simple workaround is this in gourmet/convert.py:

@@ -644,7 +648,7 @@ all_number_words.sort( lambda x,y: ((len(y)>len(x) and 1) or (len(x)>len(y) and -1) or 0) )

-NUMBER_WORD_REGEXP = '|'.join(all_number_words).replace(' ','\s+')
+NUMBER_WORD_REGEXP = None
 FRACTION_WORD_REGEXP = '|'.join(filter(lambda n: NUMBER_WORDS[n]<1.0, all_number_words) ).replace(' ','\s+')

I believe the NUMBER_FINDER.finditer(timestring) call in timestring_to_seconds should not blindly look for the next number-like match. It should only look for one after the non-number words following the previous match have been consumed.

"12 Minuten" is currently parsed as [12 Minu] [ten]

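A minimal sketch of that idea, assuming a tiny hypothetical word list and two hypothetical patterns (LOOSE_FINDER and PAIR_FINDER are illustration names, not Gourmet's actual NUMBER_FINDER): the loose pattern finds "ten" inside "Minuten", while the pairing pattern consumes the whole word after each number before searching again.

```python
import re

# Hypothetical number words; Gourmet's real NUMBER_FINDER is not reproduced here.
_WORDS = "ten|two|one"

# Looks for the next number-like match anywhere, even inside another word.
LOOSE_FINDER = re.compile(r"\d+|" + _WORDS)

# Matches a number and then consumes the whole following word (the unit),
# so the next search starts only after that word.
PAIR_FINDER = re.compile(r"(\d+|" + _WORDS + r")\s*([^\W\d_]+)")


def loose_numbers(timestring):
    """Return every number-like match, including ones inside other words."""
    return [m.group(0) for m in LOOSE_FINDER.finditer(timestring)]


def number_unit_pairs(timestring):
    """Return (number, unit word) pairs, consuming each unit word in full."""
    return [(m.group(1), m.group(2)) for m in PAIR_FINDER.finditer(timestring)]


print(loose_numbers("12 Minuten"))
print(number_unit_pairs("12 Minuten"))
print(number_unit_pairs("1 hour ten minutes"))
```

The pairing pattern is only a sketch: it would still accept a number word at the start of a longer word, so a real fix would likely also need word boundaries around the number words.
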
saxon-s commented 4 years ago

@martinp26 Thank you for investigating the issue and the simple workaround.