nsafai / recipe-parser

Natural language parser for recipe ingredients that can also combine ingredients
24 stars 34 forks source link

Fix to undefined fraction denominator #3

Open daenatrillium opened 4 years ago

daenatrillium commented 4 years ago

When a recipe has an ingredient listed as, for example, "1 2 lb. chicken" when it means "One 2 lb chicken," the ingredient parser will assume the white space between the two numbers represents a fraction and then search for a fraction denominator on the "/" split, which returns undefined. It then tries to divide by this undefined denominator and throws an error. This adds a test for existence before doing the division. If the denominator doesn't exist, it defaults to returning the "whole" number (1 in the example), and ignores the rest of the string for quantities (open to other ideas here). Many recipes on common recipe websites, for example Bon Appetit, follow this ingredient style and thus throw errors in the parser.

undermark5 commented 3 years ago

Checking out your branch and trying your update results in the following output

{
  quantity: '1',
  unit: 'pound',
  ingredient: 'chicken',
  minQty: '1',
  maxQty: '1'
}

while some output is arguably better than an error being thrown, I would argue that this output is incorrect. In this case, I would associate the 2 lb. with the chicken, therefore the ingredient is '2 lb. chicken' and the unit is 'ea'. Another such example is 1 12 oz can chicken (in my opinion this is a much more common situation), which the result is

{
  quantity: '1',
  unit: 'ounce',
  ingredient: 'can chicken',
  minQty: '1',
  maxQty: '1'
}

and in this case, the resulting unit should be '12 oz can', not just 'ounce' or can. All of this can be avoided by utilizing parenthesis (eg, '1 can (12 oz) chicken' or '1 (2 lb.) chicken'), though this is not a sufficient workaround as one does not always own the list of ingredients to be able to modify them to be able to parse them correctly.

This is an unfortunate situation that we are presented with to be able to parse these multipart units and it may not be possible to, even Zestful does not "properly" parse these. see their docs.

nsafai commented 3 years ago

@daenatrillium sorry for missing your PR until now - my notifications were unfortunately turned off. Thanks for putting up this PR and bringing up the issue in the first place. I agree this is something we should fix and certainly don’t think we should throw an error for “1 12oz can”

Would be great to add some unit tests for the situations you and @undermark5 brought up (btw thanks for testing @undermark5).

Question: What do you two think of assuming the space implies multiplication?

1 12oz cans: 1 12 = Quantity: 12. Unit: oz 2 24oz cans: 2 24 = Quantity: 48. Unit: oz 1 2lbs chicken: 1 * 2 = Quantity: 2. Unit: lbs

Can you think of examples where this assumption would be problematic?

undermark5 commented 3 years ago

@nsafai the multiplication is an option, however, for things like cans where they are uniform discrete units and doesn't make sense in all situations. In the case of chicken or other things like that, the multiplication would be fine because often that sort of product is sold on a per weight basis and not all of the discrete items are the same weight. I actually have some changes that I've implemented and added some test cases, I'll have to see if I can find the time to make a summary of the changes and do more testing.