strangetom / ingredient-parser

A tool to parse recipe ingredients into structured data
https://ingredient-parser.readthedocs.io/en/latest/
MIT License

Alternative ingredients #27

Open · icaliman opened this issue 2 weeks ago

icaliman commented 2 weeks ago

Thanks for working on this library!

I want to ask a question about alternative ingredients.

In the documentation you have these examples:

3 tablespoons butter or olive oil, or a mixture

4 shoots spring shallots or 4 shallots, minced

Is it possible to change the behavior and return a list of ingredients, something similar to CompositeIngredientAmount?

For example, "3 tablespoons butter or olive oil, or a mixture" could be split into two alternative ingredients:

AlternativeIngredient(
    ingredients=[
        ParsedIngredient(
            name=IngredientText(text='butter', ...),
            ...
        ),
        ParsedIngredient(
            name=IngredientText(text='olive oil', ...),
            ...
        ),
    ]
)

What do you think about this?

strangetom commented 2 weeks ago

Hi @icaliman

Thanks for the suggestion. This is something I've been thinking about for a while now but haven't got round to.

I think the two examples you used show two cases:

  1. "3 tablespoons butter or olive oil, or a mixture" has multiple ingredients that share the same amount. We could return something like
ParsedIngredient(
    name=AlternativeIngredients(ingredients=[
        IngredientText(text="butter", ...),
        IngredientText(text="olive oil", ...),
    ]),
    ...
)

This should be fairly straightforward to do.

  2. "4 shoots spring shallots or 4 shallots, minced" is a bit more complex because each alternative has its own amount, so we would need to return two ParsedIngredient objects, as you show in your example. I'm not sure how to do this yet.
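A first step in either case is deciding which of the two cases above a sentence falls into. A minimal sketch, assuming that an alternative with its own amount starts with a digit straight after "or": the function name `has_separate_amounts` is hypothetical and not part of ingredient-parser, and it ignores quantities written as words or Unicode fractions.

```python
import re

# Hypothetical heuristic, not part of ingredient-parser: an alternative that
# begins with a digit is assumed to carry its own amount (case 2); otherwise
# all alternatives are assumed to share the first amount (case 1).
STARTS_WITH_DIGIT = re.compile(r"^\s*\d")


def has_separate_amounts(sentence: str) -> bool:
    """Return True if any alternative after an 'or' begins with a digit."""
    # \b stops "or" matching inside words such as "chorizo".
    parts = re.split(r"\bor\b", sentence)
    # parts[0] is the first alternative; only the later ones are checked.
    return any(STARTS_WITH_DIGIT.match(part) for part in parts[1:])


print(has_separate_amounts("4 shoots spring shallots or 4 shallots, minced"))
print(has_separate_amounts("3 tablespoons butter or olive oil, or a mixture"))
```

Sentences where this returns True would need one ParsedIngredient per alternative; the rest could keep a single ParsedIngredient with a list of names.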
strangetom commented 1 week ago

This is proving to be more difficult than I first thought. The problem is working out how to handle sentences where the ingredient name is split, so that part of it is shared by both alternatives.

3 tablespoons butter or olive oil, or a mixture

This sentence is straightforward to handle by splitting on the conjunction "or".
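The splitting step can be sketched as follows, assuming the name tokens have already been separated from the amount and from trailing text such as "or a mixture" (for example, by that text being labelled as a comment). `split_on_conjunction` is a hypothetical helper for illustration, not a function in ingredient-parser.

```python
def split_on_conjunction(name_tokens: list[str], conjunction: str = "or") -> list[str]:
    """Split a run of name tokens into alternative names at each conjunction."""
    alternatives: list[list[str]] = [[]]
    for token in name_tokens:
        if token.lower() == conjunction:
            alternatives.append([])  # start collecting the next alternative
        else:
            alternatives[-1].append(token)
    # Join each group back into a name; skip empty groups (e.g. a leading "or").
    return [" ".join(group) for group in alternatives if group]


print(split_on_conjunction(["butter", "or", "olive", "oil"]))
```

Applied naively to "red or yellow pepper", the same function returns "red" and "yellow pepper", which is exactly the split-name problem.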

1 large red or yellow pepper

This kind of sentence is more difficult because I think we would want the output to be "red pepper" and "yellow pepper". We could use the fact that "red" and "yellow" are adjectives and "pepper" is a noun to work out how to combine "red" and "pepper", but this doesn't work in every case, for example:

1 cup beef or chicken stock

Again, we would want to extract "beef stock" and "chicken stock", but all the tokens are nouns, so the parts of speech don't tell us which word should be shared between the alternatives.
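The adjective/noun idea, and where it breaks down, can be sketched like this. The tokens carry hand-written Penn Treebank part-of-speech tags (JJ adjective, NN noun, CC conjunction); this is an assumption for illustration, since ingredient-parser does not necessarily expose tags in this form, and `expand_alternatives` is a hypothetical name.

```python
# Hypothetical sketch: (word, Penn Treebank tag) pairs supplied by hand.
Token = tuple[str, str]


def expand_alternatives(tokens: list[Token]) -> list[str]:
    """Split on 'or' and give adjective-only alternatives the final noun."""
    groups: list[list[Token]] = [[]]
    for word, tag in tokens:
        if tag == "CC" and word.lower() == "or":
            groups.append([])
        else:
            groups[-1].append((word, tag))
    groups = [group for group in groups if group]
    if not groups:
        return []

    last = groups[-1]
    # Assume the head noun is the final token of the last alternative.
    head_word, head_tag = last[-1]
    names = []
    for group in groups[:-1]:
        words = [word for word, _ in group]
        if head_tag.startswith("NN") and all(tag.startswith("JJ") for _, tag in group):
            words.append(head_word)  # "red" + "pepper" -> "red pepper"
        names.append(" ".join(words))
    names.append(" ".join(word for word, _ in last))
    return names


print(expand_alternatives([("red", "JJ"), ("or", "CC"), ("yellow", "JJ"), ("pepper", "NN")]))
print(expand_alternatives([("beef", "NN"), ("or", "CC"), ("chicken", "NN"), ("stock", "NN")]))
```

The first call gives "red pepper" and "yellow pepper", but the second leaves "beef" on its own: with only noun tags, nothing distinguishes "beef" here from a complete name like "butter" in "butter or olive oil".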

This problem is the same as one of the limitations of the foundation foods functionality, so the same solution should work for both. I have an idea for changing the model used for the foundation foods to work for this too, but it involves changing the token labelling scheme, which is going to be very time-consuming.