strangetom / ingredient-parser

A tool to parse recipe ingredients into structured data
https://ingredient-parser.readthedocs.io/en/latest/
MIT License

parsing out metric units when available #2

Closed yqiang closed 1 year ago

yqiang commented 1 year ago

Hi, First of all – thanks for the work on this library, it works quite well and is pleasant to use.

I was wondering if it would be within the scope of this library to extract metric units from an ingredient string when they are available. For example, the following recipe has the following ingredient list:

For the first two and the last ingredient, it would be great to be able to get the metric values in grams.

On a separate note, it's not parsing the quantity description "three 8-ounce packages" correctly. It's parsing it as:

        "name": "cream cheese",
        "quantity": 3.0,
        "unit": "package",
        "original_description": "three 8-ounce packages (681g) cream cheese"

This seems a bit tougher to solve, but wanted to mention it nonetheless.

strangetom commented 1 year ago

Hi @yqiang, thanks for the feedback.

Being able to extract multiple quantities with different units is something I'm planning to do. My current thinking is to train the model to detect primary and secondary/alternative quantities and units. For example, the sentence "1 1/2 cups (297g) granulated sugar" would return

{
    "name": "granulated sugar",
    "primary_quantity": "1.5",
    "primary_unit": "cups",
    "secondary_quantity": "297",
    "secondary_unit": "g",
}
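
To make the proposed output shape concrete, here is a minimal Python sketch that produces it for sentences of the form "quantity unit (quantity unit) name". It uses a regular expression rather than the trained model described above, so it only illustrates the target structure; `split_primary_secondary` and `to_number_string` are hypothetical names and are not part of ingredient-parser.

```python
import re
from fractions import Fraction

# Hypothetical pattern, not taken from ingredient-parser: a leading quantity
# and unit, a parenthesised alternative quantity and unit, then the name.
PATTERN = re.compile(
    r"^\s*(?P<qty>\d+(?:\s+\d+/\d+|/\d+|\.\d+)?)\s+"
    r"(?P<unit>[A-Za-z]+)\s*"
    r"\(\s*(?P<qty2>\d+(?:\.\d+)?)\s*(?P<unit2>[A-Za-z]+)\s*\)\s*"
    r"(?P<name>.+?)\s*$"
)


def to_number_string(text):
    """Convert '1 1/2', '3/4', '2.5' or '297' to a decimal string."""
    total = sum(Fraction(part) for part in text.split())
    if total.denominator == 1:
        return str(total.numerator)
    return str(float(total))


def split_primary_secondary(sentence):
    """Return the primary/secondary fields, or None if the shape differs."""
    match = PATTERN.match(sentence)
    if match is None:
        return None
    return {
        "name": match["name"],
        "primary_quantity": to_number_string(match["qty"]),
        "primary_unit": match["unit"],
        "secondary_quantity": to_number_string(match["qty2"]),
        "secondary_unit": match["unit2"],
    }


result = split_primary_secondary("1 1/2 cups (297g) granulated sugar")
```

A fixed pattern like this breaks as soon as the sentence deviates from the expected shape, which is the reason for labelling tokens with a model instead.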

I have thought about training the model to identify metric and US customary units directly, but I'm not sure how well it would be able to associate the quantities with the correct unit. I might still look into this at a later date.

For the sentence "three 8-ounce packages (681g) cream cheese", the output you are seeing is what I intended (at the moment). My reasoning is that 3 packages is the primary quantity and unit, and the 8-ounce is secondary information.

Once I have implemented the primary/secondary quantities and units as described above, the output will be

{
    "name": "cream cheese",
    "primary_quantity": "3",
    "primary_unit": "packages",
    "secondary_quantity": "8",
    "secondary_unit": "ounces",
}
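
The "count, size-unit, container" reading described above can be sketched with a rule-based Python example. This is an assumption-laden illustration, not the library's model: `parse_sized_container`, `NUMBER_WORDS` and `SIZED_PATTERN` are hypothetical names, only number words up to ten are handled, and any parenthesised alternative such as "(681g)" is skipped. The size unit is returned as written ("ounce"), not pluralised as in the example output above.

```python
import re

# Hypothetical lookup for spelled-out counts; extend as needed.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Count, hyphenated size and unit, container word, optional "(...)", name.
SIZED_PATTERN = re.compile(
    r"^\s*(?P<count>\d+|[A-Za-z]+)\s+"
    r"(?P<size>\d+(?:\.\d+)?)-(?P<size_unit>[A-Za-z]+)\s+"
    r"(?P<container>[A-Za-z]+)\s+"
    r"(?:\([^)]*\)\s*)?"
    r"(?P<name>.+?)\s*$"
)


def parse_sized_container(sentence):
    """Treat the container count as primary and the package size as secondary."""
    match = SIZED_PATTERN.match(sentence)
    if match is None:
        return None
    count_text = match["count"].lower()
    if count_text.isdigit():
        count = int(count_text)
    elif count_text in NUMBER_WORDS:
        count = NUMBER_WORDS[count_text]
    else:
        return None  # leading word is not a recognised number
    return {
        "name": match["name"],
        "primary_quantity": str(count),
        "primary_unit": match["container"],
        "secondary_quantity": match["size"],
        "secondary_unit": match["size_unit"],
    }


parsed = parse_sized_container("three 8-ounce packages (681g) cream cheese")
```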

yqiang commented 1 year ago

That sounds great. Looking forward to it!