openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0

Respect minimum % in ingredient analysis #8984

Open danielcavanagh opened 1 year ago

danielcavanagh commented 1 year ago

Problem

If a minimum % is specified in the ingredients list on the label (e.g. 'hazelnuts (min 36%)'), it is parsed without error, but percent, percent_estimate, percent_max, and percent_min are all set to exactly the % on the label, so the fact that the value is only a minimum is lost

Proposed solution

Only percent_min should be set to the % on the label. percent should indicate that a minimum was specified, and percent_estimate and percent_max should be estimated based on the min % and the other ingredients
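To make the proposal concrete, here is a minimal Python sketch of the intended behaviour. It is not the server's actual Perl parser: the regular expression and the `percent_fields` helper are hypothetical, and the sketch leaves aside how `percent` itself should represent a bound. It only shows that a "min" qualifier should fill `percent_min` alone, while an unqualified % still fixes all three fields.

```python
import re

# Hypothetical pattern: an optional "min"/"max" qualifier, then a number and "%".
# The leading \b stops words such as "vitamin" from being read as a "min" qualifier.
PERCENT_PATTERN = re.compile(
    r"\b(?:(min|max)\.?\s*)?(\d+(?:[.,]\d+)?)\s*%", re.IGNORECASE
)


def percent_fields(label_fragment):
    """Return the percent fields implied by one ingredient fragment of a label."""
    match = PERCENT_PATTERN.search(label_fragment)
    if match is None:
        return {}
    qualifier, number = match.groups()
    value = float(number.replace(",", "."))
    if qualifier is None:
        # An exact % on the label pins down every field.
        return {"percent": value, "percent_min": value, "percent_max": value}
    # A qualified % only constrains one side; the rest must be estimated later.
    return {f"percent_{qualifier.lower()}": value}


print(percent_fields("hazelnuts (min 36%)"))
print(percent_fields("cocoa 7%"))
```

With this shape, percent_estimate and percent_max for a "min" ingredient stay unset at parse time and are left to the estimation step.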

Additional context

E.g. https://en.openfoodfacts.org/api/v0/product/9346758004008/hazelnut-chocolate-spread-pana-organic

Currently:

[
  {
    "id": "en:coconut-sugar",
    "percent_estimate": 50,
    "percent_max": 57,
    "percent_min": 43
  },
  {
    "id": "en:hazelnut",
    "percent": 36,
    "percent_estimate": 36,
    "percent_max": 36,
    "percent_min": 36
  },
  {
    "id": "en:cocoa",
    "percent": 7,
    "percent_estimate": 7,
    "percent_max": 7,
    "percent_min": 7
  },
  {
    "id": "en:sunflower-oil",
    "percent_estimate": 3.5,
    "percent_max": 7,
    "percent_min": 0
  },
  {
    "id": "en:sunflower-lecithin",
    "percent_estimate": 3.5,
    "percent_max": 7,
    "percent_min": 0
  }
]

Should be:

[
  {
    "id": "en:coconut-sugar",
    "percent_estimate": 46.5,
    "percent_max": 57,
    "percent_min": 36
  },
  {
    "id": "en:hazelnut",
    "percent": "36-", // a number range to support both min & max. eg. '-1' is max 1%, '20-25' is min 20% max 25%
    "percent_estimate": 41.25,
    "percent_max": 46.5,
    "percent_min": 36
  },
  {
    "id": "en:cocoa",
    "percent": "7-",
    "percent_estimate": 9.63,
    "percent_max": 28,
    "percent_min": 7
  },
  {
    "id": "en:sunflower-oil",
    "percent_estimate": 1.31,
    "percent_max": 7,
    "percent_min": 0
  },
  {
    "id": "en:sunflower-lecithin",
    "percent_estimate": 1.31,
    "percent_max": 7,
    "percent_min": 0
  }
]
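The range notation suggested for percent ('36-', '-1', '20-25') could be read back by API clients roughly as follows. This is a sketch in Python under the assumption that the format is exactly as described in the comment above; `parse_percent_range` is a hypothetical helper, not an existing API function, and a plain number is treated as an exact value.

```python
from typing import Optional, Tuple


def parse_percent_range(value: str) -> Tuple[Optional[float], Optional[float]]:
    """Parse '36-', '-1', '20-25' or '36' into (min, max).

    None means that side of the range was not specified on the label.
    """
    text = value.strip()
    if "-" not in text:
        # No dash: the label gave an exact percentage.
        exact = float(text)
        return exact, exact
    low, high = text.split("-", 1)
    low, high = low.strip(), high.strip()
    return (float(low) if low else None, float(high) if high else None)


for sample in ("36-", "-1", "20-25", "7"):
    print(sample, parse_percent_range(sample))
```

One consequence of this format is that percent would become a string wherever a bound is given, so existing clients that expect a number would need to handle both types.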

Number of products impacted

Minimal

deveshidwivedi commented 6 months ago

Hi! I'd like to work on this, can I? @stephanegigandet

deveshidwivedi commented 6 months ago

I'd be very grateful for suggestions on how to start working on this one. I have been trying to understand how the different subroutines calculate percent_estimate, percent_max, etc. To work towards a resolution, we could add a condition to handle cases where a min % is specified, updating the logic to take it into account during the calculations instead of setting the other values equal to percent_min. Looking forward to suggestions and guidance on how to proceed, thank you! @stephanegigandet @alexgarel

danielcavanagh commented 6 months ago

Hey @deveshidwivedi

When I first posted this issue I spent some time working out an algorithm that correctly estimates %s based on both min and max. It's in the attached Excel spreadsheet (lines 4-6 & 10 are what you need). At the time I was unable to make it produce invalid %s (unlike the current algorithm), so I think it's pretty robust. Hopefully you find it useful. It should be fairly straightforward to convert to code

open-food-facts.xlsx

The only other thing I would say that isn't in the spreadsheet is that if a min for an ingredient isn't set, I think it should default to 0%, and if a max isn't set, it should be equal to the previous ingredient's max. I can't remember if this is how it works already, but I thought I should note it just in case
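As a rough illustration of those two defaults (not the spreadsheet algorithm, and not the server's Perl code), a Python sketch might look like this. The `apply_default_bounds` name is made up, and capping the first ingredient at 100% is an assumption, since nothing precedes it to inherit a max from.

```python
def apply_default_bounds(ingredients):
    """Fill in missing percent_min / percent_max for ingredients in label order."""
    # Assumption: the first ingredient has no predecessor, so its max defaults to 100%.
    previous_max = 100.0
    result = []
    for ingredient in ingredients:
        bounded = dict(ingredient)  # copy so the caller's data is not modified
        bounded.setdefault("percent_min", 0.0)
        bounded.setdefault("percent_max", previous_max)
        previous_max = bounded["percent_max"]
        result.append(bounded)
    return result


label = [
    {"id": "en:coconut-sugar"},
    {"id": "en:hazelnut", "percent_min": 36.0},
    {"id": "en:cocoa", "percent_min": 7.0, "percent_max": 28.0},
    {"id": "en:sunflower-oil"},
]
for item in apply_default_bounds(label):
    print(item)
```

These are only starting bounds; tightening them against the other ingredients (for example, that all percentages sum to 100) is the job of the estimation step.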

Good luck! 🙂