openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
GNU Affero General Public License v3.0
613 stars, 358 forks

CSV Upload from Open Source Data to OFF Database - PLU codes #7735

Open FoodCoach-App opened 1 year ago

FoodCoach-App commented 1 year ago

Description

Bulk CSV file "add" to database. Specifically looking to upload all PLU codes from here (https://www.ifpsglobal.com/PLU-Codes) with the free to use USDA nutritional information from here (https://fdc.nal.usda.gov/fdc-app.html#/food-details/2346398/nutrients). There are 1423 unique items on this list; I'd hate to have to do this manually.

Acceptance criteria

API or website based CSV reader that would take the CSV file and:

  1. create the new 4-5 digit UPC code items
  2. populate the name for each item
  3. populate the nutrition information for each item
  4. populate an image/image link for the items

What would a demo look like

  1. upload CSV (with proper format or headers)
  2. OFF is updated with information

    Notes

Tasks

  1. create the new 4-5 digit UPC code items
  2. populate the name for each item
  3. populate the nutrition information for each item
  4. populate an image/image link for the items
stephanegigandet commented 1 year ago

That's a very interesting topic. The CSV upload is in fact the easier part: we have that in our platform for producers: https://world.pro.openfoodfacts.org There's a CSV template, and we support custom files too.

It brings a lot of questions though:

  1. Do we create one product in OFF for each PLU number?
  2. Do we use the PLU number as the "code" identifier for those products? (currently we use the 8/12/13 digits EAN/UPC/GTIN as identifiers).
  3. Do we load nutrient data from the USDA database?
  4. What about other items from the USDA database that do not have an associated PLU number?
  5. Do we do the same with other nutrient databases (e.g. CIQUAL in France)?
  6. Is PLU the only numbering scheme for produce?
stephanegigandet commented 1 year ago

In addition, something we could do is to add the PLU numbers to the corresponding categories in the categories taxonomy.

FoodCoach-App commented 1 year ago

I'll answer these questions based on how I'd like to use the data for my platform FoodCoach.

  1. Yes, each PLU would be a product in OFF
  2. Yes, the 4 or 5 digit PLU would be the identifier
  3. Yes. There seem to be no restrictions on reuse of the USDA database.
  4. At this time I am not proposing adding other items from the USDA database without PLUs.
  5. We can use any database that has nutritional information based on the 4 or 5 digit PLU code. If the French database is deemed better, that could work too.
  6. PLU appears to be the only numbering scheme for produce: 4 digits for "traditional" produce/commodities and 5 digits for "organic" produce/commodities.
  7. Adding the PLU to the categories taxonomy would be interesting, but for my application, I'd like to be able to return the nutrition information for 200 grams of a 4015 PLU code apple just as easily as for 369 g of 0038000198519 "Apple Jacks" cereal, and with the same "GET" API method.

Further recommendation: I'd recommend that the "serving size" be "one" of the produce item (one apple, one onion, one melon), with the values per 100 grams filled out as normal. The product size would be the average mass of a single apple, onion or melon.
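To make the "200 grams of a 4015 apple" request concrete, here is a minimal sketch of the scaling involved, assuming nutrients are stored per 100 g as proposed. The function name, the nutrient keys and the numeric values are placeholders for illustration, not actual OFF fields or USDA figures.

```python
def nutrients_for_mass(per_100g: dict[str, float], mass_g: float) -> dict[str, float]:
    """Scale per-100 g nutrient values to an arbitrary mass in grams."""
    if mass_g < 0:
        raise ValueError("mass must be non-negative")
    factor = mass_g / 100.0
    return {name: round(value * factor, 2) for name, value in per_100g.items()}


# Placeholder values for illustration only, not taken from the USDA database.
apple_per_100g = {"energy_kcal": 50.0, "carbohydrates_g": 14.0, "fiber_g": 2.0}

print(nutrients_for_mass(apple_per_100g, 200))
```

The same function would apply to a packaged product, since only the per-100 g values and the mass are needed.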

teolemon commented 1 year ago
FoodCoach-App commented 1 year ago

I would be happy to upload this data as described myself, and would only need a corporate account. I would like to ensure that what I have described is reasonable, consistent with the goals of OFF and that the data format makes sense.


alexgarel commented 1 year ago

@stephanegigandet maybe we should prefix that kind of code, and have an ID like plu-00000. By default we know digits are EAN. I think this calls for a small rework of the current code (in the code validation function).
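As a rough illustration of what such a validation rule could look like, here is a minimal sketch. It is not the server's actual validation function (the server is written in Perl); `classify_code` and both patterns are assumptions, using the 8/12/13-digit lengths mentioned earlier in the thread and a "plu-" prefix followed by 4 or 5 digits.

```python
import re

# Assumption: unprefixed codes are all-digit EAN/UPC/GTIN of length 8, 12 or 13.
GTIN_RE = re.compile(r"\d{8}|\d{12,13}")
# Assumption: PLU items use a "plu-" prefix followed by 4 or 5 digits.
PLU_RE = re.compile(r"plu-\d{4,5}")


def classify_code(code: str) -> str:
    """Return "gtin", "plu" or "invalid" for a product code string."""
    normalized = code.strip().lower()
    if GTIN_RE.fullmatch(normalized):
        return "gtin"
    if PLU_RE.fullmatch(normalized):
        return "plu"
    return "invalid"


print(classify_code("0038000198519"))  # 13 digits, treated as a GTIN
print(classify_code("plu-4015"))       # prefixed PLU
print(classify_code("4015"))           # bare 4 digits, rejected by this sketch
```

In this sketch a bare 4-digit code is rejected on purpose, so that short numbers never enter the database without a prefix.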

alexgarel commented 1 year ago

I think importing this kind of database is really in tune with our project goal.

I would put "USDA database" as the brand

stephanegigandet commented 1 year ago

@stephanegigandet maybe we should prefix that kind of code, and have an ID like plu-00000.

we could do something similar for the products without barcodes, and prefix them with off-

one question is what to do with the USDA FDC entries: do we import them separately? same for ciqual?

so we could have one "PLU" orange, one "USDA FDC" orange, one "CIQUAL" orange.

another option is to have only one PLU orange, and then complete its nutrient data from either USDA FDC or CIQUAL.

alexgarel commented 1 year ago

so we could have one "PLU" orange, one "USDA FDC" orange, one "CIQUAL" orange.

Personally I prefer this approach. A data source is like a brand / producer.

Otherwise it might be quite funky to run updates.

Then if we want, in the future, to have a special "Orange" that mix them up, we could have it also independently.

FoodCoach-App commented 1 year ago

I don't understand the desire to separate the nutrition information by data source.

If a user searches for a specific type of apple (4015 for example), they don't care whether the information is from USDA or CIQUAL.

The USDA and CIQUAL data should be nearly identical anyway.

We can list the source, but when I search for 4015, I'd expect it to return information just as it does today.

https://world.openfoodfacts.org/product/4015/red-delicious-apple

(I didn't put this item in the database and was surprised that all PLU codes weren't already included. )

PLU codes only have 4 or 5 digits, so there won't be any confusion with the longer EAN barcodes, and there is unlikely to be a need for a PLU prefix.

FoodCoach-App commented 1 year ago

A closer reading of some of the prior comments leads me to believe there is some confusion regarding the PLU codes.

A PLU code is for a specific name and type of commodity. A "Granny Smith" apple has a different PLU code from a "Red Delicious" apple, which is different from a "Fuji" apple. Hope this helps.

alexgarel commented 1 year ago

@FoodCoach-App, the separation I propose is more from a maintenance point of view, but also because sources may diverge. "Sausage" may not mean the same thing everywhere. The meaning of a food name is not uniform internationally, which may lead to surprises. Personally I think that making separate items is cleaner and easier because we can take the exact name from each classification. That said, we can have a special label, like "reference food", so that in a category you can quickly find all "reference food" items (e.g. for the "red apples" category).

Also, the plu prefix is more about the long term, when we may want to mix different types of codes and the risk of clashes increases. For example, CIQUAL codes are also 5 digits and may clash with PLU codes.

FoodCoach-App commented 1 year ago

@alexgarel Can a consumer in France or the EU purchase an item and the item states that it is CIQUAL code "12345"? The CIQUAL codes seem to be the database of nutrition information that is made easy to lookup with numeric identifiers.

Edit: Posted too soon.

The PLU code is readily visible to consumers purchasing produce in the US, and, if a bi-layer barcode is present on the PLU sticker, it is embedded in it. The goal of this request is to link the information that the consumer sees to nutrition information, from any source.

The CIQUAL database looks like it is easy to search. Unless consumers see the CIQUAL number on the packaging, why would we need to duplicate it in OFF?

hangy commented 1 year ago

Specifically looking to upload all PLU codes from here (https://www.ifpsglobal.com/PLU-Codes) with the free to use USDA nutritional information from here (https://fdc.nal.usda.gov/fdc-app.html#/food-details/2346398/nutrients).

What's the license of the data? Can this "free to use" data be combined with DbCL content?

FoodCoach-App commented 1 year ago

The USDA nutritional database is free to use. The IFPS data is user-submitted and the policy is here: https://www.ifpsglobal.com/Terms I'm not a lawyer, but I see no prohibition against using the data.

All of the data is easily downloadable as a .csv here: https://www.ifpsglobal.com/PLU-Codes/PLU-codes-Search

I have reached out to IFPS and asked this specific question.

teolemon commented 1 year ago
teolemon commented 1 year ago
alexgarel commented 1 year ago

@FoodCoach-App I was not aware that the PLU was written on products in the USA !

FoodCoach-App commented 1 year ago

@alexgarel not a problem. This was my first introduction to CIQUAL data as well. Does this change your suggestion(s) regarding how to store nutritional data for PLU codes? A PLU is almost directly analogous to UPC (at least in the USA).

If it would help, here is the current status (incomplete) of the proposed database to add to OFF.

In the spreadsheet, columns A-E are from IFPS and F-M are from the USDA but could just as easily be from CIQUAL. The nutrition data is all listed as "g/100g" to make it easy to find the total nutritional value based on the weight/mass of the produce. I have not differentiated nutritional information between the varieties (Fuji, Granny Smith, Red Delicious) for each commodity (apple); for now, all "apples" have the same nutritional information.

PLU Codes.xlsx
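The join described above (one row per PLU, with nutrition shared by all varieties of a commodity) could be sketched as follows. The column names, the helper names and the nutrient values are hypothetical; the actual columns of the attached spreadsheet and of the producers platform CSV template are not reproduced here. The PLU numbers are ones mentioned in this thread.

```python
import csv
import io

# Hypothetical IFPS-style rows (one per PLU code).
ifps_rows = [
    {"plu": "4015", "commodity": "Apples", "variety": "Red Delicious"},
    {"plu": "4173", "commodity": "Apples", "variety": "Royal Gala, Small"},
    {"plu": "3024", "commodity": "Pears", "variety": "Rocha"},
]

# Placeholder per-100 g values keyed by commodity, for illustration only.
nutrition_by_commodity = {
    "Apples": {"energy_kcal_100g": 50.0, "carbohydrates_100g": 14.0},
}


def build_rows(ifps, nutrition):
    """Attach commodity-level nutrition to each PLU row; skip rows with no match."""
    rows = []
    for item in ifps:
        nutrients = nutrition.get(item["commodity"])
        if nutrients is None:
            continue  # no nutrition data yet: leave the item out rather than guess
        rows.append({**item, **nutrients})
    return rows


def to_csv(rows):
    """Serialize the merged rows to CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(build_rows(ifps_rows, nutrition_by_commodity)))
```

Rows without a nutrition match are dropped in this sketch, so the pear row does not appear until pear values are added.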

alexgarel commented 1 year ago

Does this change your suggestion(s) regarding how to store nutritional data for PLU codes?

Not really, I'm still more in favor of a separation by source, and on a prefix by type of code. But we can consider making an exception for PLU on the prefix.

FoodCoach-App commented 1 year ago

We have permission from IFPS to use the data.

My request: What are the restrictions for reuse of the IFPS database? I'd like to marry the PLUs in your IFPS database with the USDA nutrition information and upload to OpenFoodFacts.com. Would this be permitted under your terms of use?

The Response: Hi Victor, It is fine to use the PLU codes as you mention in your email. If you could reference the IFPS Global website (https://www.ifpsglobal.com/PLU-Codes/PLU-codes-Search) for more information, that would be great. Best regards, Wendy

CharlesNepote commented 1 year ago

@alexgarel https://en.wikipedia.org/wiki/Price_look-up_code These codes seem to be used outside the USA too, at least in France: https://fr.wikipedia.org/wiki/Code_price_look-up

I'm also in favor of using prefixes.

FoodCoach-App commented 1 year ago

Thanks @CharlesNepote!

I did a quick check: the examples on the wiki page match the PLU database that I have.

PLU 3024 = Poire Rocha (En: Rocha Pears)
PLU 4173 = Petite pomme royal gala (En: Royal Gala Apples: Small)
PLU 4174 = Grosse pomme royal gala (En: Royal Gala Apples: Large)
PLU 4664 = Tomate rouge (En: Red Tomatoes on vine)

(edited to remove erroneous links to #4664 Slack conversations)

Question regarding the use of a prefix: how would a new user know to use a prefix, or what prefix to use? The PLU number is similar to a barcode and is often the only information on the produce. I'd imagine that they would look it up just like I did, by searching for "3024" or "4015" as if it were the UPC/EAN number.

The issue as I understand it is that the PLU and CIQUAL numbers overlap. If the CIQUAL numbers don't appear on produce, why would a user search for the CIQUAL number on the OFF database? If they already had the CIQUAL number, wouldn't they search on the CIQUAL database?

Put another way, what is the use-case for someone searching for nutrition information from CIQUAL numbers on the OFF database?

Or am I missing the prefix concern entirely?

stephanegigandet commented 1 year ago

It's not specific to PLU and CIQUAL, small numbers are likely to conflict with something now or in the future. There's no cost in adding a prefix, and we can make the search work so that it returns a PLU 4664 when someone types in 4664. The most common use case is probably going to be to search fruits and vegetables by name, not by PLU, so what matters the most is that we do load the PLU items as products. Same for ciqual that has more than fruits and vegetables: fish, meat etc.

FoodCoach-App commented 1 year ago

Got it. Thank you. On my end, I can make the FoodCoach API search the OFF database for 4053 (from the picture below).

How do we proceed?

Second question, still related to PLUs: How does the OFF app handle reading GS1 DataBar Stacked Omnidirectional barcode symbol stickers like this? The 4 digit PLU is embedded in the 13 digit GTIN number string. I'd recommend cropping off the extraneous digits and simply returning the data for the 4 or 5 digit PLU code (the numbers in light blue on the second image).

image

image

stephanegigandet commented 1 year ago

Second Question, Still Related to PLU's: How does the OFF app handle reading GS1 Databar Stacked Omnidirectional barcode symbol stickers like this?

Well, is there a way to know that a GTIN-13 number follows that scheme? I think we should keep the original GTIN, but if we know that there is a PLU inside, we could link to it and/or use PLU data to complement it.

FoodCoach-App commented 1 year ago

Yes, GTIN13 numbers follow that scheme.

GTIN12 numbers (used in the US) use this scheme.

https://www.gs1us.org/documents?Command=Core_Download&EntryId=554

The full GTIN contains the grower/manufacturer, which is not relevant for nutritional information.

FoodCoach-App commented 1 year ago

What bar-code reading software does OFF use for stacked, GS-1 barcodes?

FoodCoach-App commented 1 year ago

I'd like to move forward with this. Here is what I think we've agreed to thus far.

  1. All PLU codes will be stored as plu-xxxx: "plu-" followed by the PLU number
  2. The quantity will be listed as the average mass (in grams) of the PLU item
  3. The PLU numbers will be from the IFPS database (already approved)
  4. Nutrition data will be from the USDA database (no problem with usage)
  5. Nutrition information will be in grams per 100 grams, plus total grams based on the serving size

Did I miss anything?

If we are in agreement, how do I register as a corporation to use the bulk upload feature?

alexgarel commented 1 year ago

@stephanegigandet I propose to add a USDA-PLU org and add @FoodCoach-App as an org admin, is that ok ? Or maybe @FoodCoach-App you want to make a specific account for this ?

stephanegigandet commented 1 year ago

Well the issue is that we don't support codes like plu-4325 yet. We could prepare the file, test it on the producers platform etc. already though.

FoodCoach-App commented 1 year ago

I'll continue to prepare the file, consistent with what I think we've agreed on and closely matching the .csv file format I've uploaded to github already in this thread.

What is required to add the support of codes like "plu-4325" and who is able and interested in doing that work?

alexgarel commented 1 year ago

@FoodCoach-App, I have opened https://github.com/openfoodfacts/openfoodfacts-server/issues/7806

FoodCoach-App commented 1 year ago

Thanks @alexgarel.

I'll continue populating the PLU-to-nutrition database. Anything else?

One thing to add to the list of actions on #7806 is allowing a user to search for 4015 and have it return "plu-4015". Otherwise users not using apps wouldn't know how to find the nutrition data.

FoodCoach-App commented 1 year ago

Happy Holidays everyone!

After the holidays, I'm thinking there are two ways that a user might want to find PLU information.

  1. entering only the 4 digit code
  2. scanning the full bi-layer GS1 barcode. (I haven't been able to implement a bi-layer GS1 barcode reader in my app yet, but this would be the best embodiment.)

For 1, there would be no change. OFF would simply insert a "plu-" prefix and the lookup would be the same.

For 2, OFF would need to find the 10th-13th digits in the string, pull them out of the string and then insert the "plu-" prefix.
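The two cases can be sketched as a single normalization step. This follows the digit positions stated in this thread (10th to 13th of a 13-digit string) and has not been checked against the GS1 specification; `to_plu_code` and the sample 13-digit string are made up for illustration.

```python
def to_plu_code(raw: str) -> str:
    """Turn a typed PLU or a scanned 13-digit string into a "plu-" prefixed code."""
    digits = raw.strip()
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a numeric code: {raw!r}")
    if len(digits) in (4, 5):
        # Case 1: the user typed the PLU itself.
        plu = digits
    elif len(digits) == 13:
        # Case 2 (assumption from this thread): the PLU sits in positions 10-13.
        plu = digits[9:13]
    else:
        raise ValueError(f"unsupported code length: {len(digits)}")
    return f"plu-{plu}"


print(to_plu_code("4053"))
print(to_plu_code("1234567894053"))  # made-up string, not a real GTIN
```

Both calls print `plu-4053`, so the same product lookup can serve typed and scanned input.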

alexgarel commented 1 year ago

@FoodCoach-App this would be kind of a fall-back mechanism ?

FoodCoach-App commented 1 year ago

Long term, the users should be able to scan the bi-level GS1 barcode just like any regular barcode.

The bi-level GS1 barcode scanners required to read PLU stickers are difficult to find and are not in most barcode scanning apps.

I would imagine that the "best possible answer" would be that users would scan the bi-level GS1 barcode and OFF would return the nutrition information based on the 10th-13th digits.

Here is an example of the PLU barcode. I've had a hard time finding barcode readers that can read it. Aspose.BarCode can, but I haven't been able to incorporate it into FoodCoach.

What barcode reader is in the OFF apps? Can it read this? image

FoodCoach-App commented 1 year ago

I'll download the app and give it a shot. The food won't be in the database, so I'm not sure yet what error code I'm going to get.

FoodCoach-App commented 1 year ago

I've downloaded the app. The app does not find/scan/register the bi-level GS1 barcodes as shown above. Several others do. The best case scenario would be to allow users to scan the bi-level barcode just as they do with all other barcodes. The first series of digits are the country of origin and the grower; neither needs to be stored in OFF, as the nutritional content will be the same.

FoodCoach-App commented 1 year ago

That being said, I do like the app. It's very well put together, flows well and is intuitive.

alexgarel commented 1 year ago

@FoodCoach-App I think you should open an issue on smooth-app repository for bi-level GS-1 barcode scanning. In this repository we deal with server side aspects.

FoodCoach-App commented 1 year ago

Will do. I'll use the same language and image as above.

chriswhiteoco commented 1 year ago

Sort of hairy topic. We have 'foods' and products. In the USDA database, a 'survey' food would be something like 'apple, raw' and the product would be that food from a certain grower in a certain country. Some products don't have a 'food' basis but are composed of ingredients. The nutrition of all the products for a certain food could be shared.

My two cents; either prefix everything or key the food database on code and codetype.

FoodCoach-App commented 1 year ago

@chriswhiteoco I'm not sure if I understand your question. This request would be to add nutritional data for different types of foods / commodities as defined by PLU codes (the little sticker on fruits and vegetables). While the specific grower and the country code is included on the full PLU bi-level barcode, the only relevant data for nutrition content is the 4 digit PLU code.

The general agreement here was to add a "plu" prefix to the PLU codes. Long term, users should be able to use a bi-level barcode scanner to read the full 13 digit code, and then the OFF app or others can omit the first 9 digits, add the PLU designation and return the PLU nutrition information.

chriswhiteoco commented 1 year ago

@FoodCoach-App I was thinking that because the code field is the key to the food table that it is important to avoid identifier collisions. Prefixing all codes with the code type would help with this. EAN-1234567890123, PLU-4053, etc.
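A small sketch of why a composite key avoids collisions, assuming the identifier is modeled as a (code type, code) pair. `FoodKey`, `prefixed` and the sample entries are hypothetical and do not reflect the current OFF data model.

```python
from typing import NamedTuple


class FoodKey(NamedTuple):
    """Composite identifier: the numbering scheme plus the code within it."""
    code_type: str
    code: str


def prefixed(key: FoodKey) -> str:
    """Render the key as a single prefixed string, e.g. "plu-4053"."""
    return f"{key.code_type}-{key.code}"


foods: dict[FoodKey, str] = {}
# The same 5 digits can exist in two schemes without overwriting each other.
foods[FoodKey("plu", "12345")] = "item from the PLU list"
foods[FoodKey("ciqual", "12345")] = "item from the CIQUAL table"

print(len(foods))
print(prefixed(FoodKey("plu", "4053")))
```

Prefixing every code and keying on the pair are equivalent here: the prefixed string is just the pair flattened into one field.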

Also, in lexicon applications, there is a root 'concept' then many specializations. For example, 'whole milk' would be a root concept for a food with an OFF identifier (or some other identifier like the PLU code) then there would be lots of 'product' entries that have 'whole milk' as their parent concept. This is what I meant when I was talking about how the USDA data has foods and products.

Many 'products' would also be root concepts as they are a unique packaging of ingredients.

Of course, the OFF data doesn't currently support this concept. It is pretty much a 'products' database that focuses on foods.

FoodCoach-App commented 1 year ago

@chriswhiteoco Thanks, I understand now.

I understand that OFF doesn't currently support this concept, but I think it aligns well with the goals of OFF, to easily determine nutrition information by scanning a barcode and collecting nutrition information.

In my embodiment/usage of OFF, users are upset/confused why they can't scan the bi-level barcode on fresh produce to "get credit" for healthy choices.

At this point, it's looking like I'm going to need to store the PLU-to-nutrition information outside of OFF.

alexgarel commented 1 year ago

@FoodCoach-App yes, supporting PLU aligns with the goals of OFF, and we are willing to go in the direction of supporting prefixes to be able to import USDA data as you mentioned. But at this moment we are lacking developer time for this, sadly.

john-gom commented 8 months ago

This was discussed in the OFF Days event today. Apologies for duplicating what has already been said, but the agreed action plan was as follows:

  1. Add PLU codes and other attributes to the categories taxonomy, adding missing items where necessary
  2. Potentially do the same with CIQUAL codes (fill in the gaps) and maybe other data sources too
  3. Introduce a "generic product" attribute on the product data model and have historic search APIs default to not return these (as the code will not be a real barcode)
  4. Create 2 products for each PLU code, one for regular and one for organic (which has the "9" prefix). Use a "plu-" prefix for these
  5. Also do the same for CIQUAL and other sources
  6. The PLU CSV links to images which we could maybe use, or we could use generative AI to produce these
  7. Enhance the mobile app to recognise PLU barcodes and search with them using the defined prefix
  8. Potentially enhance text search ranking so that generic products appear first

@stephanegigandet , @odtvince , @daims971
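Items 3 and 4 of the plan can be sketched as follows, assuming a 4-digit base PLU. The `generic` flag stands in for the proposed "generic product" attribute; its name and the record layout are assumptions, not the final data model.

```python
def plu_product_codes(plu: str) -> dict[str, str]:
    """Return the regular and organic product codes for a 4-digit base PLU."""
    if not (plu.isascii() and plu.isdigit() and len(plu) == 4):
        raise ValueError(f"expected a 4-digit PLU, got {plu!r}")
    return {"regular": f"plu-{plu}", "organic": f"plu-9{plu}"}


def generic_products(plu: str, name: str) -> list[dict]:
    """Build the two generic product records proposed for one PLU code."""
    codes = plu_product_codes(plu)
    return [
        # "generic" is a hypothetical field name for the proposed attribute.
        {"code": codes["regular"], "product_name": name, "generic": True},
        {"code": codes["organic"], "product_name": f"{name} (organic)", "generic": True},
    ]


for product in generic_products("4015", "Red Delicious apple"):
    print(product["code"], product["product_name"])
```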

FoodCoach-App commented 8 months ago

I love these updates!


john-gom commented 8 months ago

Interesting link: https://github.com/topics/food-classification