openfoodfacts / folksonomy_api

A light REST API designed for Open Food Facts folksonomy engine
https://wiki.openfoodfacts.org/Folksonomy_Engine
GNU Affero General Public License v3.0
11 stars 7 forks source link

barcode limit: 13 digits or more? #49

Closed CharlesNepote closed 2 years ago

CharlesNepote commented 3 years ago

Folksonomy API currently recognize products with the following regexp: [0-9]{1,13} which means barcodes containing 1 to 13 digits.

More than 12,000 products have barcodes bigger than that. Eg. https://world.openfoodfacts.org/product/70238541220839/ikea (even if it's not a food product, it is a real barcode).

This topic is relevant because Folksonomy Engine could be use for other kind of products, nit just food. And these other kind of products sometimes have different barcodes than classical EAN13 barcodes.

To extract the barcodes with more than 13 digits: zcat openfoodfacts-products.jsonl.gz | jq -r '. | select(.code|test("^[0-9]{1,13}$") | not) | .code'

See also:

aadarsh-ram commented 2 years ago

Hi, I'd like to work on this issue. Any pointers on how to get started? @CharlesNepote @alexgarel please do let me know the different kinds of barcodes, that must be accepted during the validation process.

alexgarel commented 2 years ago

I think changing the regexp is simple: @aadarsh-ram see: https://github.com/openfoodfacts/folksonomy_api/blob/main/folksonomy/models.py#L8

But then there is the table definition : https://github.com/openfoodfacts/folksonomy_api/blob/main/db/db_setup.sql

So you will have to :

aadarsh-ram commented 2 years ago

Thanks @alexgarel for explaining this issue briefly! I'll start working on this issue. So, are there any contribution guidelines I need to follow, while working on this? If I assume there's no safe limit for the barcode, what kind of test would I need to add for checks? The _testmain.py in the repo seems to cover them all.

aadarsh-ram commented 2 years ago

Hi @alexgarel @CharlesNepote @teolemon, I have been working on this issue on a separate fork of this repo. During testing, I noticed a unit test for the product code empty case. But, there isn't any check being done in models.py. Should I go ahead and add a check for empty product code in models.py?

Many unit tests also seem revolve around version checking such as test1, test2 and test3. As far as I could see, these versioning checks have not been implemented for the put method and the post method of the /product path. These checks have been implemented in the delete method only. Is this an known issue, and does it have any solution? Should I comment those tests which involve version checking? (The API code currently fails only these version checking tests.)

alexgarel commented 2 years ago

I noticed a unit test for the product code empty case. But, there isn't any check being done in models.py. Should I go ahead and add a check for empty product code in models.py?

Yes @aadarsh-ram go ahead and fix this :-)

alexgarel commented 2 years ago

Many unit tests also seem revolve around version checking such as test1, test2 and test3. As far as I could see, these versioning checks have not been implemented for the put method and the post method of the /product path. These checks have been implemented in the delete method only. Is this an known issue, and does it have any solution? Should I comment those tests which involve version checking? (The API code currently fails only these version checking tests.)

Please @aadarsh-ram open a new issue and propose a PR if you can. This is important to fix this according to me.

aadarsh-ram commented 2 years ago

Please @aadarsh-ram open a new issue and propose a PR if you can. This is important to fix this according to me.

@alexgarel I have fixed this with the existing PR that I have created. Please go through it, and let me know if there are any changes :)

If required, I can open a new issue and link it to the existing PR. I wasn't sure if I was supposed to do that, when I was fixing the issue. Do let me know how to proceed.

alexgarel commented 2 years ago

@aadarsh-ram I discovered that the fix you proposed for version check was already done in a smart way internally by the database. See db/db_setup.sql

-- automatic timestamp + version check
CREATE OR REPLACE FUNCTION folksonomy_timestamp() RETURNS trigger AS $folksonomy_timestamp$

So you should remove the code you added for that.

aadarsh-ram commented 2 years ago

@aadarsh-ram I discovered that the fix you proposed for version check was already done in a smart way internally by the database. See db/db_setup.sql

-- automatic timestamp + version check
CREATE OR REPLACE FUNCTION folksonomy_timestamp() RETURNS trigger AS $folksonomy_timestamp$

So you should remove the code you added for that.

@alexgarel I'm really sorry, didn't notice that. These checks look awesome! It seems I had PostgreSQL 12.9 installed, rather than 13. Hence, a trigger function for the version checks didn't work, when I tested. After I installed PostgreSQL13, all tests passed smoothly. Shall I revert back the API code to however it was before?

Thanks for your feedback!

aadarsh-ram commented 2 years ago

Hi @alexgarel @CharlesNepote @cquest, The API code has been reverted back to previous code. I have added a db-settings file and imported the user and host in the db-migrations file. The .gitignore file already contains _localsettings.py.

I haven't added the db-settings.py in .gitignore, since it should be used while migrating data in the server where the current API is hosted. Please do let me know if there are any other changes that need to be made. Thanks for helping me out!