openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - πŸͺπŸ¦‹ Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0
652 stars 381 forks source link

Product code stored as number in MongoDB #1077

Closed stephanegigandet closed 5 years ago

stephanegigandet commented 6 years ago

At least one product has its code stored as a number instead of a string in mongodb:

db.products.find({_id: "0048151623426"}) { "_id" : "0048151623426", "generic_name_en" : "DO NOT DELETE", "rev" : 35, "nutrient_levels_tags" : [ ], "ingredients_ids_debug" : [ "api-tests-should-be-done-on-the-net-domain", "not-here" ], "categories_prev_tags" : [ ], "creator" : "tacite", "generic_name" : "DO NOT DELETE", "categories_hierarchy" : [ ], "traces" : "", "ingredients_text_with_allergens_en" : "API Tests should be done on the .net domain, not here.", "emb_codes_tags" : [ ], "additives_prev_original_tags" : [ ], "nutrition_data_per_debug_tags" : [ ], "origins" : "", "code" : NumberLong("48151623426"),

The .sto file has the code stored as a string: 35.sto 0048151623426^@^D^@^@^@code^

There is some code in Products.pm to prevent that from happening, I do not know what happened here.

Need to check if other products are affected.

stephanegigandet commented 6 years ago

Lots of products have code stored as a number in the db:

db.products.count ( {"code" : { $type: "long"} }) 192962 db.products.count ( {"code" : { $type: "string"} }) 229387

stephanegigandet commented 6 years ago

culprit is the update_all_products.pl script

fixing it and re-running it.

stephanegigandet commented 6 years ago

Still affecting some products.

db.products.count ( {"code" : { $type: "long"} }) 71486

teolemon commented 6 years ago

image

stephanegigandet commented 6 years ago

I thought re-running the update_all_products.pl script after fixing it would fix the issue, but it only fixed it for products without leading 0 in the code. I'm running a new fix.

db.products.count ( {"code" : { $type: "long"} }) 71418 db.products.count ( {"code" : { $type: "long"} }) 67683

stephanegigandet commented 6 years ago

There might be issues for products with leading 0s that have been edited in the last few days, not sure.

stephanegigandet commented 6 years ago

The script finished, all codes are now strings:

db.products.count ( {"code" : { $type: "long"} }) 0

stephanegigandet commented 6 years ago

US front page looks good:

image

hangy commented 6 years ago

I think we should consider moving to using an ObjectId as the primary key of the products collection. Right now, we store the code in _id and in the _code field. Apparently, we also have products where both fields do not match

> db.products.find({ $where : "this._id !== this.code" }).count()
3349

Selected examples

{ "_id" : "2000000001702", "code" : "203474683" }
{ "_id" : ObjectId("57d139e83296350bfc5a9271") }
{ "_id" : "00574082", "code" : 574082 }

That would also be a better base for #637 and unique product ids (that could be useful for #677)

OpenChris commented 6 years ago

Still seeing the "code" stored as Int32 issue, for example, product 00735346.

stephanegigandet commented 6 years ago

db.products.count ( {"code" : { $type: "long"} }) 0

stephanegigandet commented 6 years ago

db.products.count ( {"code" : { $type: "int"} }) 3137

hangy commented 5 years ago

db.products.count ( {"code" : { $type: "long"} }) 0 db.products.count ( {"code" : { $type: "int"} }) 0

sethidden commented 3 years ago

In case someone is also confused in the future: the main format for _id is actually just "string", not ObjectId. So don't try to serialize _id as ObjectId if you're making your own api with OFF data etc.

> db.products.count ( {"_id" : { $type: "objectId"} })
0
> db.products.count ( {"_id" : { $type: "long"} })
13
> db.products.count ( {"_id" : { $type: "string"} })
1965081