Open CharlesNepote opened 1 year ago
I'm afraid there is a bug in the current code because the number of duplicates is increasing: 42 as of 2022-12-02 vs 33 as of 2022-11-16.
Relates to https://github.com/openfoodfacts/openfoodfacts-server/issues/7248, which could monitor and fix this kind of thing.
This issue is stale because it has been open 90 days with no activity.
Bumping to P1, as it got a conversation off topic :-/
Describe the bug
The database contains a few duplicate products (~30). They can be seen in several places.
It seems to be due to the _id being stored as a number at first, and then as a string.
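A minimal sketch of how this kind of type mismatch could produce duplicates, assuming the store treats a numeric _id and a string _id as different keys. A plain Python dict stands in for the database here, and `save_product` and the barcode are hypothetical placeholders, not the server's actual code:

```python
# Hypothetical illustration: a dict stands in for a collection keyed on _id.
products = {}

def save_product(store, _id, name):
    """Insert or overwrite the entry stored under _id."""
    store[_id] = {"_id": _id, "name": name}

# The first write stores the barcode as a number, a later write as a string.
save_product(products, 1234567890123, "first version")
save_product(products, "1234567890123", "second version")

# The two keys differ by type, so the second write does not overwrite the first.
codes = [str(key) for key in products]
duplicates = sorted({c for c in codes if codes.count(c) > 1})
```

Both entries survive under the same visible code, which would match what the exports show if the same thing happens on the server.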
To Reproduce
In the JSONL export:
In the CSV export:
In Mirabelle (based on the CSV export): http://mirabelle.openfoodfacts.org/products?sql=--+identify+duplicates%0D%0Aselect+rowid%2C+code%2C+url%2C+count%28*%29+as+%22count%22+from+%5Ball%5D+group+by+code+having+count%28*%29+%3E+1%3B This query lists all the duplicated products as of 2022-11-16 (33).
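For anyone who wants to run the same check locally, here is a rough Python sketch of the group-by-code logic from the Mirabelle query. It assumes the export is tab-separated and has a `code` column; `find_duplicate_codes` and the sample rows are made up for illustration:

```python
import csv
import io
from collections import Counter

def find_duplicate_codes(lines, delimiter="\t"):
    """Return {code: count} for every code that appears more than once."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    counts = Counter(row["code"] for row in reader)
    return {code: n for code, n in counts.items() if n > 1}

# Tiny made-up sample standing in for the real export.
sample = io.StringIO(
    "code\turl\n"
    "0000000000017\thttp://example.org/a\n"
    "0000000000024\thttp://example.org/b\n"
    "0000000000017\thttp://example.org/c\n"
)
duplicates = find_duplicate_codes(sample)
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` sample, and adjust `delimiter` if the export uses a different separator.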
Expected behavior
No duplicates.