openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
GNU Affero General Public License v3.0
615 stars 359 forks

Manage taxonomies properties in an external CMS #8536

Open raphael0202 opened 1 year ago

raphael0202 commented 1 year ago

This issue arose from recent discussions we had with @alexgarel and @teolemon and the rest of the permanent team about the way we manage taxonomy-related content on Open Food Facts.

Currently, everything is stored in the taxonomy files: descriptions, Wikidata IDs, vegan/vegetarian status for ingredients, etc. This approach has served us well so far, but has some drawbacks:

We do need careful review when updating taxonomy names, as such changes have a major impact on ingredient parsing and can be quite tricky. But we probably don't need it for fixing a typo, adding a basic description or updating a Wikidata ID: user tracking is enough to spot potential abuse.

My proposal is to use an external CMS to manage taxonomy properties. I've tried Strapi, which is a very flexible headless CMS. It can be used to update entries with custom schemas.

Schema management: [screenshot, 2023-06-08 16-11-33]

Items list: [screenshot, 2023-06-08 16-12-03]

Item update: [screenshot, 2023-06-08 16-12-23]

The tool is open-source and allows easy import/export of data through an API, so we could export the data to JSON/YAML format. We could perform a daily export so that Product Opener (and other projects) can access it. The advantages of this approach are the following:

After talking with @alexgarel, it turns out it would be better to manage synonyms and names through taxonomy editor only, as we want to control user input sanitization (checking synonyms/duplicates).

edit: To give a bit more context, we would like to add more "knowledge" information to Open Food Facts, such as the claims associated with food labels, health risks associated with additives, category descriptions, etc. It quickly became clear that we cannot reasonably store all this information in taxonomy files, hence this proposal.
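
The daily export mentioned above could look roughly like the sketch below. It is only an illustration: the entry layout (`taxonomy_id`, `description`, `wikidata`), the function names and the placeholder values are assumptions, not the actual Strapi schema or API, and fetching the entries from the CMS is deliberately left out.

```python
import json
from datetime import date


def build_export(entries):
    """Group CMS entries by taxonomy ID, keeping only their property fields."""
    export = {}
    for entry in entries:
        taxonomy_id = entry["taxonomy_id"]
        # Everything except the ID is treated as an editable property.
        export[taxonomy_id] = {k: v for k, v in entry.items() if k != "taxonomy_id"}
    return export


def write_export(entries, path):
    """Write a dated JSON snapshot that other projects could read."""
    payload = {
        "exported_on": date.today().isoformat(),
        "properties": build_export(entries),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2, sort_keys=True)


# Hypothetical entries, shaped as a CMS might return them.
sample_entries = [
    {
        "taxonomy_id": "en:e330",
        "description": {"en": "Citric acid"},
        "wikidata": "Q-PLACEHOLDER",  # placeholder, not a real Wikidata ID
    },
    {
        "taxonomy_id": "en:sugar",
        "description": {"en": "Sugar", "fr": "Sucre"},
    },
]

export = build_export(sample_entries)
```

Calling `write_export(sample_entries, "taxonomy_properties.json")` from a daily job would then produce the snapshot file; the file name is also just an example.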

raphael0202 commented 1 year ago

@teolemon @stephanegigandet @john-gom @alexgarel @aleene I would be interested to have your feedback on this :)

john-gom commented 1 year ago

It is something I have thought about a lot too since @teolemon mentioned the idea of using Wikibase / Wikidata. There are certainly advantages to a generic tool but we do need extensibility for things like constraint validation and concepts like synonyms (which Wikibase does handle). Another area where I think taxonomy management could be enhanced is being able to see the impact of changes, e.g. "if I update the nutrient content of this ingredient how many products will be affected?".
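
As a minimal sketch of such an impact check, assuming products are available as dictionaries with a list of parsed ingredient IDs (the `ingredients_tags` key and the sample product codes are assumptions made for this example):

```python
def count_affected_products(products, ingredient_id):
    """Count products whose parsed ingredient list contains ingredient_id."""
    return sum(
        1 for product in products
        if ingredient_id in product.get("ingredients_tags", [])
    )


# Hypothetical sample data, not real products.
sample_products = [
    {"code": "0001", "ingredients_tags": ["en:sugar", "en:cocoa-butter"]},
    {"code": "0002", "ingredients_tags": ["en:water", "en:sugar"]},
    {"code": "0003", "ingredients_tags": ["en:salt"]},
    {"code": "0004"},  # product without parsed ingredients
]

affected = count_affected_products(sample_products, "en:sugar")
```

In a real tool the count would come from a database query rather than a scan over an in-memory list, but the question asked of the data is the same.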

In my own experience these generic tools never go quite far enough and then when you try to customise them you get into a world of pain with proprietary techniques, no testability, etc. and so it would be easier to develop from scratch. I would personally go further than the current taxonomy editor in that I would want specific UIs for each type of taxonomy. We can easily cater for extensible attributes and I have done this many times before.

One thing I 100% agree on is that we should treat taxonomies as data and not source code. Unit tests should rely on specifically created sample taxonomies for the purpose of the specific test, with validation rules during taxonomy data capture to ensure that edits to the data don't break things.
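
To illustrate what validation rules at capture time could look like, here is a small sketch run against a purpose-built sample taxonomy. The data layout (`parents`, `synonyms` per language) and the two rules shown are assumptions for the example, not the format or checks used by Product Opener or taxonomy editor.

```python
def validate_taxonomy(taxonomy):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    seen_synonyms = {}  # (lang, normalized name) -> entry that owns it
    for entry_id, entry in taxonomy.items():
        # Rule 1: every parent must exist in the taxonomy.
        for parent in entry.get("parents", []):
            if parent not in taxonomy:
                errors.append(f"{entry_id}: unknown parent {parent}")
        # Rule 2: a synonym may belong to only one entry per language.
        for lang, names in entry.get("synonyms", {}).items():
            for name in names:
                key = (lang, name.strip().lower())
                owner = seen_synonyms.setdefault(key, entry_id)
                if owner != entry_id:
                    errors.append(
                        f"{entry_id}: synonym '{name}' ({lang}) already used by {owner}"
                    )
    return errors


# Sample taxonomies created only for this check.
valid_sample = {
    "en:sugar": {"parents": [], "synonyms": {"en": ["sugar", "white sugar"]}},
    "en:cane-sugar": {"parents": ["en:sugar"], "synonyms": {"en": ["cane sugar"]}},
}
broken_sample = {
    "en:sugar": {"parents": [], "synonyms": {"en": ["sugar"]}},
    "en:cane-sugar": {"parents": ["en:sweetener"], "synonyms": {"en": ["Sugar"]}},
}

valid_errors = validate_taxonomy(valid_sample)
broken_errors = validate_taxonomy(broken_sample)
```

The same function can serve both purposes described above: it can reject an edit before it is saved, and it can be exercised in unit tests with small sample taxonomies instead of the production data.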

raphael0202 commented 1 year ago

I agree with you on the customization part. The issue is that taxonomy editor is not ready for deployment, and we lack development resources on both front-end and back-end. I doubt we could have a version of taxonomy editor (for property editing) that has the same features as an external CMS for content creation in 2023, and the OFF knowledge base is a project I really want to launch this year. For purely text-based content without validation, a CMS is well suited, but for synonyms or properties like nutrient content it would be better to keep them in taxonomy editor.

I've thought about relying on Wikidata for this content, but as far as I recall we cannot create custom properties on Wikidata anymore.

stephanegigandet commented 1 year ago

A few additional things we should pay attention to when selecting or designing a solution:

john-gom commented 1 year ago

I'm not sure that having two tools for this (CMS + Taxonomy editor) will be very usable. I take the point about not having resources, and it is disappointing that existing tools don't quite do the job, but I suspect that if we try and use a CMS to do part of the work and then have other tools to supplement data, do validation and export in various formats, then we could end up doing a lot more work in the long term than just writing the tool from scratch.

Having said that, it might be possible that we could start with a CMS like Strapi and write plug-ins to handle specific things like Synonyms and other business rules?

Maybe we should start by simply capturing all of the requirements, including what @stephanegigandet mentioned above and then think about the various options?

VaiTon commented 1 year ago

I have recently found https://terminusdb.com/ and, despite not being as mature as something like Wikibase, it has an integrated review and versioning workflow.

john-gom commented 1 year ago

> I have recently found https://terminusdb.com/ and, despite not being as mature as something like Wikibase, it has an integrated review and versioning workflow.

Did you find any information on how it supports multiple translations of entry names and descriptions? I couldn't see anything on a quick scan of the docs.

alexgarel commented 1 year ago

Another path might be using a no-code tool (like Airtable). I'm thinking of NocoDB or Baserow. I've also seen data-patch, but it is far less mature.

The advantage here is that the backend can be PostgreSQL, so we could still do a lot of checks / stats.
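
As a rough illustration of the kind of stats a SQL backend allows, the sketch below counts entries without a description per taxonomy. It uses Python's built-in `sqlite3` as a stand-in so that it runs without a server; the query itself is plain SQL that should work unchanged on PostgreSQL. The table name, columns and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real backend.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxonomy_property ("
    "taxonomy TEXT, entry_id TEXT, lang TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO taxonomy_property VALUES (?, ?, ?, ?)",
    [
        ("ingredients", "en:sugar", "en", "Sweet crystalline carbohydrate"),
        ("ingredients", "en:salt", "en", None),
        ("additives", "en:e330", "en", ""),
        ("additives", "en:e300", "en", "Ascorbic acid"),
    ],
)


def missing_descriptions(connection):
    """Return (taxonomy, count) pairs for entries lacking a description."""
    query = """
        SELECT taxonomy, COUNT(*) AS missing
        FROM taxonomy_property
        WHERE description IS NULL OR description = ''
        GROUP BY taxonomy
        ORDER BY taxonomy
    """
    return connection.execute(query).fetchall()


stats = missing_descriptions(conn)
```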

github-actions[bot] commented 8 months ago

This issue has been open 90 days with no activity. Can you give it a little love by linking it to a parent issue, adding relevant labels and projects, creating a mockup if applicable, adding code pointers from https://github.com/openfoodfacts/openfoodfacts-server/blob/main/.github/labeler.yml, giving it a priority, editing the original issue to have a more comprehensive description… Thank you very much for your contribution to 🍊 Open Food Facts