Move the v3 API specifications to a new, dedicated `openfoodfacts-schema` repository

chris-hatton commented 1 year ago

Proposal

Now that Open Food Facts APIs are being defined by formal schemas (v3 as OpenAPI YAML) instead of only human-readable documentation, it would be preferable to move the schemas out of the openfoodfacts-server repository into their own dedicated openfoodfacts-schema repository.

Reasoning

A key strength of using an OpenAPI schema is the ability to use code-generation tools to generate client and/or server code. Since Open Food Facts has several client library and app projects, it would be most useful if they could all reference the v3 API YAML schema, e.g. by including the schemas as a Git submodule, without having to make independent copies.

If the schema remains in openfoodfacts-server, then we would need to add that whole project, which is large and unwieldy, as a submodule of the others.

It would be cleaner if openfoodfacts-schema lived in an agnostic space where the Git commit history and PRs are concerned only with schema changes, and any GitHub Actions need only continuously validate the schema and perform minimal client code-gen, just to check that this process succeeds.

Actions

[ ] Create new openfoodfacts-schema repository ❗ This is an organisation restricted operation
[ ] Copy v3 API schemas into schema/v3 subfolder in new repository
[ ] Copy v3 API docs into docs/v3 subfolder in new repository ❓
[ ] Refactor documentation links to point to the new folder structure
[ ] Raise a PR against openfoodfacts-server which:
- Removes the original copies of v3 API schemas & docs
- Re-includes the v3 API schemas & docs as a Git submodule at repo path include/schema

Maintenance

By using submodules, each client/server project still gets to reference a specific Git revision of the schemas (by hash). They can choose to adopt new API schema changes by moving the Git submodule reference.
Any CI pipelines for projects that start to use submodules must ensure they are checking out recursively (--recurse-submodules) to acquire a complete copy. This is standard for many CI platforms.
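
To make the recursive-checkout requirement concrete, here is a minimal Python sketch of a CI guard (not something any Open Food Facts repository currently ships). It parses the text printed by `git submodule status`, where a leading `-` marks a submodule that has not been initialised, and lists the affected paths. The function name and the sample paths are made up for illustration, and the parsing assumes paths without spaces.

```python
from __future__ import annotations


def uninitialised_submodules(status_output: str) -> list[str]:
    """Return the paths that `git submodule status` flags with a leading '-'."""
    missing = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        # The first character is a state flag: ' ' in sync, '-' not initialised,
        # '+' checked-out commit differs from the recorded one, 'U' merge conflict.
        flag, fields = line[0], line[1:].split()
        if flag == "-" and len(fields) >= 2:
            missing.append(fields[1])  # fields are: commit hash, path, optional describe
    return missing


# Hypothetical output, as it might look after a non-recursive clone.
sample = (
    "-1111111111111111111111111111111111111111 include/schema\n"
    " 2222222222222222222222222222222222222222 vendor/other (v1.0)\n"
)
print(uninitialised_submodules(sample))
```

A CI job could fail the build when this list is non-empty, which would catch a checkout step that forgot `--recurse-submodules`.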

Use cases

The new openfoodfacts-kotlin project intends to use the v3 API schemas for client code-generation.

alexgarel commented 1 year ago

@chris-hatton we have to discuss this. (and first of all thanks for the issue)

We moved the documentation into this repository on purpose, so that a code change affecting an API can be committed together with the API change (and so that we can ask for it at PR review time).

The older documentation drifted from the code partly because it lived in another repository (we already have a lot of work to do to bring the current docs up to date).

Is there a way we could create an openfoodfacts-schema repository that "mirrors" the folder in this repository (seen as the source)? I imagine it should be possible with a bit of CI. What do you think?

alexgarel commented 1 year ago

@chris-hatton we discussed it a bit during our Product-Opener weekly meeting today (see README.md if you want to join).

Some points:

- we would really want the documentation to stay close to the code

Possible solutions to your concern:

- mirroring to a repo (as proposed above), but it seems a bit over-complicated.
- maybe have a downloadable JSON file of the whole spec (we need to check whether the documentation tool already provides one), so you don't need Git submodules and can just download the spec. Or maybe the spec could be served by the Open Food Facts server itself, with a version number.
- maybe we can track API changes in releases (integrating this into the release-please process)
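
As a rough illustration of the "downloadable spec with a version number" idea, a client build could compare the `info.version` field of the downloaded OpenAPI JSON against the version it last generated code from. This is only a sketch: the function names, the sample document and the version strings are hypothetical, and where the file would be downloaded from is not decided in this thread.

```python
import json


def spec_version(spec_text: str) -> str:
    """Return `info.version` from an OpenAPI document serialised as JSON."""
    spec = json.loads(spec_text)
    return spec["info"]["version"]


def needs_regeneration(spec_text: str, pinned_version: str) -> bool:
    """True when the published spec no longer matches the version a client pinned."""
    return spec_version(spec_text) != pinned_version


# Hypothetical, heavily trimmed spec document used only for this example.
sample_spec = json.dumps(
    {"openapi": "3.1.0", "info": {"title": "Example API", "version": "3.0.1"}, "paths": {}}
)
print(needs_regeneration(sample_spec, "3.0.0"))
```

A client project would store its pinned version alongside its generated code and rerun code generation only when the check reports a mismatch.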

VaiTon commented 1 year ago

> maybe have a downloadable JSON file of the whole spec

> maybe we can track API changes in releases (integrating this into the release-please process)

+1

chris-hatton commented 1 year ago

Thanks for these thoughts and context @alexgarel 👍 It helps a lot to understand the motivations for the current structure.

Firstly, I think the way I am proposing to re-home the API documentation and schema together is only really important for the v3 API and not for v2 or other minor versions. It could be decided to move them together for categorical consistency, but the biggest practical benefit of keeping v3 separate is, as stated, the ability to include the schema as a repository-level submodule inside other OpenFoodFacts projects, for the purpose of code generation.

Here's my take on the issues you raised, surrounding this proposal:

> We moved the documentation into this repository on purpose, so that a code change affecting an API can be committed together with the API change (and so that we can ask for it at PR review time).

If we were to use a Git submodule, then a single 'atomic' PR containing API schema changes and the corresponding server code changes would not be possible, because a PR concerns changes in one repo only, and the schema and server code would physically reside in two separate repositories.

I can see two potential ways to address this:

1. Reconsider the workflow: It would be helpful to better understand how OpenFoodFacts organises API changes, but an alternative workflow would be to embrace the idea that v3 API schema changes are made as a separate step before corresponding implementations are merged into client/server projects.

This removes some of the agility that server developers will have enjoyed, but as OpenFoodFacts scales, this might prove to be a healthier way to operate, since the various API consumers can participate in PRs for schema changes and be warned in advance of impacts on their work. The alternative is a situation where server developers effectively dictate that 'all clients must consume X data format because it's already implemented that way', and so there is pressure to accept. To be clear, I am not suggesting this is what is happening in OFF, but it could become a risk at any time.

With this workflow, consuming projects that include the schema repo as a submodule would not be forced to adopt new schema changes right away, since their own code revisions always point to a specific commit hash of the schema repo. Updating to the latest schema would then be a simple matter of 'Git pulling' in the submodule, which checks out the latest revision of the schema in its registered sub-directory, and committing the new reference.

Be aware, server developers would still be able to make schema changes in line with their development work. This is an important practical point, but the expectation would be that they point their development branches to a corresponding development branch of the schema repo and, once the work in both is complete, the schema PR again comes before the server PR.

Using submodules does imply a little learning for maintainers around how to manage the submodule reference, but it is not too complicated. The major advantage of submodules in this scenario is having one source of truth: repositories do not maintain their own independent copies of the schema, with the inherent risk of getting out of sync.
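
For anyone unfamiliar with the mechanics, the steps for moving a project to a newer schema revision can be sketched as a short sequence of Git commands. The Python below only builds the command lists and runs nothing. The `include/schema` path comes from the proposal above, while the function name, the `origin` remote and the example revision are assumptions made for illustration.

```python
from __future__ import annotations


def bump_submodule_commands(path: str, revision: str) -> list[list[str]]:
    """Build the Git commands that move a submodule to `revision` and record it."""
    return [
        # Fetch new schema commits inside the submodule (assumes a remote named 'origin').
        ["git", "-C", path, "fetch", "origin"],
        # Check out the schema revision the project wants to adopt.
        ["git", "-C", path, "checkout", revision],
        # Stage the updated submodule reference in the parent repository.
        ["git", "add", path],
        # Record the new pinned revision in the parent repository's history.
        ["git", "commit", "-m", f"Update {path} to {revision}"],
    ]


# 'abc1234' is a placeholder revision, not a real commit.
for cmd in bump_submodule_commands("include/schema", "abc1234"):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)`. Until the final commit is merged, the project keeps building against the previously pinned schema revision.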

2. Another technical option is to use Git subtrees. With this approach, each repository that uses schemas is still a 'complete' repository that retains its own physical copy of the schemas. The only difference is a small piece of metadata telling Git that the schema sub-directory has its own 'remote' - the schema repository - from which committed changes can be easily pulled/pushed.

The advantage of this is that both client and server developers can submit schema and source changes for code review atomically. The downside is that the act of pushing/pulling changes to synchronise them with the central 'schema' repository still depends on good, frequent communication, with a risk of things 'getting out of sync' if any one actor forgets to push a change for some time.
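
The subtree equivalent can be sketched the same way. The Python below builds the three `git subtree` invocations involved (initial import, pulling schema changes, pushing local schema changes back) without running them. The remote name `schema-remote` and the branch `main` are hypothetical, and `include/schema` is simply reused from the submodule proposal.

```python
from __future__ import annotations


def subtree_commands(prefix: str, remote: str, branch: str) -> dict[str, list[str]]:
    """Build the `git subtree` commands for one schema directory and one remote."""
    return {
        # One-off import of the schema repository into the sub-directory.
        "add": ["git", "subtree", "add", f"--prefix={prefix}", "--squash", remote, branch],
        # Bring in schema changes that were made in the central repository.
        "pull": ["git", "subtree", "pull", f"--prefix={prefix}", "--squash", remote, branch],
        # Publish schema changes that were committed locally.
        "push": ["git", "subtree", "push", f"--prefix={prefix}", remote, branch],
    }


commands = subtree_commands("include/schema", "schema-remote", "main")
for name, cmd in commands.items():
    print(name, "->", " ".join(cmd))
```

The `push` step is the one that depends on discipline: nothing forces a contributor to run it, which is the 'getting out of sync' risk described above.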

> Is there a way we could create an openfoodfacts-schema repository that "mirrors" the folder in this repository (seen as the source)? I imagine it should be possible with a bit of CI.

I think option (1) achieves this in spirit, except it's technically the other way around: the openfoodfacts-schema repository would appear as the folder in the server repository.

Considering your description of previous 'drift', I would probably advocate for option (1). The cost is that OFF developers would need some education about the new process. Good-quality documentation can help, and I'd be happy to contribute to it. The benefit is that it forces project repos to remain in sync and enables a diplomatic schema review process. This is in addition to the benefits of the schema repo itself: fast CI that runs only schema-focused automations like validation and throwaway client/server code-gen for an additional layer of validation.

I appreciate my descriptions of submodules, however laboured, may still be less illuminating than an actual example. If there's enough interest, then we could tentatively create an openfoodfacts-schema repository, private at first to avoid any confusion, then make two 'demo' branches of openfoodfacts-server that reference it by submodule and by subtree respectively, so that any interested folks can review these two potential solutions in action and thereby better understand the implications.

TL;DR: Using submodules seems compatible with addressing the concerns raised. The PR process would have to change, but this may be positive?

Keen to hear your thoughts.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity.
