Open eecavanna opened 1 month ago
Given these problems, I think it increasingly makes sense to extract some minter functionality into a single standalone service.
I see a couple of potential routes we can take:
De-couple (a) syntactic construction of IDs from (b) uniqueness-ensuring persistence of IDs. That is, continue to have each runtime instance generate IDs as a function of the schema installed for the instance, but require runtime instances to synchronously register the IDs they mint with a singular ID registry service in order to ensure global uniqueness and persistence. In this world, the registry service can assign MINTING_SERVICE_ID values for each runtime instance across all environments.
Migrate all minting functionality to a singular central service (as described by #484).
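To make the first route more concrete, here is a rough sketch of what "mint locally, register centrally" could look like. Everything in it is hypothetical: `IdRegistry`, `RuntimeMinter`, the in-memory storage, and the made-up `typecode-serviceid-counter` ID layout are illustrative assumptions, not the Runtime's actual minter code or ID format. A real registry would be a separate networked service with durable storage.

```python
import itertools
import threading


class IdRegistry:
    """Hypothetical in-memory stand-in for a single ID registry service."""

    def __init__(self):
        self._lock = threading.Lock()
        self._registered_ids = set()
        self._service_ids = {}  # runtime instance name -> assigned service ID
        self._next_service_id = itertools.count(1)

    def assign_service_id(self, instance_name: str) -> str:
        """Return the instance's service ID, assigning a new one on first use."""
        with self._lock:
            if instance_name not in self._service_ids:
                self._service_ids[instance_name] = str(next(self._next_service_id))
            return self._service_ids[instance_name]

    def register(self, minted_id: str) -> bool:
        """Record an ID; return False if it was already registered."""
        with self._lock:
            if minted_id in self._registered_ids:
                return False
            self._registered_ids.add(minted_id)
            return True


class RuntimeMinter:
    """Hypothetical per-instance minter: builds IDs locally, registers them centrally."""

    def __init__(self, instance_name: str, registry: IdRegistry):
        self._registry = registry
        # The registry, not a local environment variable, decides this value.
        self.service_id = registry.assign_service_id(instance_name)
        self._counter = itertools.count(1)  # local state; may be lost or reset

    def mint(self, typecode: str) -> str:
        # Keep building candidates until the registry accepts one.
        while True:
            candidate = f"{typecode}-{self.service_id}-{next(self._counter):06d}"
            if self._registry.register(candidate):
                return candidate


registry = IdRegistry()
production = RuntimeMinter("production", registry)
development = RuntimeMinter("development", registry)
first = development.mint("bsm")

# Simulate the development instance losing its local state.
development = RuntimeMinter("development", registry)
second = development.mint("bsm")  # the registry rejects the repeat, so this differs from `first`
```

The point of the sketch is that uniqueness no longer depends on any one instance's local bookkeeping: an instance that forgets what it has minted still cannot register the same ID twice.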
Background
We use a minter to generate IDs for Mongo documents. The minter is part of the Runtime. We have multiple environments (e.g. production, development, Berkeley), each of which has its own Runtime and, therefore, minter. Each minter keeps track of the IDs it has generated (i.e. consumed) in its Mongo database.
Because the minter is coupled to the Runtime (the former is part of the latter), team members cannot use the production minter to mint IDs for classes that aren't defined in the schema currently being used by the production Runtime. When they need such IDs (e.g. when writing database migration scripts), they sometimes mint them in non-production environments and later insert the documents having those IDs into the production database.
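As a toy model of the setup described above (one minter per environment, each recording the IDs it has consumed in its own database), consider the sketch below. `LocalMinter` is a hypothetical name, the Python set stands in for an environment's Mongo database, and the random hex string is a placeholder rather than the real ID format.

```python
import secrets


class LocalMinter:
    """Toy minter that records consumed IDs in its own store.

    The set is a stand-in for the environment's Mongo database.
    """

    def __init__(self, consumed_ids=None):
        self.consumed_ids = set() if consumed_ids is None else consumed_ids

    def mint(self) -> str:
        # Draw random candidates until one is not in this minter's own store.
        while True:
            candidate = secrets.token_hex(4)
            if candidate not in self.consumed_ids:
                self.consumed_ids.add(candidate)
                return candidate


production = LocalMinter()
development = LocalMinter()

dev_id = development.mint()
# Each minter only consults its own store, so production has no record of dev_id.
print(dev_id in production.consumed_ids)  # False
```

Each minter checks only its own store, so a document carrying `dev_id` can be inserted into the production database without the production minter ever learning that the ID has been consumed.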
Problem
There are two problems:
We routinely discard some of our non-production databases (e.g. development, Berkeley), replacing them with updated dumps of the production database. For example, we replace the development database as part of the standard monthly release process. As a result, the minters in those environments lose track of the IDs they have generated. This means one of those minters could generate an ID that it has already generated before.
Each minter is configured with an environment variable named MINTING_SERVICE_ID, whose value is incorporated into the IDs generated by that minter. I assume the person who designed the minter intended for people to populate it with a string that is not used in any other environment; but I don't think anything prevents someone from using the same MINTING_SERVICE_ID value in multiple environments (i.e. the values of MINTING_SERVICE_ID, themselves, are not "minted" by a single authority). If someone were to do so, it would be possible for two minters to generate identical IDs (since those minters would be using different Mongo databases to keep track of the IDs they have generated).
Task
Related
https://github.com/microbiomedata/nmdc-runtime/issues/484 - A ticket about extracting the Runtime into a standalone service
CC: @dwinston , @PeopleMakeCulture