glass-ships opened this issue 7 months ago
To add some more context to the problem statement:
Our current strategy is to use the Biolink Model for ingest transforms, while the model for Solr and API responses lives in monarch-app.
So far we haven't had significant issues going from the individual TSV outputs, which conform to the Biolink Model, to the Solr ingest, which uses the monarch-app model. But there is obviously duplication of slot definitions, etc.
What has been a bit of a nightmare is that monarch-ingest depends on monarch-app for the schema. Schema changes are needed first in monarch-ingest to create the Solr index, but until that Solr index exists, those same changes are breaking changes for monarch-app. So I always need to create a "broken" draft branch in monarch-app, then commit a change to monarch-ingest that pulls that schema so that I can run a build. Once that build is available and pushed to our dev environment, we need to time the monarch-app PR to coincide with the deploy of new data to monarch-dev. It always feels like a fraught and awkward process, and I think it's because we have the dependency somewhat backwards.
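To make the "backwards" dependency concrete, here is a minimal sketch that treats each repo and the Solr index as nodes in a dependency graph and asks Python's standard-library `graphlib` for a release order. This is a simplification at repo granularity: the node names are illustrative labels, and the second layout (a standalone model repo that both ingest and app consume) is just one hypothetical arrangement, not a decided design.

```python
from graphlib import TopologicalSorter, CycleError

# Each key maps to the set of nodes it depends on (must be released first).
# Repo-level view as described above; names are illustrative labels only.
CURRENT = {
    "monarch-ingest": {"monarch-app"},  # pulls the schema from monarch-app
    "solr-index": {"monarch-ingest"},   # built by the ingest
    "monarch-app": {"solr-index"},      # API changes need the new index deployed
}

# Hypothetical layout: the schema lives in its own repo, upstream of both.
PROPOSED = {
    "monarch-data-model": set(),
    "monarch-ingest": {"monarch-data-model"},
    "solr-index": {"monarch-ingest"},
    "monarch-app": {"monarch-data-model", "solr-index"},
}


def release_order(graph):
    """Return a valid release order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


print(release_order(CURRENT))   # None
print(release_order(PROPOSED))  # ['monarch-data-model', 'monarch-ingest', 'solr-index', 'monarch-app']
```

`graphlib` refuses to order the current layout because it contains a cycle, which is the gap the "broken" draft branch works around. In the hypothetical layout every node can be released after everything it depends on.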
What are the main arguments against a separate repo?
Having to keep parallel branches in both a model repo and an API repo for new features is definitely more overhead than making those changes in the same PR in a single repo.
Having the schema for API endpoints near the API methods is definitely easier. In some sense, maybe a good way to describe the challenge I've had is that the schema for the output of the ingest isn't "near" the ingest.
A proper software architecture challenge 😂 sounds fun!
My inclination here is to split Entity and Association out of the monarch-app schema and move them to monarch-ingest, so that we can validate the KG TSV files against our schema.
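A minimal sketch of what a KG TSV check could look like once Entity and Association live next to the ingest. It uses only the Python standard library; the required slots in `REQUIRED_SLOTS`, the function name `validate_kg_tsv`, and the sample rows are hypothetical and for illustration only, not the actual monarch schema. In practice, LinkML's own validation tooling would do this directly from the schema file.

```python
import csv
import io

# Hypothetical stand-in for the Entity/Association classes: which columns
# each row must have filled in. Not the real monarch schema.
REQUIRED_SLOTS = {
    "Entity": ["id", "category", "name"],
    "Association": ["id", "subject", "predicate", "object", "category"],
}


def validate_kg_tsv(text, target_class):
    """Return error messages for missing columns or empty required slots."""
    required = REQUIRED_SLOTS[target_class]
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing_cols = [slot for slot in required if slot not in header]
    if missing_cols:
        return [f"missing column(s): {', '.join(missing_cols)}"]
    errors = []
    # Data starts on line 2; line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for slot in required:
            # Short rows yield None for absent fields, so guard with `or ""`.
            if not (row.get(slot) or "").strip():
                errors.append(f"line {line_no}: empty required slot '{slot}'")
    return errors


# Made-up example edges; the second row has no predicate.
edges = (
    "id\tsubject\tpredicate\tobject\tcategory\n"
    "uuid:1\tEX:gene1\tbiolink:interacts_with\tEX:gene2\tbiolink:Association\n"
    "uuid:2\tEX:gene1\t\tEX:gene3\tbiolink:Association\n"
)
print(validate_kg_tsv(edges, "Association"))
# ["line 3: empty required slot 'predicate'"]
```

Running a check like this as part of the ingest build would catch schema drift before the Solr index is built, rather than after monarch-app starts consuming it.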
As a matter of practicality, I don't think we'll be able to start on this during this release, so I'll move it to the next one.
We should come up with a new strategy for maintaining and using a Monarch LinkML Schema.
Options include (in no particular order of preference/usefulness):
- A monarch-data-model repo, imported/downloaded within both ingest and app