Think of New Model Strategy

glass-ships commented 7 months ago

We should come up with a new strategy for maintaining and using a Monarch LinkML Schema.

Options include (in no particular order of preference/usefulness):

Creating a monarch-data-model repo and importing/downloading within ingest and app
Merging ingest into app
Splitting model out between ingest and app (again) and importing the core classes in app
...Probably others?

kevinschaper commented 7 months ago

To add some extra statement of the problem:

Our current strategy involves using the biolink model for ingest transforms, and then we store the model for Solr + API responses in monarch-app.

We haven't had significant issues yet with the transition from individual TSV output that conforms to the biolink model to ingest into Solr using the monarch-app model, but there's obviously a duplication of slot definitions, etc.

What has been a bit of a nightmare is that monarch-ingest depends on monarch-app for the schema. Schema changes are needed first in monarch-ingest to create the Solr index, but until that Solr index exists, they are breaking changes for monarch-app. This means that I always need to create a "broken" draft branch in monarch-app and commit a change to monarch-ingest that pulls that schema so that I can do a build. Then, once that build is available and can be pushed to our dev environment, we need to time the monarch-app PR with the deploy of new data to monarch-dev. It always feels like a fraught and awkward process, and I think it's because we have the dependency somewhat backwards.

matentzn commented 7 months ago

What are the main arguments against a separate repo?

kevinschaper commented 7 months ago

Having to keep parallel branches in both a model and api repo for new features is definitely more overhead than those changes being part of the same PR in a single repo.

Having the schema for api endpoints near the api methods is definitely easier. In some sense, maybe a good way to describe the challenge I've had is that the schema for the output of the ingest isn't "near" the ingest.

matentzn commented 7 months ago

A proper software architecture challenge 😂 sounds fun!

kevinschaper commented 3 months ago

My inclination here is to split Entity and Association from the monarch-app schema and move them to monarch-ingest, so that we can perform validation of the KG tsv against our schema.
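To make the validation idea concrete, here is a rough sketch of what checking a KG tsv against Entity and Association could look like. It is not the actual schema or a LinkML-based validator: the required column names and the `validate_tsv` helper are assumptions for illustration, and the real slots would come from the schema once it lives in monarch-ingest.

```python
import csv
import io

# Hypothetical required columns (an assumption for this sketch); the real
# Entity and Association slots are defined in the LinkML schema and may differ.
REQUIRED_COLUMNS = {
    "Entity": ["id", "category"],
    "Association": ["id", "subject", "predicate", "object"],
}


def validate_tsv(handle, class_name):
    """Return a list of error strings for a TSV stream checked against one class."""
    required = REQUIRED_COLUMNS[class_name]
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []

    # A missing column makes every row invalid, so report it once and stop.
    missing = [col for col in required if col not in header]
    if missing:
        return [f"header is missing required column(s): {', '.join(missing)}"]

    errors = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        for col in required:
            # Short rows yield None for absent cells, so treat None as empty.
            if not (row[col] or "").strip():
                errors.append(f"line {line_number}: empty value for '{col}'")
    return errors


# Made-up edge rows; the second data row has no subject.
edges = (
    "id\tsubject\tpredicate\tobject\n"
    "e1\tEX:a\tbiolink:related_to\tEX:b\n"
    "e2\t\tbiolink:related_to\tEX:c\n"
)
print(validate_tsv(io.StringIO(edges), "Association"))
```

With the schema next to the ingest, a check like this could run as part of the build, before anything is loaded into Solr.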

kevinschaper commented 3 months ago

As a matter of practicality, I don't think we'll be able to start on this during this release, so I'll move it to the next one.

monarch-initiative / monarch-app

Think of New Model Strategy #471