monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Packaging of minimal ODK (`run.sh`) in non-ODK-templated repos #446

Open joeflack4 opened 7 months ago

joeflack4 commented 7 months ago

Overview

In some of our repos, e.g. ICD11Foundation and MedGen, we use the ODK container to run things, but I do not think these repos are so reliant on ODK that they justify utilizing the full ODK directory structure template.

Current approach and problems

For the aforementioned repos, I've committed a static copy of run.sh. However, this is problematic because it conflates ODK code (run.sh) with the code for that repository. It is also difficult to tell which version of ODK that run.sh applies to, and it doesn't lend itself well to changing or upgrading the ODK version.

Possible solutions

I suggest wrapping run.sh in a package that can be installed via some package manager. For now, I'm suggesting Python and PyPI.

Related

joeflack4 commented 7 months ago

@matentzn @twhetzel @hrshdhgd @souzadevinicius if you guys have any thoughts.

matentzn commented 7 months ago

We have someone working on a binary runner for ODK as well. In the meantime, I suggest the following:

  1. As you say, for minimal projects we do not ever use the ODK repo structure. This is total overkill, I agree.
  2. For every product we build, we have
    1. A GitHub Actions workflow pinned to a specific ODK docker image that runs the process - no manual intervention
    2. For utility purposes, we have a modified odk.sh and a Makefile in every repo root, with the understanding that running odk.sh make all builds all relevant products supported by this pipeline. The idea of odk.sh (don't conflate it with run.sh or run-command.sh) is described here: https://oboacademy.github.io/obook/howto/odk-setup/#for-maclinux. For reproducibility, I suggest hard-coding the ODK version we tested with in odk.sh and only upgrading when needed. The purpose of odk.sh is to ensure that all pipeline code developed can be executed inside the ODK container.

I am not diametrically opposed to a simple Python utility called odk that wraps the runner using a docker package. It would be very neat indeed. I just think that developing this will be a larger overhead now (even though it's fun), and we have a tough year ahead of us in terms of demands on Mondo for our grant renewal cycle. My two cents. However, if @twhetzel believes this is worthwhile:

pip install odk
odk robot --help

I would support you in this endeavour. (Please don't start coding yet - I want to make sure you use the right Monarch cookie cutters, style guides, etc. to set up the project.)
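For discussion only, here is a minimal sketch of what the core of such a wrapper might look like. It is not an existing package: the function names are hypothetical, and the image tag `v1.4` is a placeholder for whichever ODK version a repo was actually tested with. It calls the `docker` CLI through `subprocess` rather than a Docker client library, and it only shows how pinning the version in one place could work.

```python
import subprocess
from pathlib import Path

# Assumptions for illustration: the tag is a placeholder for whichever
# ODK version the repository was tested with.
ODK_IMAGE = "obolibrary/odkfull"
ODK_TAG = "v1.4"


def build_docker_command(args, workdir=None, image=ODK_IMAGE, tag=ODK_TAG):
    """Return the `docker run` argv that executes `args` in the pinned ODK image."""
    # Mount the given directory (default: current directory) at /work.
    host_dir = Path(workdir or Path.cwd()).resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/work",
        "-w", "/work",
        f"{image}:{tag}",
        *args,
    ]


def run_in_odk(args, workdir=None):
    """Run `args` inside the container and return the process exit code."""
    return subprocess.call(build_docker_command(args, workdir=workdir))
```

With something like this, `run_in_odk(["robot", "--help"])` would play the role of `odk robot --help`, and upgrading ODK would mean changing `ODK_TAG` in a single place.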

joeflack4 commented 6 months ago

Just a note that the run.sh files used by various repositories, e.g. gard, icd11, and medgen, are older than and different from the one in mondo-ingest. I'm not updating them to match mondo-ingest, though, because when I do, all I get is make: *** No rule to make target 'TARGET'. Stop.

matentzn commented 6 months ago

This is because ODK-based run.sh files bind the directory two levels up as a volume rather than the one you are in right now (this is because the Makefile is in src/ontology). In repos where the Makefile is at the top level, you don't need that.

But maybe for most repos this is overkill; just use the simpler version that already exists. Thanks for checking!
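To illustrate the difference in volume binding, here is a rough sketch. The helper name `volume_binding` is hypothetical and not part of ODK. The idea is to always mount the repo root at /work and set the container's working directory to wherever the Makefile lives relative to that root.

```python
from pathlib import Path, PurePosixPath


def volume_binding(makefile_dir, repo_root):
    """Return docker `-v`/`-w` arguments for a Makefile located in `makefile_dir`.

    The repo root is mounted at /work, and the working directory inside the
    container mirrors the Makefile's location relative to the repo root.
    """
    makefile_dir = Path(makefile_dir).resolve()
    repo_root = Path(repo_root).resolve()
    # Raises ValueError if the Makefile directory is not inside the repo.
    rel = makefile_dir.relative_to(repo_root)
    container_workdir = PurePosixPath("/work") / rel.as_posix()
    return ["-v", f"{repo_root}:/work", "-w", str(container_workdir)]
```

For an ODK-templated repo, the Makefile directory is `<repo>/src/ontology`, so the root is two levels up and the working directory becomes `/work/src/ontology`. For a repo with a top-level Makefile, both arguments are the same directory and the working directory is simply `/work`. If the working directory does not match where the Makefile actually is, make has no rules to work with, which may be what is behind the "No rule to make target" error.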

joeflack4 commented 6 months ago

I figured that was the case. I tried tweaking it a little but still couldn't get it to work. Good to know the older version of the file is fine.