microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Document the purpose and usage of the `Dockerfile` #1167

Closed eecavanna closed 11 months ago

eecavanna commented 1 year ago

There is a Dockerfile in the root folder of the repo. It contains:

FROM python:3.9

ADD . /src/

RUN \
    pip install poetry && \
    cd /src && poetry install 

Other than in the name of that file, I don't see any occurrences of the string "docker" in the repo.

I recommend documenting the file somewhere in the repo.

eecavanna commented 1 year ago

Based on the contents of the Dockerfile, I think it can be used to create an environment that contains the dependencies of nmdc-schema. With that in mind, I'll try using the file and report back.

eecavanna commented 1 year ago

Here's some documentation I came up with after experimenting with the file:


Development

This repository contains a Dockerfile you can use to run a container in which all the dependencies of nmdc-schema are present.

Usage

You can build the Docker image by issuing the following command in the root folder of the repo:

# Build a Docker image based upon the Dockerfile in the current folder.
docker build -t nmdc-schema .

Once the image has been built, you can start a container from it with:

# Instantiate the Docker image as a container, mount the current folder within it,
# set the working directory inside the container to `/src`, attach your terminal
# to the container's STDIN, STDOUT, and STDERR streams, run `bash` within the
# container, and delete the container as soon as `bash` stops running.
docker run --name nmdc-schema --rm -it -v "$(pwd):/src" -w /src nmdc-schema /bin/bash

Alternatively, once the image has been built, you can run a specific shell command in a container:

docker run --name nmdc-schema --rm -it -v "$(pwd):/src" -w /src nmdc-schema /bin/bash -c "hostname; whoami"

eecavanna commented 1 year ago

On second thought, I don't think the container contains all the dependencies of nmdc-schema. For example, it doesn't have yq installed.
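One way to check a container for missing tools is sketched below. It is a minimal example, not something from the repo: the helper name `missing_tools` is hypothetical, and the list of tools passed to it is only an illustration.

```shell
# Hypothetical helper: print each named command-line tool that is NOT on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Example (run inside the container): lists whichever of these are absent.
missing_tools poetry yq
```

An empty result means every listed tool was found.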

Based on that new information, I think the Dockerfile is a remnant of an unfinished attempt to create a Docker-based development environment for nmdc-schema.

I want there to be a Docker-based development environment for nmdc-schema. I'll try adding to the Dockerfile with that in mind.

eecavanna commented 1 year ago

Here's a Dockerfile I wrote that installs pandoc, yq, and Apache Jena (in addition to Poetry).

FROM python:3.9

WORKDIR /src

# Download and install pandoc.
RUN apt update && \
    apt install -y \
      pandoc

# Download and install yq.
# Reference: https://github.com/mikefarah/yq#install
RUN wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq && \
    chmod +x /usr/bin/yq

# Download and install Apache Jena.
# Reference: https://sparrowflights.blogspot.com/2012/12/how-to-install-jena-command-line-tools.html
RUN wget -P /downloads/tmp "https://dlcdn.apache.org/jena/binaries/apache-jena-4.9.0.zip"
RUN unzip /downloads/tmp/apache-jena-4.9.0.zip -d /downloads/apache-jena
ENV JENAROOT="/downloads/apache-jena/apache-jena-4.9.0"
ENV PATH="$JENAROOT/bin:$PATH"

RUN pip install poetry

ADD . /src/
RUN poetry install 

CMD ["echo", "Hello and goodbye."]

After building the Docker image from that Dockerfile, I instantiate a container from it by issuing this command:

docker run --name nmdc-schema --rm -it -v "$(pwd):/src" nmdc-schema /bin/bash

turbomam commented 1 year ago

Do you imagine that people would use the container interactively, by gaining access to its shell? Or would they add more commands to the Dockerfile, like make squeaky-clean all test validate-filtered-request-all make-rdf?

turbomam commented 1 year ago

Could we use a wrapper like this run.sh from the Environment Ontology?

turbomam commented 1 year ago

And let's make sure we have an idea of the motivating use cases for running nmdc-schema code.

eecavanna commented 1 year ago

I imagine people using it either (a) interactively (via $ docker run -it ... bash); or (b) via docker run ... some-command, where they specify the single command they want to run within it (e.g. $ docker run ... make make-rdf).
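Both modes can be covered by one small shell function, sketched here under some assumptions: the function name `nmdc_run` is hypothetical, the image is assumed to be tagged `nmdc-schema` as in the build command earlier in this thread, and the TTY check is an addition so the helper also works when called from a script.

```shell
# Hypothetical helper wrapping `docker run` for the nmdc-schema image.
nmdc_run() {
  # (a) No arguments: fall back to an interactive shell.
  if [ "$#" -eq 0 ]; then
    set -- /bin/bash
  fi
  # Allocate a TTY only when stdin is a terminal.
  tty_flag=""
  if [ -t 0 ]; then
    tty_flag="-t"
  fi
  # (b) Otherwise, the arguments are the single command to run in the container.
  docker run --rm -i $tty_flag -v "$(pwd):/src" -w /src nmdc-schema "$@"
}
```

With this defined, `nmdc_run` alone opens a shell in the container, and `nmdc_run make make-rdf` runs that one command and exits.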

eecavanna commented 1 year ago

I plan to use Docker Compose to "wrap" the volume-mounting options and anything else that would otherwise be specified manually via $ docker run ..., rather than using a custom wrapper shell script.

The end user could then run $ docker compose up to spin up the Docker container; and, for example, $ docker compose exec ... make make-rdf to run one-off commands within it.
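As a rough sketch of what that could look like (nothing here exists in the repo yet): the service name `app`, the file name `docker-compose.yml`, and the `sleep infinity` command are all assumptions. The long-running command is there because the Dockerfile's own `CMD` exits immediately, which would leave `docker compose exec` without a running container to target.

```shell
# Hypothetical sketch: write a minimal Compose file next to the Dockerfile.
cat > docker-compose.yml <<'EOF'
services:
  app:                      # service name is an assumption
    build: .                # build from the Dockerfile in this folder
    volumes:
      - .:/src              # mount the repo, as the `docker run -v` examples do
    working_dir: /src
    command: sleep infinity # keep the container alive for `docker compose exec`
EOF

# Typical usage (requires Docker, so shown as comments only):
#   docker compose up --detach
#   docker compose exec app make make-rdf
#   docker compose down
```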

eecavanna commented 1 year ago

My motivation for creating the Dockerfile was to be able to run the various make commands specified in the repo (related to running the migration code), without installing the repo's dependencies directly on my local machine (some of them are things I am not familiar with; e.g. Apache Jena).

eecavanna commented 1 year ago

I'm also open to submitting the Dockerfile to the upstream LinkML cookie-cutter repo, but I'm not prepared to discuss that with its maintainers since I'm not familiar with the cookie-cutter repo except that it was used to create this repo.