Closed: eecavanna closed this 11 months ago.
Based on the contents of the Dockerfile, I think it can be used to create an environment that contains the dependencies of nmdc-schema. With that in mind, I'll try using the file and report back.
Here's some documentation I came up with after experimenting with the file:
This repository contains a Dockerfile you can use to run a container in which all the dependencies of nmdc-schema are present.
You can build the Docker image by running the following command in the root folder of the repo:
# Build a Docker image based upon the Dockerfile in the current folder.
docker build -t nmdc-schema .
Once the image has been built, you can run a container from it with:
# Instantiate the Docker image as a container, mount the current folder within it,
# set the working directory inside the container to `/src`, attach your terminal
# to the container's STDIN, STDOUT, and STDERR streams, run `bash` within the
# container, and delete the container as soon as `bash` stops running.
docker run --name nmdc-schema --rm -it -v "$(pwd):/src" -w /src nmdc-schema /bin/bash
Alternatively, once the image has been built, you can run a specific shell command in a container:
docker run --name nmdc-schema --rm -it -v "$(pwd):/src" -w /src nmdc-schema /bin/bash -c "hostname; whoami"
On second thought, I don't think the container contains all the dependencies of nmdc-schema. For example, it doesn't have yq installed.
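A quick way to see which tools an image is missing is to run a small POSIX sh script inside it. This is a hypothetical sketch: the file name check-deps.sh and the tool list are assumptions, not taken from the repo's Makefile. riot is one of the command-line tools in Apache Jena's bin directory, and java is listed because Jena's tools run on the JVM.

```shell
#!/bin/sh
# Hypothetical helper (check-deps.sh): report which of the given commands
# are not on PATH. Prints one "missing: <name>" line per absent command and
# returns non-zero if any are missing.
check_deps() {
  status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      status=1
    fi
  done
  return "$status"
}

# Assumed tool list; adjust to whatever the make targets actually call.
check_deps poetry yq pandoc java riot
```

Saved in the repo root, it could be run against the image with docker run --rm -v "$(pwd):/src" -w /src nmdc-schema sh check-deps.sh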
Based on that new information, I think the Dockerfile is a remnant of an unfinished attempt to create a Docker-based development environment for nmdc-schema.
I want there to be a Docker-based development environment for nmdc-schema. I'll try adding to the Dockerfile with that in mind.
Here's a Dockerfile I wrote that installs pandoc, yq, and Apache Jena (in addition to Poetry).
FROM python:3.9
WORKDIR /src
# Download and install pandoc, plus a Java runtime (Apache Jena's command-line tools run on Java).
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        default-jre-headless \
        pandoc && \
    rm -rf /var/lib/apt/lists/*
# Download and install yq (this fetches the amd64/x86-64 binary).
# Reference: https://github.com/mikefarah/yq#install
RUN wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -O /usr/bin/yq && \
    chmod +x /usr/bin/yq
# Download and install Apache Jena.
# Reference: https://sparrowflights.blogspot.com/2012/12/how-to-install-jena-command-line-tools.html
# Note: dlcdn.apache.org serves only current releases; older ones move to archive.apache.org.
RUN wget -P /downloads/tmp "https://dlcdn.apache.org/jena/binaries/apache-jena-4.9.0.zip" && \
    unzip /downloads/tmp/apache-jena-4.9.0.zip -d /downloads/apache-jena && \
    rm /downloads/tmp/apache-jena-4.9.0.zip
ENV JENAROOT="/downloads/apache-jena/apache-jena-4.9.0"
ENV PATH="$JENAROOT/bin:$PATH"
RUN pip install poetry
COPY . /src/
RUN poetry install
CMD ["echo", "Hello and goodbye."]
After building the Docker image from that Dockerfile, I instantiate a container from it by issuing this command:
docker run --name nmdc-schema --rm -it -v "$(pwd):/src" nmdc-schema /bin/bash
Do you imagine that people would use the container interactively, by gaining access to its shell? Or would we add more commands to the Dockerfile, like make squeaky-clean all test validate-filtered-request-all make-rdf?
Could we use a wrapper like the run.sh script from the Environment Ontology?
And let's make sure we have a clear idea of the motivating use cases for running nmdc-schema code.
I imagine people using it either (a) interactively (via $ docker run -it ... bash); or (b) via $ docker run ... some-command, where they specify the single command they want to run within it (e.g. $ docker run ... make make-rdf).
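For use case (b), a small wrapper could hide the docker run options. This is a hypothetical sketch, not the Environment Ontology's run.sh: the function name run_in_container, the IMAGE override, and the DRY_RUN switch are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command in a throwaway nmdc-schema
# container with the current folder mounted at /src. With no arguments it
# opens an interactive bash shell (use case a). Set DRY_RUN=1 to print the
# docker command instead of running it.
IMAGE="${IMAGE:-nmdc-schema}"

run_in_container() {
  # Default to an interactive shell when no command is given.
  if [ "$#" -eq 0 ]; then
    set -- /bin/bash
  fi
  # Prepend the docker options; "$@" is expanded before set replaces it.
  set -- docker run --rm -it -v "$(pwd):/src" -w /src "$IMAGE" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

Ending a script with run_in_container "$@" would let people type, say, ./run.sh make make-rdf. The sketch omits --name so that two commands can run at once without their container names colliding.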
I plan to use Docker Compose to "wrap" the volume-mounting options and anything else that would otherwise be specified manually via $ docker run ..., as opposed to using a custom wrapper/shell script.
The end user could then run $ docker compose up to spin up the Docker container and, for example, $ docker compose exec ... make make-rdf to run one-off commands within it.
My motivation for creating the Dockerfile was to be able to run the various make commands specified in the repo (related to running the migration code) without installing the repo's dependencies directly on my local machine (some of them, e.g. Apache Jena, are things I am not familiar with).
I'm also open to submitting the Dockerfile to the upstream LinkML cookie-cutter repo, but I'm not prepared to discuss that with its maintainers, since I'm not familiar with the cookie-cutter repo beyond the fact that it was used to create this repo.
There is a Dockerfile in the root folder of the repo. It contains:
Other than in the name of that file, I don't see any occurrences of the string "docker" in the repo. I recommend documenting the file somewhere in the repo.