tloubrieu-jpl opened 2 years ago
@nutjob4life @tdddblog @ramesh-maddegoda can you write your inputs on this topic as comments in the ticket?
Multi-stage builds can be helpful here, letting you make an image that has not just the service but also the pre-configured data (database schema, expected records, etc.).
Here's an example Dockerfile that uses a multi-stage build to make a new image for a hypothetical database called Persistence™ that's pre-loaded with a schema as well as database rows:
```dockerfile
# Stage 1
# =======
#
# We start with the database image, in this case a hypothetical database
# called "persistence"
FROM persistence:1.2.3 AS initialization

# Set up some defaults for this database
ENV PERSISTENCE_USERNAME="db"
ENV PERSISTENCE_PASSWORD="p455w0rd"

# This database expects initial schema to be in /var/persistence/init.d and
# initial data in /var/persistence/load.d
COPY etc/product-schema.sql etc/label-schema.sql /var/persistence/init.d/
COPY data/*.sql /var/persistence/load.d/
COPY data/blobs/ /var/persistence/load.d/lobs/

# This database has a special command-line option that tells it not to start
# up as a daemon process. Other databases may require you to modify the
# Docker entrypoint script with /usr/bin/sed or by forcing us to provide
# an alternative startup script (this turns out to be fairly common):
RUN : &&\
    /usr/local/bin/persistence-entrypoint.sh --load-only /tmp/dump &&\
    :

# For example, if this were Postgres, we'd do
#
#     RUN : &&\
#         sed --in-place --expression='s/exec "$@"//' /usr/local/bin/docker-entrypoint.sh &&\
#         /usr/local/bin/docker-entrypoint.sh postgres &&\
#         :
#
# Solr has a special command-line option like `solr-create` which does
# something similar.

# Stage 2
# =======
#
# We go back to "persistence" again
FROM persistence:1.2.3

# But this time we can copy over the service's database files
COPY --from=initialization /tmp/dump /var/persistence/db

# There's no need for `RUN rm -rf /tmp/dump`. It doesn't exist in this layer!
# It's only in the `initialization` stage.
```
There's no special syntax needed to build this image; a single build command does it:

```console
docker image build --tag pds-persistence .
```
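Once the data is baked in, the pre-loaded image can be referenced directly from compose with no init containers or wait scripts. A minimal sketch of what that might look like (the service name and port here are hypothetical, not from the Dockerfile above):

```yaml
services:
  persistence:
    image: pds-persistence   # the pre-loaded image built above
    ports:
      - "5432:5432"          # hypothetical database port
```

The trade-off is that any schema or data change requires rebuilding and re-distributing the image.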
These are two approaches that I currently use to initialize Elasticsearch and RabbitMQ in docker compose.
To initialize Elasticsearch, I created an init-elasticsearch service in docker compose as follows.
```yaml
# Initializes Elasticsearch by creating registry and data dictionary indices
# by utilizing the Registry Loader
init-elasticsearch:
  profiles: ["elastic", "big-data", "big-data-integration-test"]
  image: ${REG_LOADER_IMAGE}
  environment:
    - ES_URL=${ES_URL}
  volumes:
    - ./scripts/init-elasticsearch.sh:/usr/local/bin/init-elasticsearch.sh
  networks:
    - pds
  entrypoint: ["bash", "/usr/local/bin/init-elasticsearch.sh"]
```
After that, in init-elasticsearch.sh (which is called in the entrypoint above), I added the following code to wait for Elasticsearch.
```sh
# Check if the ES_URL environment variable is set
if [ -z "$ES_URL" ]; then
    echo "Error: 'ES_URL' (Elasticsearch URL) environment variable is not set. Use docker's -e option." 1>&2
    exit 1
fi

echo "Waiting for Elasticsearch to launch..." 1>&2
while ! curl --output /dev/null --silent --head --fail "$ES_URL"; do
    sleep 1
done

echo "Creating registry and data dictionary indices..." 1>&2
registry-manager create-registry -es "$ES_URL"
```
With the above approach, we can wait for any service that exposes a port and execute a script once the service is available.
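The same wait-loop idea can also be expressed without curl. Here is a minimal sketch in Python using only the standard library; the function name and timeout value are my own choices, not part of the script above:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds, or give up
    after `timeout` seconds. Returns True once the port is reachable."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the service is at least listening
            with socket.create_connection((host, port), timeout=1):
                return True
        except OSError:
            time.sleep(1)  # not up yet; retry
    return False
```

Note that "port open" only proves the process is listening, not that it has finished loading its data; an application-level check (like the curl `--head --fail` probe above) is stricter.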
To initialize RabbitMQ, I used a RabbitMQ definitions file and configured the service in docker compose as follows.
```yaml
# Starts RabbitMQ
rabbit-mq:
  profiles: ["rabbitmq", "big-data", "big-data-integration-test"]
  image: rabbitmq:3.9-management
  ports:
    - "15672:15672"
    - "5672:5672"
  volumes:
    - ./default-config/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro
    - ./default-config/rabbitmq-definitions.json:/etc/rabbitmq/definitions.json:ro
  networks:
    - pds
```
I think similar approaches can be used to initialize any micro-service, as long as there is a port to wait on.
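As a variation on the wait-loop, Compose can also do the waiting itself when the dependency defines a healthcheck. A sketch against the rabbit-mq service above (the `harvest` consumer service is hypothetical; `rabbitmq-diagnostics -q ping` is the readiness check suggested in the RabbitMQ image documentation):

```yaml
rabbit-mq:
  image: rabbitmq:3.9-management
  healthcheck:
    test: ["CMD", "rabbitmq-diagnostics", "-q", "ping"]
    interval: 5s
    timeout: 5s
    retries: 12

harvest:
  depends_on:
    rabbit-mq:
      condition: service_healthy
```

With `condition: service_healthy`, Compose delays starting the dependent service until the healthcheck passes, so no custom wait script is needed.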
Thanks @nutjob4life @ramesh-maddegoda. I also like a third approach: have the initialization code called by the microservice component itself (e.g. the harvest service creates the RabbitMQ queues it needs if they don't exist yet). That way the component that knows what is needed is the one that creates it, instead of creating tables or queues in one repository and using them in another.
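A minimal sketch of that third approach, assuming the service talks to RabbitMQ's management HTTP API (a PUT on a queue is idempotent, so the service can safely run it on every startup; the function name and credentials below are hypothetical):

```python
import base64
import json
import urllib.request
from urllib.parse import quote

def declare_queue_request(base_url: str, vhost: str, queue: str,
                          user: str, password: str) -> urllib.request.Request:
    """Build an idempotent PUT against RabbitMQ's management HTTP API.

    Declaring a queue that already exists with the same properties is a
    no-op, so a service can call this unconditionally at startup.
    """
    # Virtual host and queue names must be percent-encoded ("/" -> "%2F")
    url = f"{base_url}/api/queues/{quote(vhost, safe='')}/{quote(queue, safe='')}"
    req = urllib.request.Request(url,
                                 data=json.dumps({"durable": True}).encode(),
                                 method="PUT")
    req.add_header("Content-Type", "application/json")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

# At service startup (hypothetical host, queue, and credentials):
# urllib.request.urlopen(declare_queue_request(
#     "http://rabbit-mq:15672", "/", "harvest-jobs", "guest", "guest"))
```

The same pattern works with an AMQP client instead of the HTTP API, since `queue.declare` is likewise idempotent in AMQP.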
What @nutjob4life proposes makes us create/maintain specific images... that might be what we need, I don't know.
We could also think of using a message broker to make the components aware of their status.
But for a general approach we decided to use specialized docker images as proposed by Sean (option 2).
@nutjob4life wrote on slack: just want to temper what I said in breakout. My idea of intermediate builds really only applies if we are distributing re-usable images intended for general consumption. If what we are doing is just saying "we need service x", "we need service y", and these are listed in compose, then the extra overhead is maybe not worth it.
That will not be fixed for this build; no conclusion yet from our team of experts.
💪 Motivation
...so we can have a strategy for future developments
📖 Additional Details
As a use case, we want a consistent way to initialize rabbitmq and elasticsearch for the registry application.
⚙️ Engineering Details
The options (to be investigated and extended) are:
A good example of initialization we want to manage is the creation of a database schema and users.
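For instance, such an init script might look like the following (an illustrative sketch only; the names and the Postgres-style SQL dialect are assumptions, not project decisions):

```sql
-- Create an application role and a schema it owns (illustrative names)
CREATE ROLE registry_app LOGIN PASSWORD 'change-me';
CREATE SCHEMA registry AUTHORIZATION registry_app;

-- A first table in that schema
CREATE TABLE registry.products (
    lid   VARCHAR(255) PRIMARY KEY,
    title TEXT NOT NULL
);

GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA registry TO registry_app;
```

Whichever option is chosen, a script like this would either be baked into a specialized image (option 2) or executed by the component that owns the schema (the third approach discussed above).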