nasa-pds-engineering-node / pds-registry-app

(deprecated) See https://github.com/NASA-PDS/registry for new and improved capability.
https://nasa-pds.github.io/registry/

Find the nicer way to initialize off-the-shelf system in microservice architecture #236

Open · tloubrieu-jpl opened this issue 2 years ago

tloubrieu-jpl commented 2 years ago

💪 Motivation

...so we can have a strategy for future developments

📖 Additional Details

As a use case, we want a consistent way to initialize rabbitmq and elasticsearch for the registry application.

⚙️ Engineering Details

The options (to be investigated and extended in the comments below) include: an init service in docker compose, specialized pre-initialized docker images, and initialization performed at startup by the consuming service itself.

A good example of initialization we want to manage is the creation of a database schema and users.
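
For illustration, a minimal sketch of what such an init file might contain (the file name, table, user, and password are hypothetical, in Postgres-flavored SQL):

-- Hypothetical init.sql: bootstrap a schema and a user
CREATE USER registry WITH PASSWORD 'changeme';
CREATE TABLE products (
    lidvid VARCHAR PRIMARY KEY,
    title  VARCHAR NOT NULL
);
GRANT SELECT, INSERT ON products TO registry;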

tloubrieu-jpl commented 2 years ago

@nutjob4life @tdddblog @ramesh-maddegoda can you write your inputs on this topic as comments in the ticket?

nutjob4life commented 2 years ago

Multi-stage builds can be helpful here, letting you make an image that contains not just the service but also the pre-configured data (database schema, expected records, etc.).

Here's an example Dockerfile using a multi-stage build to produce a new image for a hypothetical database called Persistence™, pre-loaded with a schema as well as database rows:

# Stage 1
# =======
#
# We start with the database image, in this case a hypothetical database
# called "persistence"

FROM persistence:1.2.3 AS initialization

# Set up some defaults for this database

ENV PERSISTENCE_USERNAME="db"
ENV PERSISTENCE_PASSWORD="p455w0rd"

# This database expects initial schema to be in /var/persistence/init.d and
# initial data in /var/persistence/load.d

COPY etc/product-schema.sql etc/label-schema.sql /var/persistence/init.d/
COPY data/*.sql /var/persistence/load.d/
COPY data/blobs/ /var/persistence/load.d/lobs/

# This database has a special command-line option that tells it not to start
# up as a daemon process. Other databases may require you to modify the
# Docker entrypoint script with /usr/bin/sed or by forcing us to provide
# an alternative startup script (this turns out to be fairly common):

RUN : &&\
    /usr/local/bin/persistence-entrypoint.sh --load-only /tmp/dump &&\
    :

# For example, if this were Postgres, we'd do
#
# RUN : &&\
#     sed --in-place --expression='s/exec "$@"//' /usr/local/bin/docker-entrypoint.sh &&\
#     /usr/local/bin/docker-entrypoint.sh postgres &&\
#     :
#
# Solr has a special command-line option like `solr-create` which does
# something similar.

# Stage 2
# =======
#
# We go back to "persistence" again

FROM persistence:1.2.3

# But this time we can copy over the service's database files

COPY --from=initialization /tmp/dump /var/persistence/db

# There's no need for `RUN rm -rf /tmp/dump`. It doesn't exist in this layer!
# It's only in the `initialization` layer.

There's no special syntax needed to build this image; a single build command does it:

docker image build --tag pds-persistence .
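
Once built, the pre-loaded image runs like the stock one, for instance (the published port is hypothetical, whatever Persistence™ listens on):

docker container run --detach --publish 5432:5432 pds-persistence
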
ramesh-maddegoda commented 2 years ago

These are two approaches that I currently use to initialize Elasticsearch and RabbitMQ in docker compose.

To initialize Elasticsearch, I created an init-elasticsearch service in docker compose as follows.

# Initializes Elasticsearch by creating the registry and data dictionary indices using the Registry Loader
  init-elasticsearch:
    profiles: ["elastic", "big-data", "big-data-integration-test"]
    image: ${REG_LOADER_IMAGE}
    environment:
      - ES_URL=${ES_URL}
    volumes:
      - ./scripts/init-elasticsearch.sh:/usr/local/bin/init-elasticsearch.sh
    networks:
      - pds
    entrypoint: ["bash", "/usr/local/bin/init-elasticsearch.sh"]
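
Because the init service is gated behind compose profiles, it only runs when one of those profiles is selected, for example:

docker compose --profile elastic up --detach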

After that, in init-elasticsearch.sh (which is set as the entrypoint above), I added the following code to wait for Elasticsearch.

# Check if the ES_URL environment variable is set
if [ -z "$ES_URL" ]; then
    echo "Error: 'ES_URL' (Elasticsearch URL) environment variable is not set. Use docker's -e option." 1>&2
    exit 1
fi

echo "Waiting for Elasticsearch to launch..."  1>&2
while ! curl --output /dev/null --silent --head --fail "$ES_URL"; do
  sleep 1
done

echo "Creating registry and data dictionary indices..." 1>&2
registry-manager create-registry -es "$ES_URL"

With the above approach, we can wait for any service that exposes a port and execute a script once the service is available.
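
As a sketch of that generalization (the helper and service names are hypothetical; bash's /dev/tcp redirection is used instead of curl, so it also works for non-HTTP services):

# Wait until host:port accepts TCP connections
wait_for_port() {
  local host="$1" port="$2"
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    sleep 1
  done
}

wait_for_port rabbit-mq 5672 && echo "RabbitMQ is up" 1>&2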

To initialize RabbitMQ, I used a RabbitMQ definitions file and initialized RabbitMQ in docker compose as follows.

# Starts RabbitMQ
  rabbit-mq:
    profiles: ["rabbitmq", "big-data", "big-data-integration-test"]
    image: rabbitmq:3.9-management
    ports:
      - "15672:15672"
      - "5672:5672"
    volumes:
      - ./default-config/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf:ro
      - ./default-config/rabbitmq-definitions.json:/etc/rabbitmq/definitions.json:ro
    networks:
      - pds

I think similar approaches can be used to initialize any microservice, whenever there is a port we can wait on to become available.

tloubrieu-jpl commented 2 years ago

Thanks @nutjob4life @ramesh-maddegoda. I also like a third approach: have the initialization code called by the microservice component itself (e.g. the harvest service creates the RabbitMQ queues it needs if they don't exist yet), because then the component that knows what is needed is the one that creates it, instead of creating tables or queues in one repository and using them in another.
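
As a sketch of that third approach (the queue name, credentials, and host are hypothetical; a PUT to the RabbitMQ management API's queue endpoint is idempotent, so re-running it at every startup is harmless):

# Startup hook inside the harvest service: create the queue it needs,
# or do nothing if it already exists (%2F is the default vhost "/")
curl --silent --fail --user "$RABBITMQ_USER:$RABBITMQ_PASSWORD" \
     --request PUT \
     --header "content-type: application/json" \
     --data '{"durable": true, "auto_delete": false}' \
     "http://rabbit-mq:15672/api/queues/%2F/harvest-jobs"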

What @nutjob4life proposes requires us to create/maintain specific images... that might be what we need, I don't know.

tloubrieu-jpl commented 2 years ago

We could also think of using a message broker to make the components aware of one another's status.

But as a general approach, we decided to use specialized docker images as proposed by Sean (option 2).

tloubrieu-jpl commented 2 years ago

@nutjob4life wrote on slack: just want to temper what I said in the breakout. My idea of intermediate builds really only applies if we are distributing re-usable images intended for general consumption. If what we are doing is just saying "we need service x", "we need service y", and these are listed in compose, then the extra overhead is maybe not worth it.

tloubrieu-jpl commented 2 years ago

This will not be fixed for this build; no conclusion yet from our team of experts.