typhon-project / typhondl

http://www.typhon-project.org
Eclipse Public License 2.0

How should we do service discovery in a container runtime agnostic way? #24

Closed DavyLandman closed 4 years ago

DavyLandman commented 4 years ago

The polystore API needs to find the QL server and provide it with a set of domain names of the running servers. These are already two service discovery actions: first finding the QL server, and second providing it with the locations of the other databases.

I'm aware of how this works with Docker (Compose), but does this change in k8s? Maybe this is something more suited for @OrfenCLMS, but there is an overlap here.

I know there are quite a few service discovery mechanisms, for example CoreDNS, but I'm unclear on how we avoid becoming very dependent on such a thing. Currently it is nice that everything also just works inside a local Docker install.

MarieSaphira commented 4 years ago

Also see typhon-project/typhon-polystore-api#11

schlotze commented 4 years ago

Agreed that we should think about having such a mechanism. But I think the API has to define what is necessary, and then we can all think about how to realise this.

MarieSaphira commented 4 years ago

Hi, we were discussing service discovery and propose the following:

[Figure 1: Connections of polystore components]


Both Docker networks and Kubernetes clusters provide sufficient DNS services. In Docker Compose, the typhon component's name and the default port are enough to find the right container. In Kubernetes, each typhon component will have something like a proxy in front of it (it is called a Service in Kubernetes). It will have the typhon component's name and the default port as its address and will forward all requests to the right pod.

So service discovery will be identical in Docker Compose and Kubernetes from the typhon component's view.
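
For illustration, a minimal sketch of what such a Kubernetes Service could look like for the QL server; the port 7000 and the label app: typhonql-server are assumptions made up for this example, not taken from the DL model:

apiVersion: v1
kind: Service
metadata:
  name: typhonql-server          # other components reach it as typhonql-server:7000
spec:
  selector:
    app: typhonql-server         # forwards to whichever pod(s) carry this label
  ports:
    - port: 7000                 # the component's default port
      targetPort: 7000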

The API will know all addresses from the DL model. Example docker-compose.yaml:

version: '3.7'

services:
  VehicleMetadataDB:
    image: mariadb:latest
    environment:
      MYSQL_ROOT_PASSWORD: password
  VehicleDataDB:
    image: mongo:latest
    environment:
      MONGO_INITDB_ROOT_USERNAME: username
      MONGO_INITDB_ROOT_PASSWORD: password
  polystore-mongo:
    image: mongo:latest
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: admin
      MONGO_INITDB_DATABASE: admin
    volumes:
      - ./models/:/docker-entrypoint-initdb.d
  typhon-polystore-service:
    image: clms/typhon-polystore-api:latest
    depends_on: 
      - polystore-mongo
    ports:
      - published: 8080
        target: 8080
  typhonql-server:
    image: swatengineering/typhonql-server

container_name, hostname and ports are not needed anymore. @OrfenCLMS we could add the default port to the model though, so you don't have to add them. This would also allow the user to use a custom database image (still mysql or mongo) that maybe exposes a different port.

@DavyLandman Forwarding requests from the QL component has to be organized so that database consistency is ensured. Since @5h4m4n has experience in using Kubernetes, hopefully he can give some insight concerning this matter.

DavyLandman commented 4 years ago

I'm glad that container_name and friends are being dropped.

I had the following questions when I registered this issue. I'll go through them to see whether I understand the proposed solution.

  1. How do "always there" services know to find each other? Answer seems to be fixed names & ports. Looks good to me.
  2. How can we find optional services? Like where is it stored what the URI of the Analytics component is, or the NLP one? In figure 1 it can be running either inside or outside the polystore environment.
  3. How can we find ML specific services? (mongo db and friends that the user chose in ML & DL). The answer seems to be that the API knows about this (it currently does as well) and is responsible for passing it on. So as a consumer you can either call the API endpoint, or the API sends it along (related to typhon-project/typhonql#68) when it calls you.
  4. How can we find k8s scaled-out services? The answer is to base it on k8s Services, cool feature btw. I do have questions about how this works with stateful services (like mongodb & mariadb). Maybe @5h4m4n can help with that; I do not know how mariadb and mongodb (and upcoming neo4j & cassandra) behave when they are automatically killed & restarted and proxied behind a single frontend.

For me the answers to questions 2 and 4 are still a bit unclear; the rest seems clear. Did I miss something?

DavyLandman commented 4 years ago

Thinking a bit more, I want to avoid circular dependencies on the inside. The API must depend on QL, so I want to avoid QL depending on the API service. Currently (for points 2, 3 & 4) we have to check with the API to know the ML & DL model. Now the API could forward it on change and QL could store it, but what about a container restart? The API doesn't know we have forgotten it (this is discussed in typhon-project/typhonql#68). When QL is running inside a k8s Service, we cannot depend on the API sending us an updated model, as k8s will only forward it to one of the pods.

So I thought it might be nice to have a small component in the middle (maybe a small wrapper around the mongodb) that everyone can call, but itself calls nobody.
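
As an illustration only, such a middle component could be one extra service in the compose file that everyone can reach by name; the service name typhon-model-registry, its image and port 8090 are hypothetical, not an agreed design:

services:
  typhon-model-registry:
    image: typhon/model-registry:latest   # hypothetical thin wrapper around polystore-mongo
    depends_on:
      - polystore-mongo
    ports:
      - published: 8090
        target: 8090

The API would push model updates to it, and QL (or any pod behind a k8s Service) could fetch the current ML & DL model from it after a restart, without anyone having to call back into the API.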

5h4m4n commented 4 years ago

@MarieSaphira I think this design makes the most sense. Kubernetes and Docker both certainly provide enough DNS Services to account for the use case here. Removing container_name, hostname and port also adjusts the use case completely to the stateless behavior of both tools (Docker Compose/Engine and Kubernetes).

@DavyLandman As far as point 4 goes for scaling out stateful services like mongodb and mariadb, we would want to avoid a situation where these databases are killed and data loss results. There are 3 ways to solve this:

  1. master/slave scenarios
  2. clustered versions
  3. networked mounts for the data (not recommended; some data loss might take place unless you have a queueing mechanism in the middle before inserting, which increases complexity significantly)

For case 1) you would have to write custom kubernetes/docker behavior to promote the slaves to master in case of failure. It does add some overhead in terms of development.

For case 2), which I think is the most robust scenario for enterprise use cases of this platform, we could work to produce configurations that bring up clusters of 3 containers for each db. This way, entering through one Service component (or proxy, call it what you will) will always forward data requests to the cluster sitting behind it, and the internal mechanisms of the cluster will take care of resharding and redistributing the data in case of machine failures.
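
As a rough sketch of what case 2 could look like in Kubernetes for one of the databases; the lowercase name vehicledatadb, the 3 replicas, the port and the storage size are illustrative, and initiating the replica set and setting up authentication are left out (they would still need an init job or an operator):

# headless Service: gives each mongod pod a stable DNS name inside the cluster
apiVersion: v1
kind: Service
metadata:
  name: vehicledatadb            # Kubernetes object names must be lowercase DNS labels
spec:
  clusterIP: None
  selector:
    app: vehicledatadb
  ports:
    - port: 27017
---
# StatefulSet: three mongod replicas with stable identities and their own volumes
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vehicledatadb
spec:
  serviceName: vehicledatadb
  replicas: 3
  selector:
    matchLabels:
      app: vehicledatadb
  template:
    metadata:
      labels:
        app: vehicledatadb
    spec:
      containers:
        - name: mongod
          image: mongo:latest
          command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi

If one pod is killed and restarted, it comes back under the same name with its own volume and the replica set re-syncs it, while the other typhon components keep reaching the database through one stable DNS name.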

MarieSaphira commented 4 years ago

Concerning 2: we are working on that. The idea is to have the components in the DL model even when they are outside of the polystore. Maybe a container outside the polystore could look like this:

external container typhon-analytics : Docker {
    address = https://AWS.typhon-analytics:9092;
}

This won't produce any scripts and the API would know about it.