DavyLandman closed this issue 4 years ago
Also see typhon-project/typhon-polystore-api#11
Agreed that we should think about having such a mechanism. But I think the API has to define what is necessary, and then we can all think about how to realise it.
Hi, we were discussing service discovery and propose the following: Figure 1: Connections of polystore components
Terms that are used below:
Both Docker networks and Kubernetes clusters provide sufficient DNS services. In Docker Compose, the typhon component's name and the default port are enough to find the right container. In Kubernetes, each typhon component will have something like a proxy (called a Service in Kubernetes). It will use the typhon component's name and the default port as its address and will forward all requests to the right pod.
So service discovery will be identical in Docker Compose and Kubernetes from the typhon component's view.
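For illustration, a Kubernetes Service matching that description could look like the sketch below. It exposes the polystore API under its component name and default port 8080 (taken from the compose example further down); the `app` selector label is an assumption, not actual project configuration.

```yaml
# Hypothetical sketch: a Service that makes typhon-polystore-service
# reachable by name and default port, mirroring the Compose behavior.
apiVersion: v1
kind: Service
metadata:
  name: typhon-polystore-service
spec:
  selector:
    app: typhon-polystore-service   # assumed pod label
  ports:
    - port: 8080        # default port, same as in the compose file
      targetPort: 8080  # port the pod actually listens on
```

With such a Service in place, other components resolve `typhon-polystore-service:8080` exactly as they would in a Compose network.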
The API will know all addresses from the DL model. Example `docker-compose.yaml`:

```yaml
version: '3.7'
services:
  VehicleMetadataDB:
    image: mariadb:latest
    environment:
      MYSQL_ROOT_PASSWORD: password
  VehicleDataDB:
    image: mongo:latest
    environment:
      MONGO_INITDB_ROOT_USERNAME: username
      MONGO_INITDB_ROOT_PASSWORD: password
  polystore-mongo:
    image: mongo:latest
    environment:
      MONGO_INITDB_ROOT_USERNAME: admin
      MONGO_INITDB_ROOT_PASSWORD: admin
      MONGO_INITDB_DATABASE: admin
    volumes:
      - ./models/:/docker-entrypoint-initdb.d
  typhon-polystore-service:
    image: clms/typhon-polystore-api:latest
    depends_on:
      - polystore-mongo
    ports:
      - published: 8080
        target: 8080
  typhonql-server:
    image: swatengineering/typhonql-server
```
`container_name`, `hostname` and `ports` are not needed anymore. @OrfenCLMS we could add the default port to the model, though, so you don't have to add them. This would also allow the user to use a custom database image (still MySQL or MongoDB) that maybe exposes a different port.
@DavyLandman Forwarding requests from the QL component has to be organized so that database consistency is ensured. Since @5h4m4n has experience with Kubernetes, hopefully he can give some insight on this matter.
I'm glad that `container_name` and friends are being dropped.
I had the following questions when I registered this issue. I'll try to iterate them to see if I understand the proposed solution.
For me the answers to questions 2 and 4 are still a bit unclear; the rest seems clear. Did I miss something?
Thinking a bit more, I want to avoid circular dependencies on the inside: the API must depend on QL, so I want to avoid QL depending on the API service. Currently (for points 2, 3 and 4) we have to check with the API to know the ML & DL model. The API could forward the model on change and QL could store it, but what about a container restart? The API doesn't know we have forgotten it (this is discussed in typhon-project/typhonql#68). When QL is running behind a k8s Service, we cannot depend on the API sending us an updated model, as k8s will only forward it to one of the pods.
So I thought it might be nice to have a small component in the middle (maybe a small wrapper around the mongodb) that everyone can call, but itself calls nobody.
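In compose terms, that middle component might look like the hypothetical fragment below. The service name `typhon-model-registry` and its image are invented for illustration; the only grounded part is that it wraps `polystore-mongo` and depends on nothing else, so no cycle can form.

```yaml
# Hypothetical sketch: a thin registry service that API, QL and DL tools
# can all poll for the current models, but that itself calls nobody.
services:
  typhon-model-registry:
    image: clms/typhon-model-registry:latest  # invented image name
    depends_on:
      - polystore-mongo   # the registry only reads/writes the model store
```

A restarted QL pod could then simply re-fetch the model from this service instead of waiting for the API to push an update.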
@MarieSaphira I think this design makes the most sense. Kubernetes and Docker both certainly provide enough DNS services to cover the use case here. Removing `container_name`, `hostname` and `ports` also fully aligns the setup with the stateless behavior of both tools (Docker Compose/Engine and Kubernetes).
@DavyLandman As far as point 4 goes, for scaling out stateful services like MongoDB and MariaDB we would want to avoid a situation where these databases are killed and data loss results. There are 3 ways to solve this: 1) master/slave scenarios, 2) clustered versions, 3) networked mounts for the data (not recommended; some data loss might take place unless you put a queueing mechanism in front of inserts, which increases complexity significantly).
For case 1) you would have to write custom Kubernetes/Docker behavior to promote a slave to master in case of failure, which adds some development overhead.
For case 2), which I think is the most robust scenario for enterprise use of this platform, we could produce configurations that bring up a cluster of 3 containers for each database. Entering through one Service component (or proxy, call it what you will) will then always forward data requests to the cluster sitting behind it, and the cluster's internal mechanisms will take care of resharding and redistributing the data in case of machine failures.
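A minimal sketch of that 3-container idea for the MongoDB side, extending the compose example above. The member names and the replica-set name `rs0` are assumptions; the point is that clients only ever talk to one logical database name while replication happens behind it.

```yaml
# Hypothetical sketch: a 3-member MongoDB replica set for VehicleDataDB.
# A Service/proxy in front would forward requests to whichever member
# is currently primary; the set re-elects a primary on failure.
services:
  VehicleDataDB-0:
    image: mongo:latest
    command: ["mongod", "--replSet", "rs0"]
  VehicleDataDB-1:
    image: mongo:latest
    command: ["mongod", "--replSet", "rs0"]
  VehicleDataDB-2:
    image: mongo:latest
    command: ["mongod", "--replSet", "rs0"]
```

(The set would still need a one-time `rs.initiate(...)` step to form the cluster; in Kubernetes a StatefulSet plus a headless Service is the usual shape for this.)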
Concerning 2: We are working on that. The idea is to have the components in the DL model even when they are outside of the polystore. A container outside the polystore could maybe look like this:
```
external container typhon-analytics : Docker {
    address = https://AWS.typhon-analytics:9092
}
```
This won't produce any scripts and the API would know about it.
The polystore API needs to find the QL server and provide it with a set of domain names of running servers. That is already 2 service discovery actions: first finding the QL server, second providing the other databases.
I'm aware of how this works with docker (-compose), but does this change in k8s? Maybe this is something more suited for @OrfenCLMS, but there is an overlap here.
I know there are quite a few service discovery mechanisms, for example CoreDNS, but I'm unclear on how we avoid becoming very dependent on such a thing. Currently it is nice that everything also just works inside a local Docker install.