ude-soco / elas-official

The Exploratory Learning Analytics Toolkit for Students (ELAS) is a platform to support UDE students in their learning activities. This platform is comprised of all the best projects at SoCo Group, where different LA applications were developed as part of student projects.
https://elas-official.soco.inko.cloud
MIT License

Inter-service connectivity between Eureka clients on K8s #85

Closed ralf-berger closed 10 months ago

ralf-berger commented 11 months ago

backend-2-api-gateway in K8s env complains:

[…] 500 Server Error for HTTP GET "/api/e3selector/e3-courses/"
io.netty.channel.ConnectTimeoutException: connection timed out: ⏎
  sc-elas-official-backend-1-service-registry.elas-official-edge.svc.cluster.local/10.43.193.175:8004
ralf-berger commented 10 months ago

Eureka adds an indirection to inter-service connectivity: each service registers itself with a service registry and reports how it can be reached. So, instead of telling service A how to reach B, we now have to tell service B how it can itself be reached, which B then reports to the service registry S. Service A must also know how to reach S (using a hard-coded service name) in order to query that connection information.
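
The indirection can be pictured with a toy in-memory registry. This is a minimal sketch and not Eureka's actual protocol or API; `ToyRegistry`, `register` and `lookup` are hypothetical names, and the host and port values are only illustrative.

```python
class ToyRegistry:
    """In-memory stand-in for a service registry S (not the Eureka API)."""

    def __init__(self):
        self._instances = {}  # app name -> (host, port)

    def register(self, app_name, host, port):
        # Service B reports how it can be reached.
        self._instances[app_name.upper()] = (host, port)

    def lookup(self, app_name):
        # Service A asks the registry where B can be reached.
        return self._instances[app_name.upper()]


registry = ToyRegistry()
# B registers under a host name that other services can resolve.
registry.register("elas-auth", "backend-3-auth", 8002)
# A only needs to know the registry and B's logical name.
host, port = registry.lookup("ELAS-AUTH")
auth_url = f"http://{host}:{port}"
```

If B reported a name that A cannot resolve, the lookup would still succeed, but the resulting URL would time out on connect, which would match the symptom in the gateway log.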

The information on how a service can be reached is usually outside the scope of the service itself. By default, each application guesses based on the hostname its OS reports and therefore, on Kubernetes, registers with its Pod name, which is unsuitable. All backend-* services must instead get their own Service names injected and use those to register.
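
As a rough illustration of "injected name first, OS guess only as a fallback", a sketch in Python could look like this. The helper name `advertised_host` is hypothetical, and reading the name from a `HOST` environment variable is an assumption borrowed from the configuration shown further down in this thread.

```python
import os
import socket


def advertised_host(env=None):
    """Return the host name this process should report to the registry."""
    env = os.environ if env is None else env
    # Prefer an explicitly injected name (Compose or Kubernetes Service name).
    injected = env.get("HOST")
    if injected:
        return injected
    # Fallback: whatever the OS reports. On Kubernetes this is the Pod name,
    # which is unsuitable for other services to connect to.
    return socket.gethostname()
```

For example, `advertised_host({"HOST": "backend-3-auth"})` returns the injected name, while an empty environment falls back to the OS host name.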

In the case of a Java/Spring app the host name to be reported is configured via eureka.instance.hostname (e. g. via parameter or config file).

With Python it seems to be passed as the instance_ip= and/or instance_host= parameter of the eureka_client.init() call. At the moment this doesn't seem to be implemented correctly, see e. g.:

https://github.com/ude-soco/elas-official/blob/9920c99052d0b1ea0e31de0c370cc361615b0bbf/backend/3-auth/server/settings.py#L46-L56

It looks like this snippet (and its equivalents in the other Python-based apps) uses the environment variable EUREKA_HOST_NAME both as the host of the Eureka service registry and as the service's own host, which is reported to the service registry. Not sure why.
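
A sketch of how the two roles could be kept apart, assuming the registry address and the service's own name arrive in separate environment variables. The function `eureka_settings` is hypothetical; the variable names and values follow the Compose example later in this thread. The returned keys mirror the keyword arguments mentioned above, but nothing here calls the client library.

```python
def eureka_settings(env):
    """Build registration settings with registry address and own host kept apart."""
    registry_host = env["EUREKA_HOST_NAME"]  # where the registry lives
    registry_port = env["EUREKA_PORT"]
    own_host = env["HOST"]  # name under which this service is reachable
    return {
        "eureka_server": f"http://{registry_host}:{registry_port}/eureka",
        "instance_host": own_host,
        "instance_port": int(env.get("PORT", "8002")),
    }


settings = eureka_settings(
    {
        "EUREKA_HOST_NAME": "backend-1-service-registry",
        "EUREKA_PORT": "8761",
        "HOST": "backend-3-auth",
        "PORT": "8002",
    }
)
```

Using `env[...]` rather than `env.get(...)` for the required variables makes a missing value fail loudly at startup instead of registering under the host name `None`.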

Note: None of the database connections use Eureka, those are injected as needed.

shoebjoarder commented 10 months ago

I see that some changes have already been made to the file. Based on the documentation, I suggest the following:

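import os

# Assumes the py-eureka-client package, which provides eureka_client.init().
import py_eureka_client.eureka_client as eureka_client

try:
    EUREKA_HOST_NAME = os.environ.get("HOST")
    EUREKA_PORT = os.environ.get("EUREKA_PORT")
    INSTANCE_HOST = os.environ.get("INSTANCE_HOST")
    eureka_client.init(
        eureka_server=f"http://{EUREKA_HOST_NAME}:{EUREKA_PORT}/eureka",  # type: ignore
        app_name="ELAS-AUTH",
        instance_port=int(os.environ.get("DJANGO_PORT", "8002")),
        instance_host=INSTANCE_HOST,  # type: ignore
    )
except Exception as err:
    # Assumed handling: the original except clause is not shown in this thread.
    print(f"Eureka registration failed: {err}")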

The instance_ip=socket.gethostbyname(EUREKA_HOST_NAME) argument is not strictly required and can be removed. However, the instance_host property should be provided. If we are using the "host" networking driver, we can set instance_host to the hostname of the host machine. If we are using the "bridge" driver (or a custom network), we should set instance_host to the IP address of the container within the Docker network.

Keep in mind that the container's IP address would have to be retrieved and passed to the application every time the container starts. To automate this, I guess we could modify the Dockerfile or our container orchestration configuration to retrieve the IP address and set the INSTANCE_HOST environment variable automatically.

ralf-berger commented 10 months ago

Each process is now passed the hostname under which it can be reached and reports it to Eureka. This has been implemented in 7b833d3, b25f378, f1a05bd.

There is no reason to resolve and pass IP addresses.

Compose

In Compose, the hostname is the service name. In this abbreviated example you can see that the HOST env var matches the name of the service:

services:
  backend-3-auth:
    image: socialcomputing/elas-official-backend-3-auth
    environment:
      PORT: 8002
      HOST: backend-3-auth
      EUREKA_PORT: 8761
      EUREKA_HOST_NAME: backend-1-service-registry

Kubernetes

In a Kubernetes environment, a Pod (a unit of one or more containers) is usually replicated more than once for throughput/redundancy. A Service provides network access to the replicas of such a Deployment.

E.g., within the cluster, all backend-3-auth replicas of the edge deployment are reachable under the hostname sc-elas-official-backend-3-auth.elas-official-edge.svc.cluster.local. This is injected into the container in a similar fashion:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: 3-auth
spec:
  template:
    spec:
      containers:
      - name: web
        image: socialcomputing/elas-official-backend-3-auth
        ports:
        - containerPort: 8002
        env:
        - name: PORT
          value: "8002"
        - name: HOST
          value: $(BACKEND_3_AUTH_SERVICE)
        - name: EUREKA_PORT
          value: "8761"
        - name: EUREKA_HOST_NAME
          value: $(BACKEND_1_SERVICE_REGISTRY_SERVICE)
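
The in-cluster hostname quoted above follows the usual Kubernetes DNS pattern of service name, namespace, `svc`, and cluster domain. A minimal sketch of composing it; `service_fqdn` is a hypothetical helper, and the default cluster domain `cluster.local` is an assumption that holds for this cluster but can be configured differently elsewhere.

```python
def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """Compose the in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


auth_host = service_fqdn("sc-elas-official-backend-3-auth", "elas-official-edge")
```

Within the same namespace the short Service name alone usually resolves as well; the fully qualified form is simply unambiguous across namespaces.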

Kubernetes service discovery can also be used in a less explicit way using environment variables. E. g. the following environment variables are set in all containers by default:

$ env
[...]
SC_ELAS_OFFICIAL_BACKEND_1_SERVICE_REGISTRY_SERVICE_HOST=10.43.193.175
SC_ELAS_OFFICIAL_BACKEND_2_API_GATEWAY_SERVICE_HOST=10.43.50.81
SC_ELAS_OFFICIAL_BACKEND_3_AUTH_SERVICE_HOST=10.43.137.164
SC_ELAS_OFFICIAL_BACKEND_4_E3SELECTOR_REDIS_SERVICE_HOST=10.43.178.128
SC_ELAS_OFFICIAL_BACKEND_4_E3SELECTOR_WEB_SERVICE_HOST=10.43.218.206
SC_ELAS_OFFICIAL_BACKEND_5_STUDYCOMPASS_NEO4J_SERVICE_HOST=10.43.154.192
SC_ELAS_OFFICIAL_BACKEND_5_STUDYCOMPASS_REDIS_SERVICE_HOST=10.43.82.120
SC_ELAS_OFFICIAL_BACKEND_5_STUDYCOMPASS_WEB_SERVICE_HOST=10.43.225.11
SC_ELAS_OFFICIAL_FRONTEND_SERVICE_HOST=10.43.131.39
[...]
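
These variable names are derived from the Service names: upper-cased, dashes replaced by underscores, with a `_SERVICE_HOST` suffix. A small sketch of looking one up; `service_host_var` and `service_host` are hypothetical helpers, and the sample value is copied from the listing above.

```python
import os


def service_host_var(service_name):
    """Name of the environment variable holding a Service's cluster IP."""
    return service_name.upper().replace("-", "_") + "_SERVICE_HOST"


def service_host(service_name, env=None):
    """Look up the cluster IP of a Service, or None if it is not set."""
    env = os.environ if env is None else env
    return env.get(service_host_var(service_name))


sample_env = {"SC_ELAS_OFFICIAL_BACKEND_3_AUTH_SERVICE_HOST": "10.43.137.164"}
auth_ip = service_host("sc-elas-official-backend-3-auth", sample_env)
```

Note that these variables are only set for Services that already existed when the container was started, so DNS names are generally the more robust choice.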

Note

I don't know the implementation details, but since services seem to register themselves with the Eureka server, I suspect they might also unregister when stopped. This should probably be disabled: all replicas sit behind a shared host name, so one instance unregistering could make the remaining ones unavailable.
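
To make the concern concrete, here is a toy model, not Eureka's actual bookkeeping. It assumes a registry that keys instances by reported host and port only, so that replicas reporting the same Service name collapse into one entry. Whether Eureka or the Python client behaves this way has not been verified here; `register` and `unregister` are hypothetical.

```python
instances = {}  # (host, port) -> app name; hypothetical registry state


def register(app, host, port):
    instances[(host, port)] = app


def unregister(host, port):
    instances.pop((host, port), None)


shared = "sc-elas-official-backend-3-auth.elas-official-edge.svc.cluster.local"
register("ELAS-AUTH", shared, 8002)  # replica 1
register("ELAS-AUTH", shared, 8002)  # replica 2, same key as replica 1
unregister(shared, 8002)             # replica 1 shuts down
remaining = len(instances)           # replica 2 still runs but is no longer listed
```

Under that assumption the registry ends up empty although one replica is still serving, which is the situation that disabling unregistration would avoid.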

ralf-berger commented 10 months ago

Additional fixes in 781db55 and 31e9d3f. Might actually work now.

ralf-berger commented 10 months ago

Confirmed connectivity to backend-3-auth and backend-4-e3selector.