icecc / icecream

Distributed compiler with a central scheduler to share build load
GNU General Public License v2.0

icecc-scheduler - Does not handle iceccd disconnecting #483

Closed: dioguerra closed this issue 4 years ago

dioguerra commented 5 years ago

icecc-scheduler does not handle iceccd disconnecting suddenly very well. This causes icecc to stop distributing compile jobs if the (few) iceccd clients disconnect suddenly. As I understand it, this happens because:

  1. Many of the worker nodes are still registered with the scheduler.
  2. The scheduler tries to send a job to a nonexistent worker node.
  3. The job fails.
  4. The job is run locally.

In my environment (Kubernetes) this happens because the workers sit idle waiting for new compile jobs while the scheduler is linking objects, so the idle worker nodes are killed by the Pod autoscaler.

With https://github.com/icecc/icecream/issues/482 this would be mitigated, as the workers could take load off the scheduler.
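
You can watch the stale registrations directly: the scheduler exposes a telnet control interface on port 8766 (the same port the manifests below expose). A minimal check, assuming your scheduler build supports the listcs command (type help at the prompt to see what your version actually offers):

# Connect to the scheduler's control interface; dead workers linger
# in the node list until the scheduler notices they are gone.
telnet icecc-division-scheduler 8766
# then type at the interactive prompt:
help
listcs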


llunak commented 5 years ago

Please provide more details on how to reproduce the problem (or what the problem is exactly). I've just finished a testing build where I disconnected a node while it was running builds, and the build finished successfully by retrying those builds locally (which is good enough, given that this is an exceptional situation).

dioguerra commented 5 years ago

In your environment this might not matter, but in a cluster with autoscaling of worker nodes (where nodes are created and deleted all the time) it has an impact.

HOW TO REPRODUCE: First, build your image:

FROM fedora:31

# Build environment
RUN dnf install -y \
 icecream \
 clang \
 doxygen \
 gcc \
 graphviz \
 libasan \
 libasan-static \
 libedit-devel \
 libxml2-devel \
 make \
 net-tools \
 python-devel \
 swig \
 git bc xz

RUN dnf group install "Development Tools" -y && \
 dnf group install "C Development Tools and Libraries" -y && \
 dnf install cmake ninja-build ncurses-devel bison flex elfutils-libelf-devel openssl-devel -y

# Run icecc daemon in verbose mode
#ENTRYPOINT ["iceccd","-v"]
#ENTRYPOINT ["icecc-scheduler","-v"]

# iceccd port
EXPOSE 10245 8765/TCP 8765/UDP 8766

# If no-args passed, make very verbose
#CMD ["-vvv"]
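
Then build and tag the image so it matches the image: icecc:fedora31 reference in the manifests below; a minimal sketch, assuming the Dockerfile above is in the current directory. With minikube, point your docker client at the cluster's daemon first so the kubelet can see the image:

# Only needed on minikube: build straight into the cluster's docker daemon.
eval $(minikube docker-env)
# Build with the tag the manifests expect.
docker build -t icecc:fedora31 .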

Then, create the resources on your cluster. If you don't have access to a Kubernetes cluster, you can use your local computer with minikube, though in that case the per-pod resource requests and limits should be a fraction of your available CPUs.

apiVersion: v1
kind: Namespace
metadata:
  name: division

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: icecc-division-scheduler
  namespace: division
  labels:
    app: icecc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: icecc-scheduler-division-user
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
      labels:
        app: icecc-scheduler-division-user
    spec:
      hostname: scheduler-division-user
      containers:
      - name: icecc-scheduler-division-user
        image: icecc:fedora31
        command:
        - /bin/bash
        - -c
        - "iceccd -vvv -m 8 -d && icecc-scheduler -vvv"
        # args:
        # -
        # - -n ICECREAM
        # - -l /dev/stdout
        env:
        - name: ICECREAM_SCHEDULER_LOG_FILE
          value: "/dev/stdout"
        - name: ICECREAM_MAX_JOBS
          value: "3"
        - name: ICECREAM_NETNAME
          value: "division-user"
        resources:
          limits:
            cpu: 8
            memory: 8Gi
        ports:
        # Daemon computers
        - containerPort: 10245
        # Scheduler computer
        - containerPort: 8765
        # broadcast to find the scheduler (optional)
        - containerPort: 8765
          protocol: UDP
        # telnet interface to the scheduler (optional)
        - containerPort: 8766

---

apiVersion: v1
kind: Service
metadata:
  labels:
    app: icecc-scheduler-division-user
  name: icecc-division-scheduler
  namespace: division
spec:
  ports:
  - port: 10245
    name: daemon
    protocol: TCP
    targetPort: 10245
  - port: 8765
    name: scheduler
    protocol: TCP
    targetPort: 8765
  - port: 8765
    name: broadcast
    protocol: UDP
    targetPort: 8765
  - port: 8766
    name: telnet
    protocol: TCP
    targetPort: 8766
  selector:
    app: icecc-scheduler-division-user

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: icecc-division-worker
  namespace: division
  labels:
    app: icecc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: icecc-worker-division-user
  template:
    metadata:
      labels:
        app: icecc-worker-division-user
    spec:
      containers:
      - name: icecc-worker-division-user
        image: icecc:fedora31
        command:
        - /bin/bash
        args:
        - -c
        - "iceccd -vvv -m 1 -s $(host -4 icecc-division-scheduler  | awk '{print $4}')"
        # - "iceccd -vvv -m 1 -s 192.168.1.68"
        env:
        - name: ICECREAM_LOG_FILE
          value: "/dev/stdout"
        - name: ICECREAM_MAX_JOBS
          value: "3"
        - name: ICECREAM_NETNAME
          value: "division-user"
        # - name: ICECREAM_SCHEDULER_HOST
        #   value: icecc-division-scheduler.division.svc.cluster.local
        resources:
          requests:
            cpu: 1
          limits:
            cpu: 1
        ports:
        # Daemon computers
        - containerPort: 10245
        # Scheduler computer
        - containerPort: 8765
        # broadcast to find the scheduler (optional)
        - containerPort: 8765
          protocol: UDP
        # telnet interface to the scheduler (optional)
        - containerPort: 8766

---

apiVersion: v1
kind: Service
metadata:
  labels:
    app: icecc-worker-division-user
  name: icecc-division-worker
  namespace: division
spec:
  ports:
  - port: 10245
    name: daemon
    protocol: TCP
    targetPort: 10245
  - port: 8765
    name: scheduler
    protocol: TCP
    targetPort: 8765
  - port: 8765
    name: broadcast
    protocol: UDP
    targetPort: 8765
  - port: 8766
    name: telnet
    protocol: TCP
    targetPort: 8766
  selector:
    app: icecc-worker-division-user

---

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: icecc-division-worker
  namespace: division
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: icecc-division-worker
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 90
  minReplicas: 1
  maxReplicas: 15
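
Save all of the manifests above into one file (here icecc.yaml, an arbitrary name), apply them, and watch the autoscaler react:

# Create the namespace, scheduler, workers, services and HPA.
kubectl apply -f icecc.yaml
# Watch worker pods come and go as the HPA reacts to CPU usage.
kubectl -n division get pods -w
kubectl -n division get hpa icecc-division-worker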

The way to trigger a build is to exec into your scheduler pod:

kubectl -n division exec -it pod/icecc-division-scheduler-<59f747bfb7-kp2g2> -- bash

and execute your compiler there. You can test with the instructions below:

git clone https://github.com/llvm/llvm-project.git ~/dev/llvm-project

mkdir -p ~/dev/llvm-builds/release-gcc-distcc

cd ~/dev/llvm-builds/release-gcc-distcc

export CC="/usr/bin/gcc"
export CXX="/usr/bin/g++"

cmake ~/dev/llvm-project/llvm \
  -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_USE_LINKER=gold \
  -DLLVM_ENABLE_PROJECTS="lldb;clang;lld" \
  -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
  -DCMAKE_C_COMPILER_LAUNCHER="icecc" \
  -DCMAKE_CXX_COMPILER_LAUNCHER="icecc"

time ninja -j 100
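
If you don't want to wait for the HPA to scale the workers down on its own, you can force the sudden-disconnect case while ninja is running; a sketch:

# Kill worker pods abruptly to simulate iceccd disconnecting suddenly...
kubectl -n division delete pod -l app=icecc-worker-division-user --grace-period=0 --force
# ...or pull the worker deployment down under the running build.
kubectl -n division scale deployment icecc-division-worker --replicas=1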

dioguerra commented 5 years ago

I would say that the scheduler should try to reschedule the job N times on other worker nodes AND maintain a tighter health check on the attached worker pods.

dioguerra commented 5 years ago

Handling the compile job on the scheduler is OK if one of the pods fails, but if multiple workers are scaled down at the same time for lack of work (3 to 5, which corresponds to 3 to 5 cores in this example), the build migrates every task to the scheduler, thus not using the still-available (not scaled-down) worker nodes.
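
On the Kubernetes side, the mass scale-down itself can be dampened. A sketch, assuming the autoscaling/v2beta2 (or newer) HPA API rather than the v2beta1 used above; it does not fix the scheduler, it just stops all idle workers from disappearing at once:

# Fragment: goes under spec: of the HorizontalPodAutoscaler
# (requires autoscaling/v2beta2 or newer, not the v2beta1 above).
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # wait 5 minutes of low load first
    policies:
    - type: Pods
      value: 1                       # remove at most one worker pod
      periodSeconds: 60              # per minute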

llunak commented 5 years ago

Sorry, but I find this setup to be so niche and the possible gain so small that the effort/gain ratio is way beyond reasonable for me. Feel free to improve the handling and submit patches.

llunak commented 4 years ago

Closing, as per my above comment.

martin31821 commented 2 years ago

@dioguerra did you get it solved? We're looking into running icecc on kubernetes with a cluster-autoscaler setup, and I was wondering if you managed to find a better solution?

IMO the following approach would be good (I haven't checked which parts of this are actually implemented, so some of it might not make sense).

This will help in two scenarios:

  1. Autoscaling Down:
    • We have a Kubernetes cluster where iceccd is deployed as a Deployment with an HPA attached (much like in the above comments).
    • This automatically launches new instances of iceccd.
    • Once Kubernetes realizes it does not have enough CPU/memory resources to schedule a new instance, the cluster-autoscaler requests a new interruptible node from the cloud provider.
    • Once compile jobs are done, the Pod autoscaler terminates some of the worker pods, which causes the same problem outlined above (a possible mitigation is sketched after this list).
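
A sketch of a mitigation for the scale-down case (unverified; it assumes iceccd deregisters from the scheduler cleanly when it receives SIGTERM). Since the worker container starts iceccd via /bin/bash -c, the TERM signal Kubernetes sends to PID 1 may never reach iceccd itself, so the preStop hook signals it explicitly and a longer grace period gives running jobs time to drain:

# Additions to the worker pod template above (sketch, unverified;
# assumes pidof is available in the image and that iceccd cleans up
# on SIGTERM).
spec:
  terminationGracePeriodSeconds: 120
  containers:
  - name: icecc-worker-division-user
    lifecycle:
      preStop:
        exec:
          # Signal iceccd directly, then give it time to drain jobs.
          command: ["/bin/sh", "-c", "kill -TERM $(pidof iceccd); sleep 30"]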

Second scenario:

dioguerra commented 2 years ago

Hey @martin31821,

To be honest I didn't pursue this further as I had no time; this was a side project and motivation ran low. I didn't think of using the SIGTERM signal though, that's a good idea.

If you get something to work, can you keep me posted?