huawei-cloudnative / firmament

The Firmament cluster scheduling platform
http://www.firmament.io
Apache License 2.0

Some questions about cost models. #42

Closed: chauncey-77 closed this issue 3 years ago

chauncey-77 commented 4 years ago
  1. Do all cost models satisfy the CPU/MEM requirements first, or does only COST_MODEL_CPU(10) do so?
  2. I see there is a COST_MODEL_QUINCY_INTERFERENCE(9) cost model in the code (line 41), but it does not seem to be mentioned in the documentation. Does this cost model work well or not?
  3. Given that the CPU/MEM requirements are met, which cost model should I choose for the best performance, such as balance or utilization?

Please let me know, thanks.

deepak-vij commented 4 years ago

As I mentioned in my response to your other issue, all of our Poseidon/Firmament work used our brand-new multi-dimensional CPU-memory cost model. This was primarily to allow apples-to-apples comparisons with Kubernetes. All of the other original Firmament cost models are single-dimensional.

Typically, Firmament models the scheduling problem as a flow network optimization and enforces the updated fair shares whenever its solver runs, for example by preempting running tasks to enforce fair shares. At this time, however, we did not spend much time on the fairness aspect of resource scheduling; our multi-dimensional model is currently based strictly on resource utilization.
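To make the flow-network formulation concrete, here is a minimal sketch in Python. It is not Firmament's solver or any of its cost models: the task and machine names, arc costs, slot counts and the `unscheduled_cost` penalty are hypothetical. Each task sends one unit of flow either to a machine (arc cost = placement cost) or to an "unscheduled" aggregator with a high penalty, machines drain into a sink with capacity equal to their free slots, and a min-cost flow solver (successive shortest paths here) picks the cheapest overall assignment. Preemption and fair shares are not modelled.

```python
class MinCostFlow:
    """Min-cost flow via successive shortest paths (Bellman-Ford); fine for tiny graphs."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # edge = [to, residual_cap, cost, reverse_index]

    def add_edge(self, u, v, cap, cost):
        """Add arc u->v and its zero-capacity residual twin; return a handle to the forward arc."""
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])
        return u, len(self.graph[u]) - 1

    def solve(self, s, t):
        """Push as much flow as possible from s to t; return the minimum total cost."""
        total_cost = 0
        while True:
            dist = [float("inf")] * self.n
            prev = [None] * self.n  # (node, edge index) used to reach each node
            dist[s] = 0
            for _ in range(self.n - 1):  # Bellman-Ford copes with negative residual costs
                changed = False
                for u in range(self.n):
                    if dist[u] == float("inf"):
                        continue
                    for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                        if cap > 0 and dist[u] + cost < dist[v]:
                            dist[v], prev[v], changed = dist[u] + cost, (u, i), True
                if not changed:
                    break
            if dist[t] == float("inf"):
                return total_cost  # no augmenting path left
            push, v = float("inf"), t
            while v != s:  # bottleneck capacity along the shortest path
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:  # augment along the path and update residual capacities
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            total_cost += push * dist[t]


def schedule(arc_costs, slots, unscheduled_cost):
    """Hypothetical flow-based placement.

    arc_costs[task][machine] -> placement cost; slots[machine] -> free task slots.
    Task and machine names must differ from each other and from SRC/UNSCHED/SINK.
    """
    tasks, machines = list(arc_costs), list(slots)
    names = ["SRC", *tasks, *machines, "UNSCHED", "SINK"]
    node = {name: i for i, name in enumerate(names)}
    mcf = MinCostFlow(len(node))
    placement_arcs = {}
    for task in tasks:
        mcf.add_edge(node["SRC"], node[task], 1, 0)  # each task supplies one unit of flow
        mcf.add_edge(node[task], node["UNSCHED"], 1, unscheduled_cost)
        for machine, cost in arc_costs[task].items():
            placement_arcs[(task, machine)] = mcf.add_edge(node[task], node[machine], 1, cost)
    for machine, cap in slots.items():
        mcf.add_edge(node[machine], node["SINK"], cap, 0)
    mcf.add_edge(node["UNSCHED"], node["SINK"], len(tasks), 0)
    total = mcf.solve(node["SRC"], node["SINK"])
    # A saturated task->machine arc means the task was placed on that machine.
    placement = {t: m for (t, m), (u, i) in placement_arcs.items() if mcf.graph[u][i][1] == 0}
    return placement, total


placement, total = schedule(
    {"pod-a": {"node-1": 2, "node-2": 5}, "pod-b": {"node-1": 3, "node-2": 4}},
    slots={"node-1": 1, "node-2": 1},
    unscheduled_cost=100,
)
print(placement, total)
```

With one slot per node, the solver places pod-a on node-1 and pod-b on node-2 (total cost 6) instead of the swapped assignment (cost 8). If only one slot exists in total, the second task flows over the penalty arc and stays unscheduled.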

chauncey-77 commented 4 years ago

Thanks for your answer. One more question: how is resource utilization defined here? Thanks.

deepak-vij commented 4 years ago

Utilization here means the resource (CPU/memory) requirements you specify at the Pod spec level (resource reservations). Initially, we also had actual resource utilization working using Heapster, but that is broken now because we shifted our attention to the rest of the functionality. Restoring it should be straightforward; our design document describes doing so using cAdvisor/Prometheus etc.
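Read literally, this means utilization is computed from the Pod resource requests rather than from measured usage. A minimal sketch under that reading: for a node, sum the CPU and memory requests of the pods placed on it and divide by the node's allocatable capacity. The function name, the dictionary layout and the units (CPU in millicores, memory in MiB) are assumptions for illustration, not Firmament's code.

```python
def reservation_utilization(pod_requests, allocatable):
    """Fraction of a node's allocatable CPU and memory reserved by pod requests.

    pod_requests: list of dicts such as {"cpu": 500, "memory": 1024}
    allocatable:  dict with the node's allocatable "cpu" and "memory"
    Units are assumed consistent (e.g. millicores and MiB).
    """
    reserved = {"cpu": 0, "memory": 0}
    for request in pod_requests:
        for resource in reserved:
            reserved[resource] += request.get(resource, 0)  # a missing request counts as 0
    return {resource: reserved[resource] / allocatable[resource] for resource in reserved}


print(reservation_utilization(
    [{"cpu": 500, "memory": 1024}, {"cpu": 1500, "memory": 3072}],
    {"cpu": 4000, "memory": 8192},
))
```

Here the two pods reserve 2000 of 4000 millicores and 4096 of 8192 MiB, so both dimensions come out at 0.5, regardless of how much CPU or memory the pods actually consume at runtime.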

chauncey-77 commented 4 years ago

Thanks. Presumably the utilization is computed by some formula; how do you define that formula?

deepak-vij commented 4 years ago

The Firmament controller code computes a cost vector for the arc between two nodes based on the proportions of the CPU and memory values in the Pod spec, if that is what you meant.

deepak-vij commented 4 years ago

Nothing fancy, as such. Essentially, it determines a cost vector from the multi-dimensional CPU and memory cost values of a workload (a pod, in our case).
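One way to picture such a cost vector, purely as an assumption-labeled sketch: for each pod-to-node arc, take the fraction of the node's remaining CPU and memory that the pod's requests would consume, scale each fraction to an integer (flow solvers typically work with integer arc costs), and then collapse the vector into one scalar with weights. The scale factor, the weights and the rule of dropping arcs where the pod does not fit are hypothetical; the actual Firmament CPU/memory cost model may compute this differently.

```python
def arc_cost_vector(request, available, scale=1000):
    """Hypothetical per-dimension cost for placing a pod on a node.

    Each entry is the share of the node's remaining capacity the pod would use,
    scaled to an integer. Returns None if the pod does not fit (no arc is added).
    """
    vector = []
    for resource in ("cpu", "memory"):
        if request[resource] > available[resource]:
            return None
        share = request[resource] / available[resource] if available[resource] else 0.0
        vector.append(round(scale * share))
    return tuple(vector)


def scalar_arc_cost(vector, weights=(1, 1)):
    """Collapse the cost vector into a single arc cost (the weights are assumptions)."""
    return sum(w * c for w, c in zip(weights, vector))


pod = {"cpu": 500, "memory": 1024}  # millicores, MiB
for free in ({"cpu": 2000, "memory": 4096}, {"cpu": 1000, "memory": 8192}, {"cpu": 400, "memory": 8192}):
    vec = arc_cost_vector(pod, free)
    print(free, vec, None if vec is None else scalar_arc_cost(vec))
```

Under these assumptions the pod gets the vector (250, 250) and cost 500 on the first node, (500, 125) and cost 625 on the second, and no arc at all on the third, so the solver would prefer the first node.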

chauncey-77 commented 4 years ago

That's very kind of you. One more question, please. Previously, I ran the Firmament deployment using the .yaml file URL and then ran the Poseidon deployment, and it worked well. That Firmament scheduler is the simple scheduler. Now I want to use the Firmament flow scheduler, so I ran the Firmament deployment using the .yaml file below and then ran the Poseidon deployment, but something went wrong. The Firmament .yaml file is as follows; it is almost the same as before except for the command:

kind: Service
apiVersion: v1
metadata:
  name: firmament-service
  namespace: kube-system
spec:
  selector:
    scheduler: firmament
  ports:
    - protocol: TCP
      port: 9090
      targetPort: 9090
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: firmament-scheduler
  namespace: kube-system
  labels:
    scheduler: firmament
spec:
  replicas: 1
  template:
    metadata:
      labels:
        scheduler: firmament
    spec:
      containers:
      # - command: [/firmament/build/src/firmament_scheduler, --flagfile=/firmament/config/firmament_scheduler_cpu_mem.cfg]
      - command: [/firmament/build/src/coordinator, --scheduler=flow, --flow_scheduling_cost_model=10, --listen_uri=tcp:localhost:9998, --task_lib_dir=$(pwd)/build/src]
        image: huaweifirmament/firmament:latest
        name: firmament-scheduler
        ports:
         - containerPort: 9090
      hostNetwork: true
      hostPID: false
      volumes: []

and the Poseidon container logs say:

I0501 02:28:32.869001       1 config.go:190] ReadFromCommandLineFlags{poseidon firmament-service.kube-system  1.15 0.0.0.0:9091 10 9090 . false 0.0.0.0:8989 0.0.0.0:8989 0.0.0.0:8989 500 1000 false false}
W0501 02:28:32.869531       1 config.go:133] Config File "poseidon_config" Not Found in "[/]"unable to read poseidon_config, using command flags/default values
I0501 02:28:32.884621       1 k8sclient.go:104] k8sclient init called
I0501 02:28:32.886892       1 poseidon.go:117] Starting Poseidon with firmament address firmament-service.kube-system:9090.
panic: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.68.7.23:9090: connect: connection refused"

It seems like a connection error, but there was no connection problem before. I don't know what I should do. Please help me, thanks.

chauncey-77 commented 4 years ago

@deepak-vij

chauncey-77 commented 4 years ago

I found that it may be because firmament-service is not accessible. When I try to access this service from a pod using the curl command, it returns: curl: (7) Failed to connect to 10.68.7.23 port 9090: Connection refused. Then I tried changing the value of the --listen_uri parameter. I tried several values:

  1. tcp:localhost:9998
  2. tcp:0.0.0.0:9998
  3. tcp:0.0.0.0:9090
  4. tcp:0.0.0.0:9091
  5. tcp:0.0.0.0:8080

but none of them solved the problem. Is there anything to consider when choosing the --listen_uri value? @deepak-vij Thanks.
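"Connection refused" means the TCP connection was actively rejected: nothing was accepting connections at that IP address and port. Two common causes produce exactly this error. Either the process listens on a different port than the Service's targetPort, or it binds only to localhost, which with hostNetwork: true is reachable only from the node itself. The sketch below repeats the curl check in Python with the standard `socket.create_connection`, so it can be run from a debug pod against the Service IP and port; the helper name is hypothetical.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, False if refused or timed out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError and socket timeouts are both OSError subclasses
        return False


# Example: probe the Service address from the report above.
print(port_accepts_connections("10.68.7.23", 9090))
```

A False result together with a listener that is up means the address or port the scheduler binds to does not match what the Service forwards to.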

chauncey-77 commented 4 years ago

@shivramsrivastava @deepak-vij Can anybody answer the question? Or is there any documentation on how to deploy the Firmament flow scheduler in a cluster, such as an explanation or a .yaml file like the one used to deploy the Firmament simple scheduler in a cluster? Thanks.

deepak-vij commented 4 years ago

https://github.com/camsas/firmament/blob/master/README.md#building-instructions. This is all documented in the Developer guide linked below.

https://github.com/kubernetes-sigs/poseidon/blob/master/docs/devel/README.md