chauncey-77 closed this issue 3 years ago
As I mentioned in my response to your other issue, all of our Poseidon/Firmament work used our new multi-dimensional CPU/memory cost model. This was primarily to allow apples-to-apples comparisons with Kubernetes. All of the original Firmament cost models are single-dimensional.
Typically, Firmament models the scheduling problem as a flow network optimization and enforces the updated fair shares whenever its solver runs, preempting running tasks where needed. However, we have not yet spent much time on the fairness aspect of resource scheduling. At present, our multi-dimensional model is strictly resource-utilization based.
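To make the "flow network optimization" idea concrete, here is a minimal sketch, not Firmament's actual solver or graph layout. It assumes a toy network (source, one node per task, one node per machine, sink) and solves it with a plain successive-shortest-path min-cost flow. The function name `schedule`, the cost matrix, and the slot counts are all hypothetical; Firmament's real graph also contains aggregator and "unscheduled" nodes that are omitted here.

```python
from collections import deque


def schedule(costs, slots):
    """Place tasks on machines by solving a small min-cost flow problem.

    costs[t][m] is the (integer) cost of running task t on machine m.
    slots[m] is how many tasks machine m can hold.
    Returns (total_cost, placement), where placement[t] is a machine index
    or None if the task could not be placed.
    """
    n_tasks, n_machines = len(costs), len(slots)
    source, sink = 0, 1 + n_tasks + n_machines
    # Each arc is [target, residual_capacity, cost, index_of_reverse_arc].
    graph = [[] for _ in range(sink + 1)]

    def add_arc(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for t in range(n_tasks):
        add_arc(source, 1 + t, 1, 0)
        for m in range(n_machines):
            add_arc(1 + t, 1 + n_tasks + m, 1, costs[t][m])
    for m in range(n_machines):
        add_arc(1 + n_tasks + m, sink, slots[m], 0)

    total_cost = 0
    while True:
        # Queue-based Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        queued = [False] * len(graph)
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            queued[u] = False
            for i, (v, cap, cost, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + cost < dist[v]:
                    dist[v] = dist[u] + cost
                    prev[v] = (u, i)
                    if not queued[v]:
                        queued[v] = True
                        queue.append(v)
        if prev[sink] is None:
            break  # no augmenting path left
        # Find the bottleneck capacity along the path, then push flow.
        push, v = float("inf"), sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        total_cost += push * dist[sink]

    # A saturated task -> machine arc means the task was placed there.
    placement = [None] * n_tasks
    for t in range(n_tasks):
        for v, cap, _, _ in graph[1 + t]:
            if v > n_tasks and cap == 0:
                placement[t] = v - 1 - n_tasks
    return total_cost, placement
```

For example, `schedule([[1, 4], [2, 3], [3, 1]], [2, 1])` places three tasks on two machines with two and one slots respectively. A production scheduler would rerun the solver as the cluster state changes; this sketch solves a single static instance.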
Thanks for your answer. One more question: how is resource utilization defined here? Thanks.
Utilization here means the resource (CPU/memory) requirements you specify at the Pod spec level (resource reservations). Initially, we also had actual resource utilization working via Heapster, but that is broken now because we shifted our attention to the rest of the functionality. It should be easy to restore. Our design document describes doing this with cAdvisor/Prometheus etc.
Thanks. Presumably the utilization is computed by some formula, so how do you define that formula?
Firmament controller code comes up with a cost vector for an arc between two nodes based on the proportions of the CPU and memory values in the Pod spec, if this is what you meant.
Nothing fancy, as such. Essentially, it determines a cost vector based on the multi-dimensional CPU and memory cost values for a workload (a pod in our case).
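The thread does not give the exact formula, so the following is only an illustrative sketch of what a two-dimensional CPU/memory cost vector could look like. It assumes each dimension is the pod's requested amount as a fraction of the node's capacity, and that the vector is flattened into a single integer arc cost with equal weights. The names `Resources`, `cost_vector`, and `flatten_cost`, the weights, and the scale factor are all hypothetical and are not taken from the Firmament source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def cost_vector(request: Resources, capacity: Resources) -> tuple[float, float]:
    """Per-dimension cost: requested amount as a fraction of node capacity.

    This is an assumed definition for illustration, not Firmament's formula.
    """
    return (
        request.cpu_millicores / capacity.cpu_millicores,
        request.memory_mib / capacity.memory_mib,
    )


def flatten_cost(vector: tuple[float, float],
                 weights: tuple[float, float] = (0.5, 0.5),
                 scale: int = 1000) -> int:
    """Collapse the vector into one integer arc cost (assumed weighting).

    An integer is used here because min-cost flow solvers commonly
    work with integer arc costs.
    """
    weighted = sum(w * c for w, c in zip(weights, vector))
    return round(scale * weighted)
```

With a pod requesting 500 millicores and 1024 MiB on a node with 4000 millicores and 8192 MiB, `cost_vector` gives one eighth of the node in each dimension, and `flatten_cost` turns that into a single arc cost. A different weighting would bias placement toward conserving CPU or memory.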
That's very kind of you. One more question, please. Previously, I deployed Firmament using the .yaml file URL and then deployed Poseidon, and it worked well. That Firmament scheduler is the simple scheduler. Now I want to use the Firmament flow scheduler, so I deployed Firmament using a .yaml file and then deployed Poseidon, but something goes wrong. The content of the Firmament .yaml file is as follows; it is almost the same as before except for the command:
kind: Service
apiVersion: v1
metadata:
  name: firmament-service
  namespace: kube-system
spec:
  selector:
    scheduler: firmament
  ports:
    - protocol: TCP
      port: 9090
      targetPort: 9090
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: firmament-scheduler
  namespace: kube-system
  labels:
    scheduler: firmament
spec:
  replicas: 1
  template:
    metadata:
      labels:
        scheduler: firmament
    spec:
      containers:
        # - command: [/firmament/build/src/firmament_scheduler, --flagfile=/firmament/config/firmament_scheduler_cpu_mem.cfg]
        - command: [/firmament/build/src/coordinator, --scheduler=flow, --flow_scheduling_cost_model=10, --listen_uri=tcp:localhost:9998, --task_lib_dir=$(pwd)/build/src]
          image: huaweifirmament/firmament:latest
          name: firmament-scheduler
          ports:
            - containerPort: 9090
      hostNetwork: true
      hostPID: false
      volumes: []
and the logs of the poseidon container say:
I0501 02:28:32.869001 1 config.go:190] ReadFromCommandLineFlags{poseidon firmament-service.kube-system 1.15 0.0.0.0:9091 10 9090 . false 0.0.0.0:8989 0.0.0.0:8989 0.0.0.0:8989 500 1000 false false}
W0501 02:28:32.869531 1 config.go:133] Config File "poseidon_config" Not Found in "[/]"unable to read poseidon_config, using command flags/default values
I0501 02:28:32.884621 1 k8sclient.go:104] k8sclient init called
I0501 02:28:32.886892 1 poseidon.go:117] Starting Poseidon with firmament address firmament-service.kube-system:9090.
panic: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.68.7.23:9090: connect: connection refused"
It looks like a connection error, but there was no connection problem before. I don't know what I should do. Please help me, thanks.
@deepak-vij
I think it may be because the firmament-service is not accessible. When I try to access this service from a pod using curl, it returns: curl: (7) Failed to connect to 10.68.7.23 port 9090: Connection refused. I then tried changing the value of the --listen_uri parameter. I tried several values:
but none of them worked. Are there any constraints on the --listen_uri parameter value? @deepak-vij Thanks.
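For anyone reproducing the "connection refused" symptom, a small TCP probe can confirm whether anything is accepting connections on a given host and port, which is the same check the curl attempt performs. This is a generic sketch using only Python's standard `socket` module; the helper name `port_is_open` is hypothetical, and the probe says nothing about why a port is closed.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False
```

Run from inside a pod, `port_is_open("firmament-service.kube-system", 9090)` tests the Service address that Poseidon dials in the log above. Because the Deployment sets `hostNetwork: true`, it may also be worth probing the node directly on both 9090 (the Service `targetPort`) and 9998 (the port in the posted `--listen_uri`) to see which one, if either, the container is actually listening on.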
@shivramsrivastava @deepak-vij Can anybody answer this question? Or is there any documentation on how to deploy the Firmament flow scheduler in a cluster, such as an explanation or a YAML file like this one here, which is used to deploy the Firmament simple scheduler in a cluster? Thanks.
https://github.com/camsas/firmament/blob/master/README.md#building-instructions. This is all documented in the Developer guide, linked below.
https://github.com/kubernetes-sigs/poseidon/blob/master/docs/devel/README.md
please tell me, thanks.