Closed vot4anto closed 3 years ago
In this example the user is root, but is it possible to use a different user with the ssh plugin, so that the master can ssh to the workers?
If it's not the root user, the related files (e.g. the public/private key) will be created in a different directory; it's fine to copy them to the user's home.
In which folder are the related files created? Not in the .ssh folder in the user's home? I created a user test and set USER test in the Docker image, but user test cannot ssh to the worker, while user root can.
Maybe you can configure the ssh plugin like this?
plugins:
  ssh: ["--ssh-key-file-path=/home/user/.ssh"]
  svc: []
I will try it immediately, thanks for the hint. But is there documentation for that ssh plugin option, so that I don't have to bother you by opening an issue?
As far as I know, there are few examples of job plugins at present; I don't know if I have missed anything.
I can help add some examples of job plugins later.
Your suggestion works like a charm for my case. But is there a way to get the IPs of the hosts instead of the hostnames? Or do I have to resolve the hostnames at container startup?
Hmm... it's hard to know the IP address before the pod starts, so the hostname is a better solution for now. Is there any case where the IP is required?
Because the master and workers of our HPC infrastructure use zmq to communicate with each other, and zmq has some issues with TCP connect that can easily be solved by using the IP instead of the hostname in the configuration files, for example: https://stackoverflow.com/questions/21169031/zmq-socket-connect-timeout
I can install host or dig in the container to resolve the hostnames; I have to see which is the better solution for the size of the container.
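As an alternative to installing host or dig, the lookup can be done with the Python standard library, which the engine image presumably already ships. The following is only a minimal sketch: the function names parse_host_file and resolve_hosts are hypothetical, and it assumes the host files under /etc/volcano that appear later in this thread contain one hostname per line.

```python
import socket
from typing import Callable, Dict, Iterable, List


def parse_host_file(text: str) -> List[str]:
    """Return the non-empty lines of a host file, one hostname per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def resolve_hosts(
    hostnames: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, str]:
    """Map each hostname to the IPv4 address returned by the resolver.

    The resolver is injectable (an assumption made for this sketch) so the
    function can be exercised without a cluster DNS server.
    """
    return {name: resolver(name) for name in hostnames}
```

Inside a container this could be called as resolve_hosts(parse_host_file(open("/etc/volcano/workers.host").read())), and the resulting addresses written into the zmq configuration before the engine starts.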
That's interesting!
@wpeng102 , @Thor-wl , please help to investigate this scenario :)
It would be fantastic to have help investigating the use of Volcano with our engine. Please contact me in any way you want. Do you attend the European weekly meeting?
Volcano does the following things:
- create the job in the apiserver (create the pod hosts file)
- create the pod in the apiserver
- schedule the pod to a node
Then the kubelet starts the pod on the node (and an IP is assigned to the pod). It is hard to know the pod IP while Volcano is scheduling the pods. Maybe you can add an init container to the master and worker pods that does something like exchanging the pod IPs with each other.
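Since pod IPs only exist once the kubelet has started the pods, an init container (or the container entrypoint) would have to retry until every peer name resolves. The sketch below illustrates that idea under stated assumptions: wait_for_peers is a hypothetical helper, not part of Volcano, and the retry count and delay are arbitrary defaults.

```python
import socket
import time
from typing import Callable, Dict, Iterable


def wait_for_peers(
    hostnames: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Retry name resolution until every peer has an IP, or give up."""
    pending = list(hostnames)
    resolved: Dict[str, str] = {}
    for attempt in range(attempts):
        still_pending = []
        for name in pending:
            try:
                resolved[name] = resolver(name)
            except OSError:
                # socket.gaierror is a subclass of OSError: not resolvable yet.
                still_pending.append(name)
        pending = still_pending
        if not pending:
            return resolved
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError(f"unresolved peers: {pending}")
```

The returned mapping could then be written to a shared file or exported to the engine configuration; how the IPs are handed over to zmq is left open here.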
Yes, the Volcano European weekly meeting will be started, and @k82cn or @william-wang will host it.
/assign @wpeng102 @Thor-wl
Yes, I will do that. Is it also possible to use env variables, something like this, to set the IP on the master? env:
Great, I will attend with pleasure
I discovered that there is a misconfiguration on the network side of the pods that are created. In the /etc/hosts file there is an entry that differs from the hostname set for the pod. Here is an example:
/k8s$ kubectl get pods
NAME               READY   STATUS    RESTARTS   AGE
master             0/1     Error     0          15h
oqjob-oqmaster-0   1/1     Running   0          57s
/k8s$ kubectl exec -it oqjob-oqmaster-0 -- bash
openquake@oqjob-oqmaster-0:~$ hostname
oqjob-oqmaster-0
openquake@oqjob-oqmaster-0:~$ cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.244.1.72 oqjob-oqmaster-0.oqjob.default.svc.cluster.local oqjob-oqmaster-0
openquake@oqjob-oqmaster-0:~$ host oqjob-oqmaster-0
Host oqjob-oqmaster-0 not found: 2(SERVFAIL)
openquake@oqjob-oqmaster-0:~$ cat /etc/volcano/oqmaster.host
oqjob-oqmaster-0.oqjob
openquake@oqjob-oqmaster-0:~$ host oqjob-oqmaster-0.oqjob
oqjob-oqmaster-0.oqjob.default.svc.cluster.local has address 10.244.1.72
openquake@oqjob-oqmaster-0:~$ host oqjob-oqmaster-0
Host oqjob-oqmaster-0 not found: 2(SERVFAIL)
As you can see, the hostname of the pod is not unique and does not match the value in /etc/volcano/oqmaster.host, so name resolution does not work as expected. Finally, the YAML of the job:
metadata:
  name: oqjob
spec:
  minAvailable: 3
  schedulerName: volcano
  plugins:
    ssh: ["--ssh-key-file-path=/home/openquake/.ssh"]
    svc: []
    env: []
  tasks:
    - replicas: 1
      name: oqmaster
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  sudo mkdir -p /var/run/sshd; sudo /usr/sbin/sshd ;
              image: openquake/engine:exp
              imagePullPolicy: Always
              name: master
              #resources:
              #  limits:
              #    memory: "8Gi"
              #    cpu: "8"
              #  requests:
              #    memory: "4Gi"
              #    cpu: "4"
              ports:
              workingDir: /home/openquake
          restartPolicy: OnFailure
    - replicas: 2
      name: oqworker
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  sudo mkdir -p /var/run/sshd; sudo /usr/sbin/sshd -D;
              image: openquake/engine:exp
              imagePullPolicy: Always
              name: worker
              workingDir: /home/openquake
          restartPolicy: OnFailure
Sorry, do you have any news about the hostname issue?
Well, nothing, but just follow the common rules.
And how can I tell Volcano to set the correct hostname on the pod? Can I pass extra args?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/volcano-sh/volcano/issues/1246#issuecomment-764224025, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANN2ODVSKONLBUKWLKDCPUTS26RJLANCNFSM4V3N2Z7Q .
-- Antonio Ettorre about.me/antonio.ettorre
Volcano sets pod.Spec.Hostname to the pod name and pod.Spec.Subdomain to the job name by default, so the address of the pod should be its FQDN (the output of hostname -f).
Tip: You can also explicitly specify pod.Spec.Hostname and pod.Spec.Subdomain.
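To make the naming rule concrete, here is a small sketch of how the FQDN is composed from those two fields. The function pod_fqdn is hypothetical; the default namespace and the cluster.local cluster domain are assumptions taken from the lookups shown in this thread, and other clusters may use different values.

```python
def pod_fqdn(
    hostname: str,
    subdomain: str,
    namespace: str = "default",
    cluster_domain: str = "cluster.local",
) -> str:
    """Compose the pod FQDN as <hostname>.<subdomain>.<namespace>.svc.<cluster domain>."""
    return f"{hostname}.{subdomain}.{namespace}.svc.{cluster_domain}"


def short_address(hostname: str, subdomain: str) -> str:
    """Compose the <hostname>.<subdomain> form seen in the /etc/volcano host files."""
    return f"{hostname}.{subdomain}"
```

With the defaults Volcano applies (hostname set to the pod name, subdomain set to the job name), pod_fqdn("oqjob-master-0", "oqjob") gives the same name that the host command reports elsewhere in this thread.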
I will try to set the name, so I can see whether the entries in the hosts file are also the right ones.
I did as you suggested, but the hosts file is still wrong, and the hostname and subdomain are also not as described. Here is the result:
oq@oqjob-master-0:/etc/volcano$ more workers.host
oqjob-workers-0.oqjob
oqjob-workers-1.oqjob
oq@oqjob-master-0:/etc/volcano$ more master.host
oqjob-master-0.oqjob
oq@oqjob-master-0:/etc/volcano$ host oqjob-master-0.oqjob
oqjob-master-0.oqjob.default.svc.cluster.local has address 10.244.1.90
oq@oqjob-master-0:/etc/volcano$ cat /etc/hosts
#Kubernetes-managed hosts file.
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.244.1.90 oqjob-master-0.oqjob.default.svc.cluster.local oqjob-master-0
As you can see, here the short name is oqjob-master-0 and not oqjob-master-0.oqjob.
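The mismatch can also be checked programmatically. This is only an illustrative sketch: names_in_hosts is a hypothetical helper that collects every name listed in an /etc/hosts-style text, so that the value from /etc/volcano/master.host can be tested against it.

```python
from typing import Set


def names_in_hosts(hosts_text: str) -> Set[str]:
    """Collect every hostname and alias listed in /etc/hosts-style text."""
    names: Set[str] = set()
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # The first field is the IP address, the rest are names.
        names.update(fields[1:])
    return names
```

For the file shown above, the set contains oqjob-master-0 and the full oqjob-master-0.oqjob.default.svc.cluster.local, but not oqjob-master-0.oqjob, which is the form written to the Volcano host file.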
Here is the definition of the job:
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: oqjob
spec:
  minAvailable: 3
  schedulerName: volcano
  plugins:
    ssh: ["--ssh-key-file-path=/home/openquake/.ssh"]
    svc: []
    env: []
  tasks:
    - replicas: 1
      name: master
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  sleep 3000;
              image: openquake/engine:exp
              imagePullPolicy: Always
              name: master
              hostname: master
              subdomain: cluster.local
              #resources:
              #  limits:
              #    memory: "8Gi"
              #    cpu: "8"
              #  requests:
              #    memory: "4Gi"
              #    cpu: "4"
              ports:
                - containerPort: 8800
                  name: oqjob-port
              workingDir: /home/openquake
          restartPolicy: OnFailure
    - replicas: 2
      name: workers
      template:
        spec:
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  sudo mkdir -p /var/run/sshd; sudo /usr/sbin/sshd -D;
              image: openquake/engine:exp
              imagePullPolicy: Always
              name: worker
              subdomain: cluster.local
              #resources:
              #  limits:
              #    memory: "8Gi"
              #    cpu: "8"
              #  requests:
              #    memory: "4Gi"
              #    cpu: "4"
              ports:
                - containerPort: 8800
                  name: oqjob-port
              workingDir: /home/openquake
          restartPolicy: OnFailure
@vot4anto a headless service is created when a job with the svc plugin is applied; in the container, we can get all the pods' IPs and domain names by running nslookup on the service domain: "nslookup jobname.default.svc.cluster.local"
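The same lookup can be done without nslookup. The sketch below uses socket.getaddrinfo from the Python standard library to collect the addresses behind the headless service domain; service_pod_ips is a hypothetical name, and restricting the query to IPv4 is an assumption made for this example.

```python
import socket
from typing import Callable, List


def service_pod_ips(
    service_domain: str,
    lookup: Callable = socket.getaddrinfo,
) -> List[str]:
    """Return the sorted, de-duplicated IPv4 addresses behind a service domain."""
    infos = lookup(
        service_domain,
        None,
        family=socket.AF_INET,
        type=socket.SOCK_STREAM,
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (ip, port) pair.
    return sorted({info[4][0] for info in infos})
```

Called as service_pod_ips("oqjob.default.svc.cluster.local") inside a pod of the job, this should return one address per pod backing the service, though it does not tell which address belongs to the master and which to the workers.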
Can you also check the /etc/hosts file? In my case the entries there don't reflect the DNS.
In my case, using nslookup to resolve the hostname from /etc/hosts works fine. Hmm... maybe it is a k8s network module issue?
In my understanding, if you use nslookup to collect all the workers' IPs, could that work for zmq?
For testing I use a kind installation of Kubernetes, version kindest/node:v1.19.4. Should I try a different version?
Here are my values:
cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.244.1.92 oqjob-master-0.oqjob.default.svc.cluster.local oqjob-master-0
openquake@oqjob-master-0:~$ nslookup oqjob-master-0
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find oqjob-master-0: SERVFAIL
openquake@oqjob-master-0:~$ nslookup oqjob.default.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: oqjob.default.svc.cluster.local
Address: 10.244.1.94
Name: oqjob.default.svc.cluster.local
Address: 10.244.1.92
Name: oqjob.default.svc.cluster.local
Address: 10.244.1.93
openquake@oqjob-master-0:~$ nslookup oqjob.default.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: oqjob.default.svc.cluster.local
Address: 10.244.1.93
Name: oqjob.default.svc.cluster.local
Address: 10.244.1.94
Name: oqjob.default.svc.cluster.local
Address: 10.244.1.92
We use kubeadm to install the k8s cluster.
which release of k8s do you use?
Thanks
kubelet --version
Kubernetes v1.18.2
Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).
Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗
Environment:
Kubernetes version (kubectl version): kind installation for testing:
kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-18T12:09:25Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-18T09:04:15Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
I want to use Volcano as the scheduler for our earthquake calculation engine. When we use VMs or bare-metal hosts, the cluster engine communicates over ssh.
I see that there is an mpi plugin and also an ssh plugin, but unfortunately I can't find any docs on how to use these plugins in a deployment YAML. What I need is to understand how the plugin works to communicate from master to worker; look at the following example:
In this example the user is root, but is it possible to use a different user with the ssh plugin, so that the master can ssh to the workers? In our container image we don't use the root user, but we need an ssh connection from master to worker, like Open MPI. And does the mpi plugin work in the same way? I found only a PR, but no documentation available on the volcano.sh site or on GitHub.
Thanks