Closed mturilli closed 9 years ago
Mark, do you have any idea in what direction to investigate? It looks like an ssh-level problem though, doesn't it (trestles has ssh as its non-MPI launch method)?
Similar to: CUs are failing on Trestles #124 ? I have posted some instructions there, hope these will help...
Thanks, Antons
Antons, thanks! That looks indeed very similar. Matteo, would you please follow that procedure to set up another keypair?
Thank you Antons.
I believe my setup is already consistent with the instructions offered by Antons. Here are some details. Naming seems fine:
[mturilli@trestles-login1 .ssh]$ ls
authorized_keys id_rsa id_rsa.pub known_hosts
Permissions are fine.
$ ls -al
total 24
drwx------ 2 mturilli unc100 4096 Aug 18 09:09 .
drwx------ 10 mturilli unc100 4096 Jan 14 10:22 ..
-rw------- 1 mturilli unc100 821 Feb 6 2014 authorized_keys
-rw------- 1 mturilli unc100 1679 Feb 6 2014 id_rsa
-rw-r--r-- 1 mturilli unc100 415 Feb 6 2014 id_rsa.pub
-rw-r--r-- 1 mturilli unc100 640 Jan 24 05:15 known_hosts
This is the local pub key:
[mturilli@trestles-login1 .ssh]$ cat id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2RPMsXCUpBL72IpoXhkrTq3jUvPmENiNqQlhb4v5+a72OYWj4Lz6ecfknSDNvOPpMlQ5iokC1ftIKhLEQzWouE2vfpgAy29m9p8KmCY/NsFXPv9zYFU7Cfe7YlKB3S0PrV3NoxmK0PsEZlqispFXGRWzY4d7MBZUSAHH4rYpsCNjleddPz5yieG/fR1oGaubp3waaDc0SZcRkEc8+WY/YDwqOssD8xlZZdWnZoPWBEAtr6sUlKFs+NUR61CZfw5lAkzlEZIjHTf7wuHV+fiH0U4l8DJSEODYogk7TwsEZ/2hafBTidKZPfB1GOPfJb+Dhew7ZHVxFczipOzCoZGKn mturilli@trestles-login1.sdsc.edu
And this is indeed the key in the authorized keys file:
$ cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2RPMsXCUpBL72IpoXhkrTq3jUvPmENiNqQlhb4v5+a72OYWj4Lz6ecfknSDNvOPpMlQ5iokC1ftIKhLEQzWouE2vfpgAy29m9p8KmCY/NsFXPv9zYFU7Cfe7YlKB3S0PrV3NoxmK0PsEZlqispFXGRWzY4d7MBZUSAHH4rYpsCNjleddPz5yieG/fR1oGaubp3waaDc0SZcRkEc8+WY/YDwqOssD8xlZZdWnZoPWBEAtr6sUlKFs+NUR61CZfw5lAkzlEZIjHTf7wuHV+fiH0U4l8DJSEODYogk7TwsEZ/2hafBTidKZPfB1GOPfJb+Dhew7ZHVxFczipOzCoZGKn mturilli@trestles-login1.sdsc.edu
ssh-rsa
[...]
But:
[mturilli@trestles-login1 .ssh]$ ssh 10.1.254.212
Password:
I am not sure it should work like that, though; probably it should not.
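The manual checks in this comment (file modes, and the local public key being listed in authorized_keys) can be scripted. The following is a minimal sketch, not part of radical.pilot: the function names are hypothetical, the file names are the ones listed in the directory above, and it assumes authorized_keys entries carry no leading options, so the first two fields of each line are the key type and the base64 blob.

```python
import os
import stat


def mode_of(path):
    """Return the permission bits of path, e.g. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)


def key_blob(line):
    """Return (type, base64 blob) of a public key line, ignoring the comment.

    Assumption: the line has no leading options such as command="...".
    """
    fields = line.split()
    return tuple(fields[:2]) if len(fields) >= 2 else None


def check_ssh_dir(ssh_dir):
    """Return a list of problems found in ssh_dir (an empty list means none found)."""
    problems = []
    auth = os.path.join(ssh_dir, "authorized_keys")
    private = os.path.join(ssh_dir, "id_rsa")
    public = os.path.join(ssh_dir, "id_rsa.pub")

    # With StrictModes, sshd ignores keys when these are group- or world-writable.
    for path in (ssh_dir, auth):
        if mode_of(path) & 0o022:
            problems.append(f"{path} is writable by group or others")

    # The ssh client refuses a private key that others can access.
    if mode_of(private) & 0o077:
        problems.append(f"{private} is accessible by group or others")

    # Compare key type and blob only; the trailing comment may differ.
    with open(public) as f:
        local = key_blob(f.readline())
    with open(auth) as f:
        authorized = {key_blob(line) for line in f}
    if local is None or local not in authorized:
        problems.append(f"key from {public} is not listed in {auth}")

    return problems
```

Calling `check_ssh_dir(os.path.expanduser("~/.ssh"))` on the login node would return an empty list for the setup shown above. A clean result only covers the files it inspects; it says nothing about whether a given compute node sees the same home directory.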
Have you tried with mturilli@10.1.254.212? If you provide the password, does it log in? You might also need to wait "some time" for the keys to get propagated, I guess. What is the output of: ssh -vv mturilli@10.1.254.212?
Thanks, Antons
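Besides the verbose output, a non-interactive probe makes the failure mode explicit: with the OpenSSH option BatchMode=yes the client never prompts for a password, so the command either succeeds through key authentication or exits with a non-zero status. A small sketch, assuming the OpenSSH client is on the PATH; the function names are hypothetical.

```python
import subprocess


def ssh_probe_command(host, user=None, timeout=5):
    """Build an ssh command that fails instead of asking for a password."""
    target = f"{user}@{host}" if user else host
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never prompt for a password
        "-o", f"ConnectTimeout={timeout}",   # give up on unreachable hosts
        target,
        "true",                              # trivial remote command
    ]


def key_login_works(host, user=None, timeout=5):
    """Return True if key-based login to host succeeds without interaction."""
    result = subprocess.run(
        ssh_probe_command(host, user, timeout),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For the address in this thread, `key_login_works("10.1.254.212", "mturilli")` would be expected to return False for as long as the plain ssh command still answers with a password prompt.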
The keys were already there, so I would exclude an issue with propagation. I have no password on trestles, only my own key, which is different from the one pasted in my previous message; that one was set up by trestles 'itself'. Note that CUs do not always fail, most of the time at the moment.
Dr Matteo Turilli Department of Electrical and Computer Engineering Rutgers University
Thank you for your reply Matteo. Interesting.
"CUs do not always fail, most of the time at the moment."
I would guess that CUs fail only on nodes where keys are not configured, and not on all nodes. Is there a way to make RP messages more informative, e.g. from where to where the ssh attempt was made?
Thanks
"Is there a way to make RP messages more informative, e.g. from where to where the ssh attempt was made?"
That will be hidden somewhere in the logs -- but the node the agent is running on, and the nodes used for the CUs, will change every time, so I am not sure how digging that information out would help us. But I agree, it is interesting that the ssh setup seems ok for some nodes, but not for others. Let's wait until Mark chimes in; otherwise I'd probably suggest opening a trestles ticket...
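Since the nodes change with every allocation, one way to see which of them reject key-based ssh is to probe all nodes from inside a running job and record source and target for each attempt. The sketch below is an assumption-laden illustration, not an RP feature: it assumes the scheduler provides the allocated nodes as text lines with one hostname each (possibly repeated per core), and all function names are hypothetical.

```python
import socket
import subprocess


def unique_nodes(lines):
    """Deduplicate node names while keeping their first-seen order."""
    seen = []
    for line in lines:
        name = line.strip()
        if name and name not in seen:
            seen.append(name)
    return seen


def batch_ssh_ok(node):
    """Return True if a non-interactive, key-based ssh to node succeeds."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", node, "true"]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def report(nodes, probe=batch_ssh_ok):
    """Return one 'source -> target: status' line per node."""
    source = socket.gethostname()
    return [
        f"{source} -> {node}: {'ok' if probe(node) else 'FAILED'}"
        for node in nodes
    ]
```

Run on the node that launches the units, `report(unique_nodes(open(path)))` with `path` pointing at the node list would yield lines that could be attached to a trestles ticket. The `probe` argument is injectable so that the reporting logic can be exercised without a real ssh connection.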
It is now working, I am going to close this ticket considering it a temporary failure of trestles, not of radical.pilot.
All or some of the CUs submitted to pilots on trestles fail. Pilot works properly and no errors are reported in the pilot LOG* files. The STDERR of each failing unit contains only the following lines:
The STDOUT files are empty.
Here is a summary of the pilot description:
ssh on localhost works as expected: