Closed adamdunkley closed 10 years ago
.open_sftp() looks like a file sync problem caused by missing permissions, a read-only file system, or something similar.
What are the fabric and paramiko versions?
Something else you could try is fix node:[REDACTED HOST] ssh:"cat /etc/issue"
If it works, it means login works, and it really is transferring/saving files that is failing.
Seems to go ok:
Adams-MacBook-Pro-2:chef adam$ fix node:[REDACTED HOST] ssh:"cat /etc/issue"
Executing the command 'cat /etc/issue' on node [REDACTED HOST]...
Ubuntu 12.04.4 LTS \n \l
Done.
Disconnecting from [REDACTED HOST]... done.
As for the fabric/paramiko versions:
Adams-MacBook-Pro-2:chef adam$ pip list
argparse (1.1)
distribute (0.6.36)
Django (1.5.5)
ecdsa (0.11)
Fabric (1.8.3)
git-remote-helpers (0.1.0)
littlechef (1.6.1)
paramiko (1.12.3)
pip (1.5.4)
pycrypto (2.6.1)
setuptools (0.6c11)
simplejson (3.3.0)
vboxapi (1.0)
wsgiref (0.1.2)
So, permissions issues. I'll look at some env stuff with the ssh command …
Hmm, sudo definitely seems to be working so I can't imagine what it might be unless the sudo stuff within littlechef isn't actually happening:
Adams-MacBook-Pro-2:chef adam$ fix node:[REDACTED HOST] ssh:"touch /test-34567876543"
Executing the command 'touch /test-34567876543' on node [REDACTED HOST]...
touch: cannot touch `/test-34567876543': Permission denied
Done.
Disconnecting from [REDACTED HOST]... done.
Adams-MacBook-Pro-2:chef adam$ fix node:[REDACTED HOST] ssh:"sudo touch /test-34567876543"
Executing the command 'sudo touch /test-34567876543' on node [REDACTED HOST]...
Done.
Disconnecting from [REDACTED HOST]... done.
And with sudo? fix node:[REDACTED HOST] ssh:"sudo cat /etc/issue"
Looking closer at the stack trace, it happens when updating /etc/chef/solo.rb. Can you have a look at the directory and its permissions?
In any case we can add better error handling there:
https://github.com/tobami/littlechef/blob/1.6.1/littlechef/solo.py#L105
I'll mark this as a bug.
adam@[REDACTED HOST] [11:45:28] [~]
-> % ll /etc/chef
total 8.0K
-rw-r----- 1 root root 0 Mar 21 14:53 client.rb
-rw--w---- 1 root root 135 Mar 21 15:51 node.json
-r-------- 1 root root 234 Mar 21 15:51 solo.rb
That'll be it. Thanks a lot for helping me debug this. Would you like me to raise a separate issue about the error handling?
Ok, so I am no longer convinced this is permission related. I am still getting the problem with sensible permissions on that folder (and the parent directory). Also, the error coming back from paramiko doesn't really marry with a permission error. Surely it would give a more sensible error message than "Channel closed." if there was some error with putting the files. It also doesn't really look like it gets very far into the transport …
Turns out the paramiko error was actually quite accurate:
Apr 5 12:44:04 [REDACTED HOST] sshd[6583]: Accepted publickey for adam from [REDACTED IP] port 53366 ssh2
Apr 5 12:44:04 [REDACTED HOST] sshd[6583]: pam_unix(sshd:session): session opened for user adam by (uid=0)
Apr 5 12:44:05 [REDACTED HOST] sudo: adam : TTY=pts/1 ; PWD=/home/adam ; USER=root ; COMMAND=/bin/bash -l -c chown -R adam /tmp/chef-solo
Apr 5 12:44:05 [REDACTED HOST] sudo: pam_unix(sudo:session): session opened for user root by adam(uid=1000)
Apr 5 12:44:05 [REDACTED HOST] sudo: pam_unix(sudo:session): session closed for user root
Apr 5 12:44:05 [REDACTED HOST] sshd[6743]: subsystem request for sftp by user adam
Apr 5 12:44:05 [REDACTED HOST] sshd[6743]: subsystem request for sftp failed, subsystem not found
Apr 5 12:44:05 [REDACTED HOST] sshd[6583]: pam_unix(sshd:session): session closed for user adam
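For anyone who wants to check their own auth log for the same failure, here is a minimal Python sketch. The function name and sample lines are hypothetical, and the pattern assumes sshd words the message exactly as in the log above, which may differ between OpenSSH versions.

```python
import re

# Pattern copied from the sshd message in the log above. The exact wording
# is an assumption and may vary between OpenSSH versions.
SUBSYSTEM_FAILURE = re.compile(
    r"sshd\[(?P<pid>\d+)\]: subsystem request for (?P<name>\S+) failed, "
    r"subsystem not found"
)


def find_missing_subsystems(log_lines):
    """Return (pid, subsystem) pairs for each failed subsystem request."""
    hits = []
    for line in log_lines:
        match = SUBSYSTEM_FAILURE.search(line)
        if match:
            hits.append((int(match.group("pid")), match.group("name")))
    return hits


# Two sample lines shaped like the ones in the log above (host is made up).
sample = [
    "Apr 5 12:44:05 host sshd[6743]: subsystem request for sftp by user adam",
    "Apr 5 12:44:05 host sshd[6743]: subsystem request for sftp failed, "
    "subsystem not found",
]
print(find_missing_subsystems(sample))  # [(6743, 'sftp')]
```

Only the "failed, subsystem not found" line is reported; the plain request line is ignored.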
I can definitely fix this now I know what it is … sorry to have taken up your time!
I am not the only person who has had this problem: https://gist.github.com/bradmontgomery/3954511
So how did you solve it in the end? And what do you think we can do on LittleChef's side so that users see a more meaningful message that helps them know what kind of error it is?
So you need to have an SFTP subsystem enabled, which, annoyingly, the Opscode openssh cookbook doesn't enable by default. You need to add the following attribute (or update it manually if you've not yet cooked and there's no subsystem enabled):
node.default['openssh']['server']['subsystem'] = 'sftp /usr/lib/openssh/sftp-server'
This only solves it for Ubuntu, since the sftp-server path differs between distributions.
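To confirm the setting actually landed on the node, a rough Python sketch like the one below can scan the text of an sshd_config for an sftp Subsystem line. The function name is hypothetical, and the parsing is deliberately simplified: it only skips blank lines and lines that start with `#`.

```python
def sftp_subsystem_command(sshd_config_text):
    """Return the command configured for the sftp subsystem, or None.

    Simplified parser (an assumption, not sshd's real one): skips blank
    lines and full-line comments, and matches the keyword case-insensitively.
    """
    for raw in sshd_config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)
        if (
            len(parts) == 3
            and parts[0].lower() == "subsystem"
            and parts[1] == "sftp"
        ):
            return parts[2]
    return None


# Example config text; the path is the Ubuntu one from the attribute above.
enabled = "Port 22\nSubsystem sftp /usr/lib/openssh/sftp-server\n"
disabled = "Port 22\n#Subsystem sftp /usr/lib/openssh/sftp-server\n"

print(sftp_subsystem_command(enabled))   # /usr/lib/openssh/sftp-server
print(sftp_subsystem_command(disabled))  # None
```

Reading the real file (commonly /etc/ssh/sshd_config, though that location is an assumption here) and passing its contents in is left to the caller.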
I am not sure how littlechef could make it clearer what is happening, because this is quite a low-level SSH problem that is really unrelated to littlechef. I think paramiko could do a better job of describing what went wrong, but it may be that it has no way of knowing exactly why its connection dropped. Maybe just having this issue be searchable will be enough, so that anyone who gets paramiko.ssh_exception.SSHException: Channel closed. finds their way here?
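One possible shape for a friendlier message, sketched in Python under some assumptions: this is not littlechef's actual code, the names `open_sftp_with_hint` and `SftpUnavailableError` are hypothetical, and the only real call relied on is the `open_sftp()` method mentioned earlier in the thread. The exception type is passed in so the sketch runs without paramiko installed; in real use one would pass `paramiko.SSHException`.

```python
HINT = (
    "SFTP channel was closed by the server. Check that sshd has an sftp "
    "subsystem configured (a 'Subsystem sftp ...' line in sshd_config)."
)


class SftpUnavailableError(RuntimeError):
    """Hypothetical error type carrying a more helpful message."""


def open_sftp_with_hint(client, channel_errors=(Exception,)):
    """Call client.open_sftp() and translate 'Channel closed' failures.

    channel_errors is the exception type(s) to intercept; with paramiko
    this would be (paramiko.SSHException,). Other errors pass through.
    """
    try:
        return client.open_sftp()
    except channel_errors as exc:
        if "Channel closed" in str(exc):
            raise SftpUnavailableError(HINT) from exc
        raise


class FakeClient:
    """Stand-in for an SSH client whose server has no sftp subsystem."""

    def open_sftp(self):
        raise Exception("Channel closed.")


try:
    open_sftp_with_hint(FakeClient())
except SftpUnavailableError as err:
    print(err)
```

Chaining with `from exc` keeps the original traceback available, so the low-level error is still visible to anyone who needs it.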
Also, just want to give some props to @bradmontgomery. Wouldn't have solved it as quickly as I did without his gist being on google. :+1:
Great, thanks. I created an issue to implement the improved error handling.
:thumbsup: Happy that gist helped! :smile:
I've been racking my brains over this one for a while; it's only just started happening. It seems fine when doing every other action except for synchronising the cookbooks etc.
Tried it with both a private key and a password, both of which should be right, and the user has sudo rights.
Anything else I can provide for the problem? (host is Ubuntu 12.04)