amituttam opened this issue 10 years ago
Hello! A few questions that might help me understand the problem a bit better.
What's the actual stack trace that you get when RunProcessError is raised? I'm not sure I understand why the exception is being raised.
You mentioned the process gets stuck in run_command, but that contains a few different calls. At what point in run_command is the process getting stuck? More generally, can you get a stack trace of where the process gets stuck?
What happens if you execute a different command (e.g. echo hello) just before running sudo poweroff? Does it also hang, or does it complete successfully?
What happens if you only run a different command (e.g. echo hello) in the loop, rather than using poweroff and WoL?
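The last experiment could be scripted roughly as below. This is a hypothetical helper, not part of spur: make_shell stands for any zero-argument callable that returns a spur shell (for example a lambda wrapping spur.SshShell with your own host and credentials), and the default command is echo hello so that poweroff and WoL are out of the picture.

```python
import logging

log = logging.getLogger(__name__)


def stress_test(make_shell, cmd=("echo", "hello"), iterations=100):
    """Run `cmd` on a fresh shell `iterations` times (hypothetical helper).

    Returns the number of iterations that completed. If the loop hangs,
    the last "START" log line shows which iteration was running.
    """
    completed = 0
    for i in range(iterations):
        log.info("START: Iteration %d", i)
        shell = make_shell()
        with shell:
            result = shell.run(list(cmd))
        log.info(result.output)
        completed += 1
    return completed
```

If this loop runs to completion while the poweroff/WoL loop does not, that points at the poweroff step rather than at repeated SSH connections in general.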
Thanks for the quick response. Without the try/except/else block:
...
    with shell:
        result = shell.run(cmd)
        log.info(result.output)
        return result.output
The following exception is raised:
Mon, 31 Mar 2014 10:40:25 - INFO - START: Iteration 0
Mon, 31 Mar 2014 10:40:25 - INFO - Getting systemd-analyze
Mon, 31 Mar 2014 10:40:30 - INFO - Startup finished in 3.825s (kernel) + 16.107s (userspace) = 19.932s
Mon, 31 Mar 2014 10:40:30 - INFO - Powering off
Traceback (most recent call last):
  File "test.py", line 79, in <module>
    run_command(["sudo", "poweroff"], host)
  File "test.py", line 43, in run_command
    result = shell.run(cmd)
  File "/home/amit/projects/zxi-boot-test/local/lib/python2.7/site-packages/spur/ssh.py", line 71, in run
    return self.spawn(*args, **kwargs).wait_for_result()
  File "/home/amit/projects/zxi-boot-test/local/lib/python2.7/site-packages/spur/ssh.py", line 280, in wait_for_result
    self._result = self._generate_result()
  File "/home/amit/projects/zxi-boot-test/local/lib/python2.7/site-packages/spur/ssh.py", line 292, in _generate_result
    stderr_output
  File "/home/amit/projects/zxi-boot-test/local/lib/python2.7/site-packages/spur/results.py", line 9, in result
    raise result.to_error()
spur.results.RunProcessError: return code: -1
output:
stderr output:
I do have a call before powering off (as shown in the log output above); it just runs systemd-analyze to get the boot time, and it always succeeds.
If you're expecting an error, then passing allow_error=True in the call to run when calling sudo poweroff might be a more precise way of achieving the same thing. The return code of -1 suggests that the SSH channel was closed without the server providing a return code.
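A minimal sketch of that suggestion, assuming run() accepts allow_error=True and the returned result exposes return_code (the error message above reports a return code, so it should be there). power_off is a hypothetical name, and the shell is passed in so any spur shell, such as one built with spur.SshShell(...), can be used:

```python
def power_off(shell):
    """Ask the remote host to power off without raising on a dropped channel.

    With allow_error=True, run() returns the result instead of raising
    RunProcessError for a non-zero return code. A return code of -1 most
    likely means the SSH channel closed before the server reported an
    exit status, which is to be expected when the machine shuts down.
    """
    with shell:
        result = shell.run(["sudo", "poweroff"], allow_error=True)
    return result.return_code
```

With this, the -1 is handed back to the caller as a value rather than raised, so no try/except is needed around the call.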
Given that turning the machine off is likely to break the connection, I'd suggest using spawn rather than run.
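A sketch of the spawn approach, under the same assumptions: request_power_off is a hypothetical helper that starts the command and hands back the process without ever calling wait_for_result(), so nothing blocks waiting for an exit status that a machine in the middle of shutting down may never send.

```python
def request_power_off(shell):
    """Start `sudo poweroff` on `shell` and return without waiting (hypothetical helper).

    The process handle from spawn() is returned, but wait_for_result() is
    deliberately never called. The caller remains responsible for closing
    `shell` once it is done with it.
    """
    return shell.spawn(["sudo", "poweroff"], allow_error=True)
```

Whether closing the SSH connection immediately afterwards can interrupt the remote command isn't something this thread establishes, so the sketch leaves closing the shell to the caller.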
If you could provide a stack trace of where the command gets stuck (even if using spawn rather than run fixes your problem), that would be great.
Thanks. I will try those suggestions out.
Also, how do I provide the stack trace when the command is stuck? Do I attach to the process using pdb/gdb?
Yup, pdb/gdb is probably your best bet.
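An alternative to gdb, swapped in here and not discussed in the thread, is the faulthandler module (standard library from Python 3.3), which can dump the Python stack of every thread when the process receives a signal. A sketch, assuming a POSIX system where SIGUSR1 is otherwise unused; install_stack_dumper is a hypothetical name:

```python
import faulthandler
import signal
import sys


def install_stack_dumper(sig=signal.SIGUSR1, out=sys.stderr):
    """Dump every thread's Python stack to `out` whenever `sig` arrives.

    faulthandler writes from a C-level signal handler, which is what makes
    it useful for inspecting a process that appears to be hung. `out` must
    be a real file with a file descriptor, because faulthandler writes to
    the descriptor directly.
    """
    # all_threads=True gives output similar to gdb's "thread apply all py-bt".
    faulthandler.register(sig, file=out, all_threads=True)
```

Calling install_stack_dumper() near the top of the test script and then running kill -USR1 <pid> from another terminal once it hangs should print the stacks to stderr.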
The process is currently hung; I attached to it with gdb:
Note: this is with shell.run(cmd, allow_error=True).
$ sudo gdb python 15357
Reading symbols from /usr/bin/python2.7...Reading symbols from /usr/lib/debug/usr/bin/python2.7...done.
done.
Attaching to program: /usr/bin/python, process 15357
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.18.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff30d24000
0x00007fe7a2506000 in ?? ()
(gdb) bt
#0 0x00007fe7a2506000 in ?? ()
#1 0x0000000001c30b20 in ?? ()
#2 0x000000000054590e in type_traverse.25017 (type=0x14fd0a0, visit=0x1, arg=0x7fe797254170) at ../Objects/typeobject.c:2712
#3 0x0000000000000003 in ?? ()
#4 0x00007fe788003aa0 in ?? ()
#5 0x00000000014fd0a0 in ?? ()
#6 0x00007fe7a28bc050 in ?? ()
#7 0x00007fe7a2892b36 in ?? ()
#8 0x00007fe7971f3128 in ?? ()
#9 0x000000000051f18d in BaseException_init.12117 (self=0x7fe79721f530, args=<optimized out>, kwds=<optimized out>) at ../Objects/exceptions.c:65
#10 0x00007fe7a2892ad4 in ?? ()
#11 0x00007fe788003a58 in ?? ()
#12 0x00000000014fd0a0 in ?? ()
#13 0x00007fe7a2752558 in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb) py-bt
(gdb)
The Python debug symbols are installed, but I'm not sure what the "system-supplied DSO at 0x7fff30d24000" warning means.
I will leave the process in this state, so if you need me to run any other debugging commands, I can.
Hmm, not sure why py-bt isn't giving any useful output. If you can work out why it isn't giving any output, that would be great, but I'm afraid I probably can't provide much help beyond what's on the Python wiki and whatnot.
Oh, and I forgot to ask: did using spawn instead of run help at all?
With allow_error=True, I no longer needed the try/except block, and the test ran for 57 iterations before it hung. With spawn, the test ran for 76 iterations before it hung.
(Emailed reply to Michael Williamson's comment of Mon, Mar 31, 2014, 2:49 PM.)
Did you have any luck finding out where the program was hanging?
I haven't tried it since our last conversation, but I'll be working on it over the next couple of days, so hopefully I'll have better luck.
(Emailed reply to Michael Williamson's comment of Fri, Apr 18, 2014, 6:56 AM.)
Running gdb on a script inside a virtualenv doesn't give much info, so I ran it outside the virtualenv. Here is the backtrace; it looks like it is blocked on a lock:
(gdb) py-bt
waiter.acquire()
self.__block.wait()
self._thread.join()
return [handler.wait() for handler in self._handlers]
output, stderr_output = self._io.wait()
self._result = self._generate_result()
Python Exception <type 'exceptions.IOError'> (2, 'No such file or directory', 'test.py'):
Error occurred in Python command: (2, 'No such file or directory', 'test.py')
(gdb) info threads
  Id   Target Id                                  Frame
  5    Thread 0x7f761dfa7700 (LWP 10807) "python" 0x00007f76287a8c33 in select () at ../sysdeps/unix/syscall-template.S:81
  4    Thread 0x7f761ce78700 (LWP 8064) "python" 0x00007f76287a472d in poll () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7f7617fff700 (LWP 8088) "python" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
  2    Thread 0x7f76177fe700 (LWP 8089) "python" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85

339                waiter.acquire()
340                if __debug__:
341                    self._note("%s.wait(): got it", self)
342            else:
343                # Balancing act: We can't afford a pure busy loop, so we
344                # have to sleep; but if we sleep the whole timeout time,
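The Python frames above read like a chain of waits: wait_for_result, then _io.wait, then handler.wait, then Thread.join, which ends in Condition.wait on waiter.acquire(). One plausible reading, assumed here rather than confirmed from spur's source, is that each output handler is a thread draining a stream until EOF, so if the remote side never closes the channel the join never returns. The sketch reproduces that pattern with a local pipe and shows how join(timeout) lets a caller notice the stall instead of blocking forever:

```python
import os
import threading


def start_reader(fd):
    """Drain `fd` until EOF in a background thread; return the thread and sink."""
    chunks = []

    def drain():
        with os.fdopen(fd, "rb") as stream:
            # read() only returns at EOF, i.e. once every write end is closed.
            chunks.append(stream.read())

    thread = threading.Thread(target=drain, daemon=True)
    thread.start()
    return thread, chunks


read_fd, write_fd = os.pipe()
reader, output = start_reader(read_fd)
os.write(write_fd, b"partial output\n")

# The write end is still open, so the reader never sees EOF: a plain
# join() here would block forever, much like the hung wait_for_result().
reader.join(timeout=0.5)
print("still waiting:", reader.is_alive())  # still waiting: True

os.close(write_fd)  # closing the write end delivers EOF to the reader
reader.join(timeout=5)
print("finished:", not reader.is_alive(), output)
```

This only illustrates the waiting pattern; it does not show why the channel stays open on the poweroff iterations that hang.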
Could you get a backtrace for the other threads as well?
It looks like it's stuck waiting for the process to terminate, but I'm not quite sure why it's doing that if you're using spawn rather than run, unless you're also calling wait_for_result() on the returned process. Are you? It seems to me that you just want to use spawn and ignore the result.
Yes, I'm calling wait_for_result(). I have the following function:
import logging

import spur

log = logging.getLogger(__name__)  # assumed; the script's logger setup isn't shown


def run_command(cmd, host):
    shell = spur.SshShell(hostname=host, username="test", password="pass",
                          missing_host_key=spur.ssh.MissingHostKey.accept)
    with shell:
        proc = shell.spawn(cmd, allow_error=True)
        result = proc.wait_for_result()
        log.info(result.output)
        return result.output
I call run_command with various commands and then finally with ['sudo', 'poweroff']. The script runs the loop 100 times, and the poweroff call usually works until about the 60th or 70th iteration, so the deadlock doesn't seem to happen every time.
However, I can create a different function just for the poweroff without waiting for the result if that is the correct way to do it.
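If the poweroff output should still be logged when it does arrive, one option (a swapped-in technique, not something spur or this thread prescribes) is to bound the wait: call wait_for_result() in a daemon thread and stop waiting after a timeout. A sketch, with wait_with_timeout as a hypothetical name; note that a timed-out worker may stay blocked in the background.

```python
import threading


def wait_with_timeout(process, timeout):
    """Call process.wait_for_result() but stop waiting after `timeout` seconds.

    Returns the result, or None if it did not arrive in time. If
    wait_for_result() raises within the timeout, the exception is
    re-raised in the caller. The worker is a daemon thread, so a
    timed-out worker cannot keep the interpreter alive at exit.
    """
    outcome = {}

    def worker():
        try:
            outcome["result"] = process.wait_for_result()
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return None
    if "error" in outcome:
        raise outcome["error"]
    return outcome["result"]
```

For example, wait_with_timeout(shell.spawn(["sudo", "poweroff"], allow_error=True), timeout=30) would return None instead of hanging the loop.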
(Emailed reply to Michael Williamson's comment of Wed, Apr 30, 2014, 03:14 AM -0700.)
However, I can create a different function just for the poweroff without waiting for the result if that is the correct way to do it.
That would seem to me to be the most appropriate method. Using spawn and wait_for_result doesn't get you anywhere over using run, since that's exactly how run is implemented.
Backtraces for the other threads would still be handy in working out what the issue is, though.
(gdb) info threads
  Id   Target Id                                  Frame
  5    Thread 0x7f4f592d4700 (LWP 7760) "python" 0x00007f4f63ad5c33 in select () at ../sysdeps/unix/syscall-template.S:81
  4    Thread 0x7f4f57964700 (LWP 30772) "python" 0x00007f4f63ad172d in poll () at ../sysdeps/unix/syscall-template.S:81
  3    Thread 0x7f4f581a5700 (LWP 30779) "python" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
  2    Thread 0x7f4f57163700 (LWP 30780) "python" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
Is this the backtrace you were talking about? I will leave the process in this state so if you want more backtraces, I can get them to you.
py-bt shows the backtrace for the current thread, so to see the backtraces for all threads, you can run t a a py-bt (short for thread apply all py-bt).
Output of 't a a py-bt':
Thanks! I'll take a look when I get a chance, but hopefully just using spawn will fix your specific issue.
Thanks, will do. Also, you might want to save the backtrace; I think the paste@debian link expires in 3 days.
(Emailed reply to Michael Williamson's comment of Wed, Apr 30, 2014, 10:32 AM.)
Thanks for letting me know, I've slapped it in a gist:
Thanks!
(Emailed reply to Michael Williamson's comment of Wed, Apr 30, 2014, 10:37 AM, which linked the gist: https://gist.github.com/mwilliamson/2cdf38bee1fa68f0800e)
I use the following code to send a poweroff command to a remote machine. I catch the resulting exception; otherwise RunProcessError is raised. I am trying to test poweroff plus Wake-on-LAN on a unit repeatedly, so I run this in a loop about 100-200 times. The process usually hangs around the 30th iteration: