mezhekov / parallel-ssh

Automatically exported from code.google.com/p/parallel-ssh

IOError: [Errno 4] Interrupted system call #37

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
If I execute a call such as:

---
pssh -i -h /some/file "ps aux | grep zabbix_agent | grep -v grep" | grep "FAILURE"

---

What it should do: execute the command and show all the pssh lines with a 
failure.

What it does:

---
Traceback (most recent call last):
  File "/usr/bin/pssh", line 115, in <module>
    do_pssh(hosts, cmdline, opts)
  File "/usr/bin/pssh", line 86, in do_pssh
    statuses = manager.run()
  File "/usr/lib/python2.5/site-packages/psshlib/manager.py", line 75, in run
    self.update_tasks(writer)
  File "/usr/lib/python2.5/site-packages/psshlib/manager.py", line 135, in update_tasks
    keep_running = self.reap_tasks()
  File "/usr/lib/python2.5/site-packages/psshlib/manager.py", line 160, in reap_tasks
    self.finished(task)
  File "/usr/lib/python2.5/site-packages/psshlib/manager.py", line 198, in finished
    task.report(n)
  File "/usr/lib/python2.5/site-packages/psshlib/task.py", line 267, in report
    sys.stdout.flush()
IOError: [Errno 4] Interrupted system call

---

Running the command without a grep or other piped command (such as mail) works 
just fine.

Original issue reported on code.google.com by thasypher on 4 Feb 2011 at 12:53

GoogleCodeExporter commented 9 years ago
This problem should not appear if you use Python 2.6 or later.  I haven't tried 
to fix this on older versions of Python because I haven't been able to easily 
reproduce it, and because all of my machines have a newer version of Python.  
However, you've given a specific reproducing case, so it might be reasonable to 
attack this.

Are you planning on using Python 2.5 for a while, or are you upgrading to later 
versions soon?  Would you be willing to help with testing if I make some 
patches available for you?

Original comment by amcna...@gmail.com on 4 Feb 2011 at 4:18
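
The thread does not say why Python 2.6 and later avoid the error. One plausible explanation, stated here as an assumption rather than as a description of pssh's code, is that Python 2.6 added signal.siginterrupt(), which lets a program ask the OS to restart system calls interrupted by a given signal instead of failing them with EINTR. A minimal sketch of that idea follows; install_restarting_handler is a hypothetical helper, and SIGCHLD in the usage note is only an illustrative choice of signal.

```python
import signal


def install_restarting_handler(signum, handler):
    """Install `handler` for `signum` and request restart of interrupted calls.

    Hypothetical helper for illustration; not taken from pssh.
    """
    previous = signal.signal(signum, handler)
    # signal.signal() resets the signal to "interrupting" behaviour, so
    # siginterrupt() has to be called after the handler is installed.
    signal.siginterrupt(signum, False)
    return previous
```

A caller might use it as install_restarting_handler(signal.SIGCHLD, on_child_exit), where on_child_exit is whatever handler the program already has. Python 2.5 has no signal.siginterrupt(), so this approach is not available there.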

GoogleCodeExporter commented 9 years ago
I am waiting for a new Debian release, which should have a more recent Python. 
However, from the looks of it, it might be a while...

I'm willing to help you test the patches, or you could install a Debian 5.0 in 
a VM :-)

Original comment by thasypher on 4 Feb 2011 at 4:22

GoogleCodeExporter commented 9 years ago
Okay, try pulling the branch called issue37, and see if it solves the problem 
(or creates any new problems).  Thanks.

Original comment by amcna...@gmail.com on 4 Feb 2011 at 6:04

GoogleCodeExporter commented 9 years ago
No problem, but I first have to figure out how Git works. Never used it before, 
only Subversion.

Original comment by thasypher on 4 Feb 2011 at 6:30

GoogleCodeExporter commented 9 years ago
You'll never look back. :)  Basically, you'll just do a "git clone 
git://aml.cs.byu.edu/pssh.git" and a "git checkout origin/issue37".

Original comment by amcna...@gmail.com on 4 Feb 2011 at 7:56

GoogleCodeExporter commented 9 years ago
Strange. If I run:
./pssh -t 5 -i -h /var/COMPANYNAME/serverlist "ps aux | grep zabbix_agent | grep -v grep"

It works just fine, also with servers where the command failed.
Executes in < 10 seconds on ~400 servers.

But if I then pipe the output into grep "FAILURE", it seems to hang. It does 
not crash, though, and there is no "Interrupted system call" error either.

Original comment by thasypher on 11 Feb 2011 at 11:36

GoogleCodeExporter commented 9 years ago
Is this on the issue37 branch or on the master branch?

Original comment by amcna...@gmail.com on 11 Feb 2011 at 4:38

GoogleCodeExporter commented 9 years ago
The issue37 branch.

Original comment by thasypher on 12 Feb 2011 at 9:27

GoogleCodeExporter commented 9 years ago
Would you please post a specific traceback that occurs with the issue37 branch? 
Thanks.

Original comment by amcna...@gmail.com on 12 Feb 2011 at 3:57

GoogleCodeExporter commented 9 years ago
If I run it without the trailing | grep "FAILURE", it works. No traceback. But 
piping it seems to fail for some strange reason.

Original comment by thasypher on 13 Feb 2011 at 7:03

GoogleCodeExporter commented 9 years ago
Okay, I added a little print statement to the issue37 branch.  Please pull the 
latest version and let me know if you see the error message "rerunning loop 
after EINTR".  I would expect that the statement might be occasionally printed, 
but it's important for me to know if the message is repeatedly printed forever.

Original comment by amcna...@gmail.com on 14 Feb 2011 at 9:09

GoogleCodeExporter commented 9 years ago
Done, no change. 
It also fails with something as simple as "exim -bp":

====
  File "/home/frank/pssh-test/pssh/psshlib/manager.py", line 75, in run
    self.update_tasks(writer)
  File "/home/frank/pssh-test/pssh/psshlib/manager.py", line 135, in update_tasks
    keep_running = self.reap_tasks()
  File "/home/frank/pssh-test/pssh/psshlib/manager.py", line 160, in reap_tasks
    self.finished(task)
  File "/home/frank/pssh-test/pssh/psshlib/manager.py", line 198, in finished
    task.report(n)
  File "/home/frank/pssh-test/pssh/psshlib/task.py", line 267, in report
    write_buf_to_stdout(self.outputbuffer)
  File "/home/frank/pssh-test/pssh/psshlib/task.py", line 295, in write_buf_to_stdout
    sys.stdout.write(buf)
IOError: [Errno 4] Interrupted system call
====

Original comment by thasypher on 18 Feb 2011 at 10:49

GoogleCodeExporter commented 9 years ago
Please note that the "exim -bp" output contained non-ASCII characters, which 
probably goes some way toward explaining why it went wrong there.

Original comment by thasypher on 18 Feb 2011 at 10:51

GoogleCodeExporter commented 9 years ago
Hmm.  The line numbers in your traceback are different than the line numbers in 
the latest commit in branch issue37.  On mine, line 295 is "flush_stdout()", 
but on yours, line 295 seems to be "sys.stdout.write(buf)".

If you'd like, I can help you track down any problems you're having with your 
repository. If I run "git branch -v", I see:

amcnabb@sage:~/clone/pssh/psshlib :) git branch -v
  issue15 e5d9594 added debug statements for timing
* issue37 1dc6140 added a temporary debug statement
  master  8350c58 bumped the version to 2.2.2
amcnabb@sage:~/clone/pssh/psshlib :)

This shows that I'm on the issue37 branch with the latest commit (1dc6140).

Actually, it was probably my fault for giving incomplete instructions.  Earlier 
I mentioned running "git checkout origin/issue37".  To pick up later commits, 
run "git fetch" to download them, and then run "git checkout origin/issue37" 
again to check out the updated issue37 branch.

Original comment by amcna...@gmail.com on 18 Feb 2011 at 4:17

GoogleCodeExporter commented 9 years ago
Ah!

===
Previous HEAD position was a1795cb... explicitly check for EINTR during flush (for Python <= 2.5)
HEAD is now at 1dc6140... added a temporary debug statement
===

Now seeing (while doing git branch -v): 
===
* (no branch) 1dc6140 added a temporary debug statement
  master      8350c58 bumped the version to 2.2.2
===

Retried "exim -bp" on all of my mail servers, resulting in partial information 
and:
====
do_pssh(hosts, cmdline, opts)
  File "./pssh", line 86, in do_pssh
    statuses = manager.run()
  File "/home/frank/pssh-test/pssh/psshlib/manager.py", line 75, in run
    self.update_tasks(writer)
  File "/home/frank/pssh-test/pssh/psshlib/manager.py", line 135, in update_tasks
    keep_running = self.reap_tasks()
  File "/home/frank/pssh-test/pssh/psshlib/manager.py", line 160, in reap_tasks
    self.finished(task)
  File "/home/frank/pssh-test/pssh/psshlib/manager.py", line 198, in finished
    task.report(n)
  File "/home/frank/pssh-test/pssh/psshlib/task.py", line 267, in report
    write_buf_to_stdout(self.outputbuffer)
  File "/home/frank/pssh-test/pssh/psshlib/task.py", line 297, in write_buf_to_stdout
    sys.stdout.write(buf)
IOError: [Errno 4] Interrupted system call
====

Original comment by thasypher on 18 Feb 2011 at 4:23

GoogleCodeExporter commented 9 years ago
Hmm.  I was hoping that sys.stdout.write couldn't raise that exception.  I'll 
have to think about this one.

Original comment by amcna...@gmail.com on 18 Feb 2011 at 6:18
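
For readers who hit the same traceback: a common way to make a write robust against EINTR is to bypass the buffered file object, write to the file descriptor directly, and retry whenever the call is interrupted. The sketch below is an assumption about what such a loop could look like, not the code in the issue37 branch; write_all and its write parameter are hypothetical names. It uses modern Python syntax (on Python 2.5 the except clause would be spelled "except (IOError, OSError), e"), and Python 3.5 and later already retry interrupted system calls automatically, so there the retry branch is only a safety net.

```python
import errno
import os


def write_all(fd, data, write=os.write):
    """Write all of `data` (bytes) to `fd`, retrying when interrupted.

    `write` defaults to os.write and exists only so that an interrupted
    call can be simulated; it is not part of any pssh interface.
    """
    remaining = data
    while remaining:
        try:
            written = write(fd, remaining)
        except (IOError, OSError) as e:
            if e.errno == errno.EINTR:
                # Interrupted before anything was written: try again.
                continue
            raise
        # os.write may accept only part of the buffer, so keep the rest.
        remaining = remaining[written:]
```

A call such as write_all(sys.stdout.fileno(), buf) would then stand in for sys.stdout.write(buf) followed by sys.stdout.flush(), provided buf is a byte string and nothing else is left pending in the sys.stdout buffer.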

GoogleCodeExporter commented 9 years ago
Have you had a chance to think about this issue yet?

Original comment by thasypher on 10 Mar 2011 at 4:16

GoogleCodeExporter commented 9 years ago
Yeah, I'm going to have to do this the hard way. :(  Thanks for the reminder.

Original comment by amcna...@gmail.com on 10 Mar 2011 at 4:35