perfsonar / project

The perfSONAR project's primary wiki and issue tracker.

Investigate BWCTL handling of IPERF processes #733

Closed: arlake228 closed this issue 9 years ago

arlake228 commented 9 years ago

Original issue 734 created by arlake228 on 2013-06-13T12:40:49.000Z:

From a user:

I've noticed varying numbers of iperf processes in a sleep/interrupt state on my servers.

  1. Is this a cause for concern? I suspect it's because tests didn't complete and the processes aren't terminated gracefully...
  2. Can I safely kill them manually?
  3. Any tips for preventing this in future or should I just "clean up" periodically?

PS: I'm using pS-PS version 3.2.2.

Thanks a lot!

Regards,

Roderick

bwctl 11267 0.0 0.0 33952 972 ? Sl Mar15 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5086 -w 35651584 -t 20
bwctl 11803 0.0 0.0 33952 964 ? Sl May01 0:00 iperf -B 155.232.40.2 -s -f b -m -p 5098 -w 35651584 -t 20
bwctl 11982 0.0 0.0 33952 972 ? Sl May18 0:03 iperf -B 155.232.40.2 -s -f b -m -p 5066 -w 35651584 -t 20
bwctl 11998 0.0 0.0 33952 964 ? Sl Apr09 0:07 iperf -B 155.232.40.2 -s -f b -m -p 5037 -w 35651584 -t 20
bwctl 12278 0.0 0.0 33952 968 ? Sl Jun03 0:02 iperf -B 155.232.40.2 -s -f b -m -p 5036 -w 35651584 -t 20
bwctl 12749 0.0 0.0 33952 972 ? Sl Jun08 0:00 iperf -B 155.232.40.2 -s -f b -m -p 5089 -w 35651584 -t 20
bwctl 12782 0.0 0.0 33952 972 ? Sl Apr03 0:04 iperf -B 155.232.40.2 -s -f b -m -p 5025 -w 35651584 -t 20
bwctl 13052 0.0 0.0 33952 976 ? Sl Apr11 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5059 -w 35651584 -t 20
bwctl 15450 0.0 0.0 33952 1232 ? Sl May21 0:04 iperf -B 155.232.40.2 -s -f b -m -p 5032 -w 35651584 -t 20
bwctl 15499 0.0 0.0 33952 968 ? Sl Mar10 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5071 -w 35651584 -t 20
bwctl 16516 0.0 0.0 33952 968 ? Sl Jun06 0:06 iperf -B 155.232.40.2 -s -f b -m -p 5020 -w 35651584 -t 20
bwctl 16980 0.0 0.0 33952 968 ? Sl Apr25 0:01 iperf -B 155.232.40.2 -s -f b -m -p 5020 -w 35651584 -t 20
bwctl 17122 0.0 0.0 33952 976 ? Sl May26 0:04 iperf -B 155.232.40.2 -s -f b -m -p 5075 -w 35651584 -t 20
bwctl 17315 0.0 0.0 33952 964 ? Sl Apr22 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5004 -w 35651584 -t 20
bwctl 17413 0.0 0.0 33952 968 ? Sl Mar09 0:06 iperf -B 155.232.40.2 -s -f b -m -p 5024 -w 35651584 -t 20
bwctl 18284 0.0 0.0 33952 1220 ? Sl Apr07 0:03 iperf -B 155.232.40.2 -s -f b -m -p 5067 -w 35651584 -t 20
bwctl 18768 0.0 0.0 33952 968 ? Sl May30 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5015 -w 35651584 -t 20
bwctl 19219 0.0 0.0 33952 1224 ? Sl May24 0:03 iperf -B 155.232.40.2 -s -f b -m -p 5034 -w 35651584 -t 20
bwctl 19253 0.0 0.0 33952 1224 ? Sl Mar18 0:06 iperf -B 155.232.40.2 -s -f b -m -p 5034 -w 35651584 -t 20
bwctl 21006 0.0 0.0 33952 972 ? Sl Jun11 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5065 -w 35651584 -t 20
bwctl 21108 0.0 0.0 33952 972 ? Sl Jun09 0:07 iperf -B 155.232.40.2 -s -f b -m -p 5013 -w 35651584 -t 20
bwctl 22203 0.0 0.0 33952 968 ? Sl Mar27 0:03 iperf -B 155.232.40.2 -s -f b -m -p 5078 -w 35651584 -t 20
bwctl 22826 0.0 0.0 33952 972 ? Sl May07 0:04 iperf -B 155.232.40.2 -s -f b -m -p 5086 -w 35651584 -t 20
bwctl 22850 0.0 0.0 33952 964 ? Sl Apr30 0:04 iperf -B 155.232.40.2 -s -f b -m -p 5033 -w 35651584 -t 20
bwctl 23515 0.0 0.0 33952 1228 ? Sl May10 0:08 iperf -B 155.232.40.2 -s -f b -m -p 5033 -w 35651584 -t 20
bwctl 23757 0.0 0.0 33952 964 ? Sl May23 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5048 -w 35651584 -t 20
bwctl 23791 0.0 0.0 33952 1232 ? Sl Apr05 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5071 -w 35651584 -t 20
bwctl 24848 0.0 0.0 33952 972 ? Sl Apr10 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5086 -w 35651584 -t 20
bwctl 25112 0.0 0.0 33952 1220 ? Sl Apr15 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5015 -w 35651584 -t 20
bwctl 25531 0.0 0.0 33952 968 ? Sl May29 0:03 iperf -B 155.232.40.2 -s -f b -m -p 5057 -w 35651584 -t 20
bwctl 26490 0.0 0.0 33952 968 ? Sl Apr28 0:06 iperf -B 155.232.40.2 -s -f b -m -p 5036 -w 35651584 -t 20
bwctl 27197 0.0 0.0 33952 968 ? Sl May19 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5007 -w 35651584 -t 20
bwctl 27478 0.0 0.0 33952 968 ? Sl Apr24 0:05 iperf -B 155.232.40.2 -s -f b -m -p 5047 -w 35651584 -t 20
bwctl 27848 0.0 0.0 33952 972 ? Sl Mar26 0:06 iperf -B 155.232.40.2 -s -f b -m -p 5013 -w 35651584 -t 20
bwctl 27850 0.0 0.0 33952 968 ? Sl Mar12 0:04 iperf -B 155.232.40.2 -s -f b -m -p 5075 -w 35651584 -t 20
bwctl 28072 0.0 0.0 33952 972 ? Sl May16 0:02 iperf -B 155.232.40.2 -s -f b -m -p 5023 -w 35651584 -t 20
bwctl 28344 0.0 0.0 33952 964 ? Sl May21 0:03 iperf -B 155.232.40.2 -s -f b -m -p 5070 -w 35651584 -t 20
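
For anyone who wants to triage a listing like the one above, here is a minimal Python sketch that pulls the PID, start field, and iperf port out of each line. It assumes the 11-column `ps aux` layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND), which the listing appears to follow but the report does not confirm. The names `IperfProc`, `parse_ps_line`, and `started_before_today` are hypothetical helpers, not part of BWCTL or the toolkit.

```python
import re
from typing import NamedTuple, Optional


class IperfProc(NamedTuple):
    user: str
    pid: int
    stat: str
    start: str            # START column as printed by ps, e.g. "Mar15" or "10:42"
    port: Optional[int]   # value of iperf's -p option, if present


def parse_ps_line(line: str) -> Optional[IperfProc]:
    """Parse one ps line; return None unless it is an iperf command.

    Assumes the ps aux column order; the command is the 11th field.
    """
    fields = line.split(None, 10)
    if len(fields) < 11 or not fields[10].startswith("iperf"):
        return None
    user, pid, _cpu, _mem, _vsz, _rss, _tty, stat, start, _time, cmd = fields
    match = re.search(r"(?:^|\s)-p\s+(\d+)", cmd)
    port = int(match.group(1)) if match else None
    return IperfProc(user, int(pid), stat, start, port)


def started_before_today(start: str) -> bool:
    """ps prints HH:MM for recently started processes and a date such as
    "Mar15" for older ones, so a START field without a colon is an old process."""
    return ":" not in start
```

Feeding it the first line of the listing gives PID 11267, start `Mar15`, and port 5086. Since every command in the listing carries `-t 20`, an iperf that is still present with a date in the start field is a reasonable candidate for a leftover.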

And another:

I've seen leftover iperf processes from bwctl on our perfSONAR hosts. It is a cause for concern because the hangers-on seem to block ongoing testing, at least in our case. kill -9 works and appears to be sufficient to clean up and allow testing to resume.

This is purely anecdotal (perhaps the pS team can provide further insight): if you have a lot of bwctl tests configured with that machine, you may want to check the test frequency % reported on the test configuration page. bwctl/iperfs on one of the XSEDE perfSONARs were hanging much more frequently than on the other 7 pSs. Not all the XSEDE pSs run identical testing and that machine happened to be running with more frequent testing, reported at 13% vs. 6-7% of the time on the other pSs. Reducing the frequency of testing to 9% seems to have eliminated the hangs.

The original user noted:

kill -9 worked. It doesn't seem like the rogue iperf processes caused any real problems, though: tests still carried on, perhaps because I had enough free ports. I'm only running tests 1% of the time. What I thought was a bit strange is that the tests are scheduled for every 4 hours, but the results I get on the graphs are every 8 hours. I will monitor the situation now and see what happens.

I talked with Aaron. He looked at BWCTL and noticed that it does send TERM and KILL signals, but further investigation is needed.
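
For reference, the common form of a TERM and KILL sequence is sketched below in Python: send SIGTERM, wait for a grace period, and send SIGKILL only if the child is still running. This illustrates the general pattern and is not taken from the BWCTL source, which is written in C and may order or time the signals differently. The function name `stop_child` and the 5-second default grace period are assumptions. It relies on POSIX signal semantics.

```python
import subprocess


def stop_child(proc: subprocess.Popen, grace_s: float = 5.0) -> str:
    """Stop a child process, escalating from SIGTERM to SIGKILL.

    Returns "EXITED" if the child had already finished, "TERM" if it exited
    within grace_s seconds of SIGTERM, and "KILL" if SIGKILL was needed.
    """
    if proc.poll() is not None:
        return "EXITED"
    proc.terminate()                 # SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_s)   # also reaps the child, so no zombie is left
        return "TERM"
    except subprocess.TimeoutExpired:
        proc.kill()                  # SIGKILL cannot be caught or ignored
        proc.wait()
        return "KILL"
```

One thing this pattern does not cover is a parent that exits or loses track of the child before the grace period ends, in which case the child is never signalled at all. Whether that is what happened here is not established in this thread.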

arlake228 commented 9 years ago

Comment #1 originally posted by arlake228 on 2013-10-15T18:57:48.000Z:

Aaron has it fixed in trunk. Needs more testing.

arlake228 commented 9 years ago

Comment #2 originally posted by arlake228 on 2014-05-06T17:48:59.000Z:

<empty>