zhuravskiy / wl500g

Automatically exported from code.google.com/p/wl500g
0 stars 0 forks source link

miniupnpd CPU load = 100% #352

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
Unfortanutely, the problem occures periodically for unknown reason. Sometimes I 
notice, that internet works slow. I check, and find that at that time CPU load 
is constantly 100% with 50% system, while ordinary it is near 1-3%.
Only device restart helps.

What version of the product are you using?
Asus RT-N16
1.9.2.7-rtn-r4330, the problem also occured at the builds 3497, 3702 and 4051, 
that I installed.

Please provide any additional information below.
I use USB modem and USB printer.

Yes, I know, that information is not enough. Next time I'll try to make 
screenshot and collect "top" process information.

Original issue reported on code.google.com by Konst...@gmail.com on 12 Sep 2012 at 3:24

GoogleCodeExporter commented 8 years ago
Please never create "something strange" issues, we can't help in such situation.

It might be even simple network load from your PC workstation(torrent or 
viruses) or DDOS attack.

Original comment by lly.dev on 13 Sep 2012 at 6:24

GoogleCodeExporter commented 8 years ago
Now I could provide some additional information about issue.

Network load:
WAN = 8 Kbps In(avg), 2 Kbps Out(avg)
Wireless = 22 Kbps In(avg), 22 Kbps Out(avg)
LAN = 21 Kbps In(avg), 19 Kbps Out(avg) (that is strange: I have no cables 
connected)

"top" command information:
Mem: 23524K used, 103236K free, 0K shrd, 2712K buff, 7208K cached
CPU:  58% usr  40% sys   0% nic   0% idle   0% io   0% irq   1% sirq
Load average: 1.04 1.10 1.09 2/31 5244
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
  223     1 admin    S      960   1% 100% miniupnpd
  329     1 admin    S     1520   1%   0% watchdog
 5227  5150 admin    S     1520   1%   0% pppd call wan0_pppd
    1     0 admin    S     1508   1%   0% /sbin/init
 5150     1 admin    S     1400   1%   0% {dial} /bin/bash /usr/ppp/dial wan0_
 5244  5243 admin    R     1384   1%   0% top
 5243   217 admin    S     1380   1%   0% /bin/sh -c top
  205     1 admin    S     1376   1%   0% syslogd -m 0 -O /tmp/syslog.log -S -D
  208     1 admin    S     1376   1%   0% klogd
  215     1 admin    S     1376   1%   0% telnetd
  217     1 admin    S     1156   1%   0% httpd vlan2
  211     1 admin    S     1156   1%   0% nas
  228     1 admin    S     1020   1%   0% lld2d br0 eth1
  221     1 nobody   S     1004   1%   0% dnsmasq
  209     1 admin    S      964   1%   0% eapd
  292     1 admin    S      852   1%   0% lpd
  294     1 admin    S      772   1%   0% {p910nd} p9100d -f /dev/lp0 0
    4     2 admin    SW<      0   0%   0% [events/0]
    3     2 admin    SW<      0   0%   0% [ksoftirqd/0]
  238     2 admin    SW<      0   0%   0% [khubd]

Also find a screenshot attached

Original comment by Konst...@gmail.com on 19 Sep 2012 at 7:16

GoogleCodeExporter commented 8 years ago
So, it is miniupnpd bug. Are any error messages in syslog log?

Can you try latest night build r4606 from
http://asus.vectormm.net/rtn/
http://wpte.kicks-ass.net/downloads/Oleg Firmware/Nightlys/1.9.2.7-rtn/
?

Are you able to use strace and/or gdb tools to debug application?

We will update miniupnpd from upstream, hope it will fix problem.

Original comment by lly.dev on 20 Sep 2012 at 7:33

GoogleCodeExporter commented 8 years ago
Here is a system log (the same buld as before). I have found CPU load at 17:50, 
do not know, how long it runs.
If you explain how to use strace and/or gdb tools at my router - I'll try this 
next time.
And now I'm going to update my build to r4606

Original comment by Konst...@gmail.com on 30 Sep 2012 at 2:11

Attachments:

GoogleCodeExporter commented 8 years ago
First of all, please use r4645 release.

About use of strace:
0) install strace (from Entware, Optware, self-compiled)
1) you have to find miniupnpd pid from output of "top" or "ps"
2) execute "strace -ttv -p <pid> -o /tmp/mini.log
3) stop strace by interrupt it by ctrl-C after few minutes

Upload zipped/gzipped logfile /tmp/mini.log to issue

Original comment by lly.dev on 1 Oct 2012 at 7:24

GoogleCodeExporter commented 8 years ago
Just got the same thing with r4643 release.
Unfortanutely, I do not understand, how to install strace at router. Could you 
explain more? I use unix very rear.

Attached are syslog and diagnostic info. 
The 100% CPU I have found at 01:45, at 0:30 CPU load was ~2%

Original comment by Konst...@gmail.com on 4 Oct 2012 at 10:07

Attachments:

GoogleCodeExporter commented 8 years ago
PS. My computer runs WinXP.

Original comment by Konst...@gmail.com on 4 Oct 2012 at 10:09

GoogleCodeExporter commented 8 years ago
You have to install strace from Entware project, you can read more at 
wl500g.info forum.

btw., you have hardware problems with your Huawei modem. Probably - 
insufficient power or broken(too long) usb cable.

Original comment by lly.dev on 5 Oct 2012 at 6:25

GoogleCodeExporter commented 8 years ago
@ Konst

here is strace I've compiled for you:

http://www.mediafire.com/?2fgrvpej3x1g9la

it might not work, not sure, but give it a try

you need to download it to your computer and SCP to the router over SSH with 
WinSCP somewhere (e.g. /usr/local/)

chmod +x /usr/local/strace

and after that you can use it as lly.dev said.

Original comment by spameden on 8 Oct 2012 at 9:46

GoogleCodeExporter commented 8 years ago
A user reported the same problem on r4667. The system log is heavily flooded 
with
"Oct  9 12:12:27 miniupnpd[221]: accept(http): Too many open files" (one 
message per second).

Original comment by ad...@vampik.ru on 9 Oct 2012 at 3:33

GoogleCodeExporter commented 8 years ago
For miniupnpd bug "accept(http): Too many open files" please check amount of 
open files:

ls -1 /proc/`pidof miniupnpd`/fd/ | wc -l

Original comment by lly.dev on 9 Oct 2012 at 4:13

GoogleCodeExporter commented 8 years ago
ls -1 /proc/`pidof miniupnpd`/fd/ | wc -l : 1024
Log is flooded with "miniupnpd[pid]: accept(http): Too many open files" and CPU 
load is 100%.

Original comment by ad...@vampik.ru on 12 Oct 2012 at 3:28

GoogleCodeExporter commented 8 years ago
So, it is definitely either bug inside miniupnpd or some kind of DOS attack.

We have to prepare debug build of miniupnpd to detect exact place of sockets 
leak. For which router model should we compile firmware?

Original comment by lly.dev on 12 Oct 2012 at 6:31

GoogleCodeExporter commented 8 years ago
Here is the log of strace

Original comment by Konst...@gmail.com on 13 Oct 2012 at 8:25

Attachments:

GoogleCodeExporter commented 8 years ago
It's the same bug - sockets leak under some circumstances. I will prepare debug 
build after some analysis.

Original comment by lly.dev on 14 Oct 2012 at 10:08

GoogleCodeExporter commented 8 years ago
Could you provide more info? Output of commands (after problem appear):

cat /proc/net/sockstat
cat /proc/net/sockstat6

cat /proc/net/udp
cat /proc/net/tcp
ls -l /proc/`pidof miniupnpd`/fd

P.S. Output of last three commands can be large, please zip it or put at 
pastebin.com

Original comment by lly.dev on 17 Oct 2012 at 4:46

GoogleCodeExporter commented 8 years ago
I reproduced the bug, will try to fix it.

Original comment by lly.dev on 20 Oct 2012 at 6:08

GoogleCodeExporter commented 8 years ago
Fixed in r4684

Original comment by lly.dev on 20 Oct 2012 at 7:17