Closed martinmakr closed 8 years ago
Hi, yes, the latest IMQ patch (#46) has a memory leak. I sent a kernel debug log in a separate mail and am waiting for a reply. I have 3 machines with the same problem: when imq is brought up, memory runs out and the machine crashes and reboots.
On 8 Nov 2016 11:32 p.m., "martinmakr" notifications@github.com wrote:
Hello, I used kernel 4.2.3 with the imq patch (only) for 1 year and it worked fine. Two days ago I installed kernel 4.8.4 on the same router (Debian based), patched with imq (the patch found in the closed pull request #46, https://github.com/imq/linuximq/pull/46). After one day the router suddenly rebooted (kernel 4.8.4-imq, 2GB RAM). The traffic is about 150Mbps. On my graph I see that it used up all memory (at about 100MB/hour). After this first reboot I tried to diagnose what happened. The router uses more and more memory, but no process is using it!
router6:# ps aux | awk '{sum+=$6} END {print sum / 1024}'
85.8398
router6:# free -m
             total       used       free     shared    buffers     cached
Mem:          1985       1939         46          0          0         41
-/+ buffers/cache:       1897         88
Swap:            0          0          0
I tried many things to analyse what is happening, but I cannot find what is using the memory. I tried restarting services and it is still the same: memory keeps being exhausted. When I stop imq with "ifconfig imq0 down", the memory exhaustion stops! The current state is that the router has 2GB, processes use 85MB and 46MB is free, so the kernel is using 1917MB of memory (2048-85-46).
For completeness: I have the same kernel on 3 other routers (with different hardware), and there is no problem with a memory leak or memory use by imq. Used memory on another router with the same kernel (4.8.4-imq) is about 400MB of 2GB RAM. I know it sounds strange. If you want, I can run some other tests or attach more diagnostic output. I cannot experiment too much because the router is in a network with customers.
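The arithmetic in this report (total RAM minus process memory minus free memory) can be sketched as a small shell helper. This is only an illustration and not part of the original report: the function names kernel_mem_estimate, process_rss_mb and meminfo_mb are made up here, and summing the RSS column of ps counts shared pages more than once, so the result is a rough estimate.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the thread.

# Memory not accounted for by processes or the free pool, in MB.
# Arguments: total RAM, summed process RSS, free memory (all in MB).
kernel_mem_estimate() {
    echo $(( $1 - $2 - $3 ))
}

# Sum the RSS column of `ps aux` (field 6, in KiB) and print whole MB.
# Shared pages are counted once per process, so this overestimates.
process_rss_mb() {
    ps aux | awk 'NR > 1 { sum += $6 } END { printf "%d\n", sum / 1024 }'
}

# Print one /proc/meminfo field (reported in kB) in whole MB,
# for example: meminfo_mb MemTotal, meminfo_mb MemFree, meminfo_mb SUnreclaim
meminfo_mb() {
    awk -v key="$1:" '$1 == key { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Numbers from the report: 2048 MB total, 85 MB in processes, 46 MB free.
kernel_mem_estimate 2048 85 46    # prints 1917
```

On a live router the three arguments could come from meminfo_mb MemTotal, process_rss_mb and meminfo_mb MemFree. Watching meminfo_mb SUnreclaim over time may also help, since leaked kernel objects often show up as growing unreclaimable slab.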
vel21ripn wrote in #46
Maybe the code "if (to_free) kfree_skb (to_free);" is needed after the label "out:" and before the return?
This code is not in the 4.8 patch. I have been using this part of the backported code, with that line, in my 4.4 kernel for some weeks or months already, with no memory leaks and no crashes.
Hi stasn77, please write where you added this line: if (to_free) kfree_skb (to_free);
In drivers/net/imq.c, in the __imq_nf_queue function, after the out: label:
if (unlikely(to_free))
        kfree_skb_list(to_free);
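Before building, it can help to confirm that the patched source really contains this line. A minimal sketch, assuming the label is written as out: at the start of a line (optionally indented); the helper name has_to_free_fix is hypothetical, and the check does not verify that the line sits inside __imq_nf_queue specifically.

```shell
#!/bin/sh
# Sketch only: has_to_free_fix is a hypothetical helper name.
# Succeeds (exit status 0) if kfree_skb_list(to_free) appears somewhere
# after an "out:" label in the given file, and fails otherwise.
has_to_free_fix() {
    awk '
        /^[ \t]*out:/ { after_out = 1; next }
        after_out && index($0, "kfree_skb_list(to_free)") { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example, run from the top of the patched kernel tree:
#   has_to_free_fix drivers/net/imq.c && echo "fix present"
```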
OK, I will try it and report the result after testing.
There is another problem with the latest patch for the 4.8 kernel: the machine runs with very high load. After the update the machine works fine, but 3-4 hours later the load starts to climb and goes very high. Maybe there is a lock somewhere in the source that causes this.
perf top -Un ?
This is from perf:
37.84%  15149  [kernel]  [k] acpi_processor_ffh_cstate_enter
34.69%  10541  [kernel]  [k] rht_deferred_worker
 9.47%   2909  [kernel]  [k] queued_spin_lock_slowpath
 1.47%    542  [kernel]  [k] e1000_irq_enable
 1.19%    436  [kernel]  [k] e1000_intr_msi
 0.80%    247  [kernel]  [k] nf_nat_bysource_hash
 0.58%    179  [kernel]  [k] nf_nat_cleanup_conntrack
 0.49%    176  [kernel]  [k] fib_table_lookup
 0.44%    158  [kernel]  [k] ipt_do_table
 0.33%    107  [kernel]  [k] __local_bh_enable_ip
 0.24%     81  [kernel]  [k] iadb_ia
 0.23%     78  [kernel]  [k] _raw_spin_lock
 0.22%     73  [kernel]  [k] hfsc_enqueue
The patch for 4.8 has too many bugs. When I go back to kernel 4.7.x with the old patch for 4.7, everything is OK.
I think the cause is in 2 commits from the upstream kernel:
https://github.com/torvalds/linux/commit/7c9664351980aaa6a4b8837a314360b3a4ad382a
https://github.com/torvalds/linux/commit/870190a9ec9075205c0fa795a09fa931694a3ff1
Try reverting them temporarily and retest.
Maybe Feng and Konstantin need to recheck the code for kernel 4.8.
Your (and my) troubles with high cpu load are not IMQ related. 4.8 kernel + IMQ without those two commits (and added some extra code from net-next in my case) works just fine.
OK, I tried it, but it does not work. The machine runs for 1 hour, then stops responding to access and ping. After waiting 2-3 minutes the machine is back: no reboot, no error, it only stops answering ping and access, and after that all is fine. I tried the latest kernel 4.8.8 with the last fix; now I have no memory leak, but it stops working :) This machine runs imq + eoip + l2tp + dhcp + hfsc + sfq. I went back to the 4.7 kernel and it works fine.
Did you try reverting the two commits?
4.8.8 (with some patches) + imq + ndpi + accel (ipoe-dhcp, ipoe-up, pppoe, l2tp) + hfsc + prio + fq_codel
Yes, I reverted these two commits, but the problem is the same.
Today I tried to compile new kernels. However, 4.8.8 and 4.4.32 failed to compile at "CC net/core/dev.o": net/core/dev.c:3050:1: error: redefinition of '__kcrctab_validate_xmit_skb_list'. So I am preparing 4.8.7 with the patch linux-4.8-imq.diff and a patch2 that adds the line "if (unlikely(to_free)) kfree_skb_list(to_free);". When I have results about the memory leak, I will write. It needs to be installed in the network; on the bench I cannot simulate the memory leak.
How can I prepare a kernel with the changes reverted (torvalds/linux@7c96643 and torvalds/linux@870190a), for example against the 4.8.7 kernel? Just get the diff of these changes and use patch on the original kernel source?
Yes, you need to remove these lines from the imq patch (the hunk for net/core/dev.c):
-@@ -3036,6 +3046,8 @@ struct sk_buff *validate_xmit_skb_list(s
-+EXPORT_SYMBOL(validate_xmit_skb_list);
-+
This EXPORT has been added to the kernel source itself, so the patch no longer needs to add it.
To compile 4.8.8 or 4.4.32 you need to delete the following line from net/core/dev.c: +EXPORT_SYMBOL(validate_xmit_skb_list);
To prepare the kernel:
- download the patches by adding the .diff extension to the links from GitHub:
  https://github.com/torvalds/linux/commit/7c9664351980aaa6a4b8837a314360b3a4ad382a.diff
  https://github.com/torvalds/linux/commit/870190a9ec9075205c0fa795a09fa931694a3ff1.diff
- apply them with patch -p1 -R < patch_name.diff
- recompile the kernel
With the first patch I get too many failed hunks:
linux-4.8.8$ patch -p1 -R < ../package/kernel26/kernel-4.8.8-patch1.patch
patching file include/net/netfilter/nf_conntrack.h
Hunk #1 FAILED at 117.
1 out of 1 hunk FAILED -- saving rejects to file include/net/netfilter/nf_conntrack.h.rej
patching file include/net/netfilter/nf_conntrack_extend.h
patching file include/net/netfilter/nf_nat.h
Hunk #1 succeeded at 30 (offset 1 line).
patching file net/netfilter/nf_conntrack_extend.c
patching file net/netfilter/nf_nat_core.c
Hunk #1 FAILED at 198.
Hunk #2 FAILED at 433.
Hunk #3 succeeded at 557 (offset 17 lines).
Hunk #4 FAILED at 553.
Hunk #5 FAILED at 684.
Hunk #6 succeeded at 712 (offset 14 lines).
4 out of 6 hunks FAILED -- saving rejects to file net/netfilter/nf_nat_core.c.rej
With the second:
linux-4.8.8$ patch -p1 -R < ../package/kernel26/kernel-4.8.8-patch2.patch
patching file include/net/netfilter/nf_conntrack.h
patching file include/net/netfilter/nf_nat.h
patching file net/netfilter/nf_nat_core.c
Hunk #6 succeeded at 427 (offset 1 line).
Hunk #7 succeeded at 553 (offset 1 line).
Hunk #8 succeeded at 688 (offset 1 line).
Hunk #9 succeeded at 828 (offset 2 lines).
Hunk #10 succeeded at 861 (offset 2 lines).
Hunk #11 succeeded at 879 (offset 2 lines).
Apply them with -R in reverse order: first patch2, then patch1.
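The order matters because the second patch was presumably made on top of the first, so it has to be undone first. A minimal sketch of that procedure, assuming GNU patch is installed and the two .diff files were already downloaded; the function names reverse_args and revert_patches and the file names patch1.diff and patch2.diff are placeholders, and patch file names must not contain newlines.

```shell
#!/bin/sh
# Sketch only: function and file names are placeholders.

# Print the arguments one per line, last argument first.
reverse_args() {
    rev=""
    for p in "$@"; do
        rev="$p
$rev"
    done
    printf '%s' "$rev"
}

# Reverse-apply patches inside a source tree, newest first.
# Usage: revert_patches SRC_DIR OLDEST.diff ... NEWEST.diff
revert_patches() {
    src_dir=$1
    shift
    reverse_args "$@" | while IFS= read -r p; do
        # Dry run first, so a failing hunk does not leave a half-reverted file.
        if ! patch -d "$src_dir" -p1 -R --dry-run < "$p" > /dev/null; then
            echo "cannot cleanly revert $p, stopping" >&2
            exit 1
        fi
        patch -d "$src_dir" -p1 -R < "$p" || exit 1
    done
}

# Example: patch2 is reverted first, then patch1.
#   revert_patches linux-4.8.8 patch1.diff patch2.diff
```

The dry run is a design choice for production routers: if a hunk does not revert cleanly, the tree is left untouched for that file instead of half changed.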
Yes, that works. OK, I will try it and report the status.
No, the problem is still there. After reverting with these patches the machine works, but after about 1 hour it stops responding. After logging in with monitor and keyboard and bringing imq down, the machine works fine without any error.
The problem is too big; the full imq code needs to be rechecked, and maybe many of the structs need fixing.
I went another way. On router6, where I originally detected the memory leak with 4.8.4 and the imq patch (4.8), I have now installed 4.8.7 with the imq patch (4.8) and patch2 (the added line if (unlikely(to_free)) kfree_skb_list(to_free);). The router has been running for 45 minutes and memory usage is steady. It is no longer losing 100MB/hour as with the older kernel. The load of the machine is normal. I have never had a problem with load and imq.
I have several other routers (router3, router10) running the leaking kernel 4.8.4-imq, and there I can see memory use growing, but more slowly (about 50MB/day). I will test 4.8.7 for a few days. After that I will try 4.8.8 with the changes reverted as stasn77 recommends. Or I can also go back to kernel 4.7.10, because my original motivation was to have a kernel without the Dirty COW bug.
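Growth rates like the ones quoted here (about 100MB/hour, about 50MB/day) come from comparing memory readings taken some time apart. A rough sketch of that calculation, with the hypothetical helper names leak_rate_mb_per_hour and hours_until_exhausted; the two readings in the first example call are invented for illustration and are not measurements from the thread.

```shell
#!/bin/sh
# Sketch only: helper names and sample readings are illustrative.

# Average loss of free memory in MB per hour between two readings.
# Arguments: free MB at start, free MB at end, seconds between readings.
leak_rate_mb_per_hour() {
    echo $(( ($1 - $2) * 3600 / $3 ))
}

# Whole hours until the given amount of memory is used up.
# Arguments: MB available now, leak rate in MB per hour (must be > 0).
hours_until_exhausted() {
    echo $(( $1 / $2 ))
}

leak_rate_mb_per_hour 1900 1800 3600   # prints 100
hours_until_exhausted 2048 100         # prints 20
```

With 2GB of RAM and a loss of about 100MB/hour the estimate is roughly 20 hours, which fits the reboot after about one day reported for router6.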
Hi Martin, the problem is that if you revert the changes as stasn77 recommends, you are back to the other big problem: the machine stops responding and you need to bring the imq interface down to get it back online. With the 4.7 patch imq works fine, but first, 4.7 is EOL, and second, it has a bug in the 10G driver which is fixed in 4.8.x.
Hi micron, for better diagnostics, did you try not sending local traffic, ssh or other specific traffic to imq (using iptables rules to skip -j IMQ --todev X)? And about the bug in 4.7: is it in a generic driver for all network cards, or in a specific one? I am preparing a router with a 10G card, so I could avoid some mistakes. Version 4.8.7 with the imq patch for 4.8 and the added line for freeing memory is still working well for me after 4 hours with no problems. So if it stays stable, I have no strong reason to try 4.8.8 with the reverted patches and test the issues you are facing. Of course, for "development and progress for imq" I could run some tests on the same router with 4.8.8 and the changes reverted as stasn77 recommends.
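One way to keep selected traffic out of imq, as asked about here, is to put an exemption rule ahead of the IMQ rule in the mangle table. A sketch that only prints the commands so they can be reviewed before being run as root; the helper name imq_rules_skipping_port is hypothetical, the interface eth0, device 0 and port 22 (ssh) are example values, and the ACCEPT rule also bypasses any later rules in the same chain.

```shell
#!/bin/sh
# Sketch only: the helper name and the example values are assumptions.

# Print iptables commands that exempt one TCP destination port from IMQ.
# Arguments: input interface, imq device number, TCP port to exempt.
imq_rules_skipping_port() {
    iface=$1
    imq_dev=$2
    port=$3
    # Matching packets leave the chain here and never reach the IMQ rule.
    echo "iptables -t mangle -A PREROUTING -i $iface -p tcp --dport $port -j ACCEPT"
    # Everything else arriving on the interface is still sent to the imq device.
    echo "iptables -t mangle -A PREROUTING -i $iface -j IMQ --todev $imq_dev"
}

# Review the output first; pipe it to "sh" as root to apply it.
imq_rules_skipping_port eth0 0 22
```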
Hi Martin,
The setup here is too big, and I do skip ssh traffic; only internet traffic and IPTV go through imq. The machine runs kernel 4.8.8 + hfsc + sfq, with the e1000e driver and a dual 1G card. Kernels 4.8.3, 4, 5, 6, 7 and 8 all have problems. The first problem, the memory leak, may be fixed by the lines from stasn77. But the other problem, the crash (when I revert the patches as stasn77 suggested, the machine stops working and I need to bring imq down to get it back online), is a problem with the IMQ code. I tried to fix it, but there are too many changes in kernel 4.8, and maybe Feng or Konstantin needs to check the code.
Try the patch for 4.8.8 from #49. It includes this fix; in my test it fixes the memory leak.
#ETH0 DOWNLOAD
IMQ0="imq0"
IMQ0_RATE="95Mbit" # or 950Mbit
#ETH0 UPLOAD
ETH0="eth0"
ETH0_RATE="95Mbit" # or 950Mbit
iptables -t mangle -A PREROUTING -i $ETH0 -j IMQ --todev 0
tc qdisc del dev $IMQ0 root
tc qdisc del dev $ETH0 root
tc qdisc add dev $IMQ0 root handle 1:0 htb r2q 10 default 11
tc class add dev $IMQ0 parent 1:0 classid 1:1 htb rate 1Gbit burst 15k mtu 16000
tc class add dev $IMQ0 parent 1:1 classid 1:11 htb rate $IMQ0_RATE burst 15k prio 1 mtu 16000
tc qdisc add dev $IMQ0 parent 1:11 handle 11 sfq perturb 10
tc qdisc add dev $ETH0 root handle 1:0 htb r2q 10 default 11
tc class add dev $ETH0 parent 1:0 classid 1:1 htb rate 1Gbit burst 15k mtu 1500
tc class add dev $ETH0 parent 1:1 classid 1:11 htb rate $ETH0_RATE burst 15k prio 1 mtu 1500
tc qdisc add dev $ETH0 parent 1:11 handle 11 sfq perturb 10
ip link set imq0 up
Then start iperf3 -s and dstat -nm.
Thank you k0ste. Excellent work! I compiled 4.8.8-imq from #49 and it is now running on one router (router10). I have one diagnostic output from another router, with the "old" 4.8.4 from #46 that has the memory leak problem. It also has a problem on reboot: no reboot happens! The console keeps repeating the line below; maybe this is useful. To reboot, it is better to wait for kernel.panic to trigger the reboot, or to use sysrq. The output, repeated over and over:
unregister_netdevice: waiting for eth0.204 to become free. Usage count = 948.
And about 4.8.7: with the patch from #46 plus the line from stasn77, it is working correctly on router6. No huge memory leak, no other problem. After 2 days with an average of 150Mbps of traffic, memory used is 114MB, and processes use 82MB.