Mushoz closed this issue 4 years ago
@Mushoz All modern client implementations of PPPoE in LEDE/OpenWRT already use in-kernel PPPoE. If you mean a PPPoE server, I haven't heard of any userspace implementations for Linux, so those use the in-kernel implementation too. So what exactly do you want to integrate into LEDE/OpenWRT?
@inste rp-pppoe was initially implemented in userspace only. Sure, those days are gone.
@themiron I know, but as you've already said, those days are gone. rp-pppoe has been using the in-kernel driver for more than 10 years now.
Yes, you are correct in assuming that rp-pppoe is using an in-kernel driver with LEDE. However, it doesn't seem to be very well threaded: I am running into an upload bottleneck even though top shows plenty of idle CPU % left over. Htop shows that the first core is fully loaded, while the rest are relatively idle. I was hoping it would be possible to use accel-ppp as a PPPoE client with properly multithreaded code. The encapsulation of PPPoE packets should be quite possible to multithread, right?
@Mushoz if your OpenWRT/LEDE kernel build has XPS/RPS enabled and the network (PDMA) driver is written properly, all CPUs should be utilized (not by a single flow, but across multiple flows). How do you test bandwidth and CPU utilization?
@inste I am using a DIR-860L B1 router, which utilizes a Mediatek 2 core / 4 thread processor. I was going to use this router on a 500/500 Mbit fibre connection, so I tested the NAT performance first to see whether it was up for the task, since NAT is done in software in Lede/Openwrt.
I connected a computer via a gigabit connection to the WAN port of the router, and left masquerading and the firewall enabled. On this computer iperf3 was run in server mode. I connected a second computer via a gigabit connection to the LAN port of the router, which was the iperf3 client. Each computer was in its own subnet. The iperf3 tests showed that the router was able to route at nearly full gigabit speeds (930 mbit/s) in either direction and would be sufficiently fast for a 500/500 mbit connection.
However, as soon as I used the router with my real connection, speed tests only showed 500/400 mbit speeds. The only differences from the local setup are: 1) my ISP uses PPPoE to set up the connection, and 2) my ISP uses a tagged VLAN for internet.
My ISP-provided router was properly showing a steady 500/500 mbit connection, so the connection itself is fine. My router was showing NAT speeds way above 500 mbit in my local tests, so that should also be fine.
During the speedtests, htop is showing CPU0 at 100% usage, while CPU 1 through 3 are showing 30-60% utilization. Looking at the hardware interrupts, multiple queues do not appear to be properly implemented. At least not for my device (formatting is messed up, I've tried a code block, but it's even worse):
root@LEDE:~# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  8:    8340307    8340285    8340289    8340278  MIPS GIC Local 1 timer
  9:      39832          0          0          0  MIPS GIC 63 IPI call
 10:          0    5329303          0          0  MIPS GIC 64 IPI call
 11:          0          0    8081921          0  MIPS GIC 65 IPI call
 12:          0          0          0    7308397  MIPS GIC 66 IPI call
 13:     134081          0          0          0  MIPS GIC 67 IPI resched
 14:          0     434975          0          0  MIPS GIC 68 IPI resched
 15:          0          0    3934766          0  MIPS GIC 69 IPI resched
 16:          0          0          0     125414  MIPS GIC 70 IPI resched
 18:         14          0          0          0  MIPS GIC 33 serial
 19:          0          0          0          0  MIPS GIC 29 xhci-hcd:usb1
 20:   43174867          0          0          0  MIPS GIC 10 1e100000.ethernet
 21:         30          0          0          0  MIPS GIC 30 gsw
 22:    2384524          0          0          0  MIPS GIC 11 mt76x2e
 23:   14548573          0          0          0  MIPS GIC 31 mt76x2e
ERR:    1445299
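For readers who want to quantify how lopsided the interrupt distribution is, here is a minimal Python sketch that parses `/proc/interrupts`-style text and reports each IRQ's share per CPU. The function names `parse_interrupts` and `cpu_share` are hypothetical helpers made up for this illustration, and `SAMPLE` is a subset of the rows posted above; on a real router you would read the file itself instead.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq: (per_cpu_counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_cpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")    # split at the first colon only
        fields = rest.split()
        # Skip rows without one counter per CPU (e.g. the single-value ERR row).
        if len(fields) < n_cpus or not all(f.isdigit() for f in fields[:n_cpus]):
            continue
        counts = [int(f) for f in fields[:n_cpus]]
        table[label.strip()] = (counts, " ".join(fields[n_cpus:]))
    return table


def cpu_share(counts):
    """Fraction of an IRQ's interrupts handled by each CPU."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Subset of the output posted in this thread (assumption: header row comes first).
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3
8: 8340307 8340285 8340289 8340278 MIPS GIC Local 1 timer
20: 43174867 0 0 0 MIPS GIC 10 1e100000.ethernet
22: 2384524 0 0 0 MIPS GIC 11 mt76x2e
23: 14548573 0 0 0 MIPS GIC 31 mt76x2e
ERR: 1445299
"""

if __name__ == "__main__":
    for irq, (counts, desc) in parse_interrupts(SAMPLE).items():
        shares = ", ".join(f"{s:.0%}" for s in cpu_share(counts))
        print(f"IRQ {irq:>3}  {desc:<32} {shares}")
```

With the numbers above, the ethernet IRQ (20) and both mt76x2e IRQs land entirely on CPU0, which matches the htop observation of CPU0 being saturated.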
While this could be way better with multiple TX and RX queues, even without these optimizations it was able to push gigabit speeds in my local tests, so it should be sufficiently fast. VLAN overhead is negligible so that leaves me with PPPoE to blame. I've also tried playing around with IRQ affinities to move the WLAN drivers to one core (CPU2 & 3) and only keeping ethernet on the first core (CPU0) without any measurable speed improvements.
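The claim that encapsulation overhead is small in terms of bytes on the wire can be checked with simple arithmetic. The sketch below assumes a standard 1500-byte Ethernet MTU, a 4-byte 802.1Q tag, and 8 bytes of PPPoE plus PPP header (so a 1492-byte PPPoE MTU); it ignores preamble and inter-frame gap. The function name `payload_efficiency` is a hypothetical helper. It says nothing about the per-packet CPU cost of PPPoE processing, which is the actual suspect in this thread.

```python
ETH_HEADER = 14     # destination MAC + source MAC + EtherType
ETH_FCS = 4         # frame check sequence
VLAN_TAG = 4        # 802.1Q tag
PPPOE_HEADER = 6    # PPPoE session header
PPP_PROTOCOL = 2    # PPP protocol field
ETH_MTU = 1500      # assumption: no jumbo or baby-jumbo frames


def payload_efficiency(vlan=False, pppoe=False):
    """IP bytes carried per byte of Ethernet frame (header through FCS) at full MTU."""
    ip_bytes = ETH_MTU - (PPPOE_HEADER + PPP_PROTOCOL if pppoe else 0)
    frame = ETH_HEADER + (VLAN_TAG if vlan else 0) + ETH_MTU + ETH_FCS
    return ip_bytes / frame


if __name__ == "__main__":
    plain = payload_efficiency()
    tagged_pppoe = payload_efficiency(vlan=True, pppoe=True)
    print(f"plain Ethernet : {plain:.4f}")
    print(f"VLAN + PPPoE   : {tagged_pppoe:.4f}")
    print(f"difference     : {plain - tagged_pppoe:.4f}")
```

Under these assumptions the two ratios are 1500/1518 and 1492/1522, a difference of less than one percentage point, so header bytes alone cannot account for a drop from 500 to 400 mbit.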
If I'm making an error in my reasoning somewhere, please let me know, but I'm currently out of ideas. I guess I could buy a faster device to use brute force to get full speeds with this connection, but where's the fun in that if optimizations are still possible ;)
Do you think accel-ppp could be of use in speeding up this connection by properly using all cores/threads for the encapsulation of PPPoE packets? :)
@Mushoz do you have an MT7621A(T)? It should scale linearly under PPPoE, as I've seen in my lab. You should first check whether XPS/RPS is enabled and working, and probably check the quality of the ethernet driver. No, accel-ppp is purely control-plane software; all real packet processing happens in the kernel, so it will not help you.
@inste Yes it is the MT7621AT. Thank you very much for your insight. I'll try to figure out whether XPS/RPS is enabled. Checking the quality of the ethernet driver is probably beyond my capabilities, but I'll have a look regardless. Might learn something from just studying the code :)
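One way to check whether RPS/XPS is configured is to read the per-queue CPU masks the kernel exposes under `/sys/class/net/<iface>/queues/`. The Python sketch below is only an illustration: the helper names (`cpu_mask`, `mask_to_cpus`, `read_steering`) and the interface name `eth0` are assumptions, not something taken from this thread. An all-zero `rps_cpus` mask means RPS is disabled for that receive queue.

```python
import glob
import os


def cpu_mask(cpus):
    """Hex bitmask in the form written to rps_cpus / xps_cpus, e.g. CPUs 1,2,3 -> 'e'."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def mask_to_cpus(text):
    """Parse a sysfs mask such as '0000000f' or '00000000,00000003' into a CPU list."""
    mask = int(text.strip().replace(",", ""), 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def read_steering(iface, sysfs="/sys/class/net"):
    """Return {path: [cpus]} for every readable rps_cpus / xps_cpus file of iface."""
    result = {}
    pattern = os.path.join(sysfs, iface, "queues", "*", "[rx]ps_cpus")
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as fh:
                result[path] = mask_to_cpus(fh.read())
        except (OSError, ValueError):
            continue    # some queue files may be unreadable or empty
    return result


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute the interface that carries the WAN traffic.
    for path, cpus in read_steering("eth0").items():
        state = f"CPUs {cpus}" if cpus else "disabled (mask is 0)"
        print(f"{path}: {state}")
    print("mask for all four CPUs:", cpu_mask([0, 1, 2, 3]))
```

On a four-thread device, a mask of `f` would allow all CPUs to handle packets steered from that queue, while `e` would exclude CPU0, which already services the ethernet interrupt.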
From what I can gather from the following topic, the ethernet driver isn't multi-core / multi-queue capable, which is a bummer: https://forum.lede-project.org/t/mt7621-wg3526-multicore-support/6918/22
@Mushoz the hardware is capable of working with multiple queues, so it is a limitation of OpenWrt's, and probably the MediaTek SDK's, driver. In our proprietary products we've rewritten the driver from scratch, and it works like a charm.
Good to know. Thank you so much for all the information, and sorry for straying completely away from the accel-ppp subject :)
Maybe it would be a cool idea to create and maintain a Lede/OpenWRT package of this software? It would be a very welcome addition. The current PPPoE implementation in Lede/OpenWRT is pretty slow, and these routers have a lot to gain from the performance optimizations done in accel-ppp. However, I am not entirely sure how difficult it would be to get this ported and integrated into Lede/OpenWRT.
Any thoughts regarding this issue?