xebd / accel-ppp

High performance PPTP/L2TP/PPPoE/IPoE server for Linux
GNU General Public License v2.0

Lede/OpenWRT support #41

Closed Mushoz closed 4 years ago

Mushoz commented 6 years ago

Maybe it would be a cool idea to create and maintain a Lede/OpenWRT package of this software? I'm not sure how difficult this would be, but it would be a very welcome addition. The current PPPoE implementation in Lede/OpenWRT is pretty slow and these routers have a lot to gain from the performance optimizations done in accel-ppp. However, I am not entirely sure how difficult it would be to get this ported and integrated into Lede/OpenWRT.

Any thoughts regarding this issue?

inste commented 6 years ago

@Mushoz All modern PPPoE client implementations in LEDE/OpenWRT already use in-kernel PPPoE. If you mean a PPPoE server, I haven't heard of any userspace implementations for Linux, so those use the in-kernel implementation too. So what exactly do you want to integrate into LEDE/OpenWRT?

themiron commented 6 years ago

@inste rp-pppoe was initially implemented in userspace only. Sure, those days are gone.

inste commented 6 years ago

@themiron I know, but as you've already said, those days are gone. rp-pppoe has been using the in-kernel driver for more than 10 years now.

Mushoz commented 6 years ago

Yes, you are correct in assuming that rp-pppoe is using an in-kernel driver with Lede. However, it doesn't seem to be very well threaded: I am running into an upload bottleneck even though top shows plenty of idle CPU left over. Htop shows that the first core is fully loaded while the rest are relatively idle. I was hoping it would be possible to use accel-ppp as a PPPoE client with properly multithreaded code. The encapsulation of PPPoE packets should be quite possible to multithread, right?

inste commented 6 years ago

@Mushoz If your OpenWRT/LEDE kernel build has XPS/RPS enabled and the network (PDMA) driver is written properly, all CPUs should be utilized (not by a single flow, but across multiple flows). How do you test bandwidth and CPU utilization?
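To illustrate the "not by single flow, but with multiple" point: RPS picks a CPU per flow from a hash of the packet headers, so one connection always lands on the same core. The sketch below is a toy model only. It uses CRC32 as a stand-in hash (the kernel uses its own flow hash), and the addresses and ports are made up for the example.

```python
import zlib

def pick_cpu(src_ip, src_port, dst_ip, dst_port, cpus):
    """Toy stand-in for RPS steering: hash the flow 4-tuple, index into the CPU list.

    CRC32 is an assumption made for illustration; it is not the kernel's hash.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return cpus[zlib.crc32(key) % len(cpus)]

cpus = [0, 1, 2, 3]

# One flow (fixed 4-tuple): every packet maps to the same CPU.
single_flow = {
    pick_cpu("192.168.1.10", 50000, "203.0.113.5", 5201, cpus)
    for _ in range(1000)
}

# Many flows (varying source port): packets can be spread over several CPUs.
many_flows = {
    pick_cpu("192.168.1.10", 50000 + i, "203.0.113.5", 5201, cpus)
    for i in range(64)
}

print("CPUs used by one flow:", sorted(single_flow))
print("CPUs used by 64 flows:", sorted(many_flows))
```

This is why a test with a single TCP stream can saturate one core while the others stay mostly idle, and why the test method matters for the question above.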

Mushoz commented 6 years ago

@inste I am using a DIR-860L B1 router, which utilizes a Mediatek 2 core / 4 thread processor. I was going to use this router on a 500/500 Mbit fibre connection, so I tested the NAT performance first to see whether it was up for the task, since NAT is done in software in Lede/Openwrt.

I connected a computer via a gigabit connection to the WAN port of the router, and left masquerading and the firewall enabled. On this computer iperf3 was run in server mode. I connected a second computer via a gigabit connection to the LAN port of the router, which was the iperf3 client. The two computers were in different subnets. The iperf3 tests showed that the router was able to route at nearly full gigabit speed (930 Mbit/s) in either direction and would be sufficiently fast for a 500/500 Mbit connection.

However, as soon as I used the router with my real connection, speed tests were only showing 500/400 Mbit speeds. The only differences from the local setup are: 1) my ISP uses PPPoE to set up the connection, and 2) my ISP uses a tagged VLAN for internet.
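As a rough sanity check on those two differences: PPPoE adds a 6-byte header plus a 2-byte PPP protocol field to each frame, and an 802.1Q VLAN tag adds 4 bytes. The sketch below estimates how much throughput that costs on the wire for full-size frames. It only counts bytes on the wire and says nothing about per-packet CPU cost, which is the real question in this thread. The function names are made up for the example.

```python
ETH_PAYLOAD = 1500   # standard Ethernet payload size (bytes)
ETH_HEADER = 14      # destination MAC + source MAC + EtherType
ETH_FCS = 4          # frame check sequence
PREAMBLE_IFG = 20    # preamble/SFD (8) + inter-frame gap (12)
PPPOE_HEADER = 6     # PPPoE session header
PPP_PROTOCOL = 2     # PPP protocol field
VLAN_TAG = 4         # 802.1Q tag

def pppoe_mtu(eth_payload=ETH_PAYLOAD):
    """IP MTU left inside one Ethernet payload after PPPoE + PPP headers."""
    return eth_payload - PPPOE_HEADER - PPP_PROTOCOL

def wire_efficiency(ip_bytes, extra_l2_bytes):
    """Share of on-wire bytes that is IP data for one full-size frame."""
    on_wire = ip_bytes + extra_l2_bytes + ETH_HEADER + ETH_FCS + PREAMBLE_IFG
    return ip_bytes / on_wire

def relative_loss():
    """Throughput lost to PPPoE + VLAN relative to plain Ethernet."""
    plain = wire_efficiency(ETH_PAYLOAD, 0)
    tagged_pppoe = wire_efficiency(
        pppoe_mtu(), PPPOE_HEADER + PPP_PROTOCOL + VLAN_TAG
    )
    return 1 - tagged_pppoe / plain

print("PPPoE MTU:", pppoe_mtu())
print(f"Relative loss vs plain Ethernet: {relative_loss():.2%}")
```

The loss comes out under 1% for full-size frames, so header overhead alone cannot account for a drop from 500 to 400 Mbit.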

My ISP provided router was properly showing a steady 500/500 mbit connection, so the connection itself is fine. My router was showing NAT speeds way above 500 mbit in my local tests, so that should also be fine.

During the speedtests, htop is showing CPU0 at 100% usage, while CPU 1 through 3 are showing 30-60% utilization. Looking at the hardware interrupts, multiple queues do not appear to be properly implemented. At least not for my device (formatting is messed up, I've tried a code block, but it's even worse):

root@LEDE:~# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  8:    8340307    8340285    8340289    8340278  MIPS GIC Local   1  timer
  9:      39832          0          0          0  MIPS GIC  63  IPI call
 10:          0    5329303          0          0  MIPS GIC  64  IPI call
 11:          0          0    8081921          0  MIPS GIC  65  IPI call
 12:          0          0          0    7308397  MIPS GIC  66  IPI call
 13:     134081          0          0          0  MIPS GIC  67  IPI resched
 14:          0     434975          0          0  MIPS GIC  68  IPI resched
 15:          0          0    3934766          0  MIPS GIC  69  IPI resched
 16:          0          0          0     125414  MIPS GIC  70  IPI resched
 18:         14          0          0          0  MIPS GIC  33  serial
 19:          0          0          0          0  MIPS GIC  29  xhci-hcd:usb1
 20:   43174867          0          0          0  MIPS GIC  10  1e100000.ethernet
 21:         30          0          0          0  MIPS GIC  30  gsw
 22:    2384524          0          0          0  MIPS GIC  11  mt76x2e
 23:   14548573          0          0          0  MIPS GIC  31  mt76x2e
ERR:    1445299
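The imbalance in this listing can be made explicit with a small parser that computes each IRQ's per-CPU share. This is a minimal sketch that assumes the usual one-row-per-IRQ layout of /proc/interrupts; the sample rows are copied from the output above, and the function names are made up for the example.

```python
def parse_interrupts(text):
    """Map IRQ label -> (per-CPU counts, description) from /proc/interrupts text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    rows = {}
    for ln in lines[1:]:
        irq, _, rest = ln.partition(":")  # split at the first colon only
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        if len(counts) != ncpu:           # summary rows such as ERR have one total
            continue
        rows[irq.strip()] = (counts, " ".join(fields[ncpu:]))
    return rows

def cpu_share(counts):
    """Fraction of an IRQ's interrupts handled by each CPU."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 20:   43174867          0          0          0  MIPS GIC  10  1e100000.ethernet
 22:    2384524          0          0          0  MIPS GIC  11  mt76x2e
 23:   14548573          0          0          0  MIPS GIC  31  mt76x2e
ERR:    1445299
"""

for irq, (counts, desc) in parse_interrupts(SAMPLE).items():
    shares = ", ".join(f"{s:.0%}" for s in cpu_share(counts))
    print(f"IRQ {irq} ({desc}): {shares}")
```

For the rows shown, every ethernet and wireless interrupt is counted on CPU0, which matches the htop observation of CPU0 at 100%.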

While this could be way better with multiple TX and RX queues, even without these optimizations it was able to push gigabit speeds in my local tests, so it should be sufficiently fast. VLAN overhead is negligible so that leaves me with PPPoE to blame. I've also tried playing around with IRQ affinities to move the WLAN drivers to one core (CPU2 & 3) and only keeping ethernet on the first core (CPU0) without any measurable speed improvements.
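For reference, IRQ affinity on Linux is set by writing a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity, with bit 0 standing for CPU0. The helper below is a minimal sketch for computing such masks for the split described above (WLAN on CPU2 and CPU3, ethernet on CPU0). The function name is made up for the example, and the code only computes the mask; it does not write to /proc.

```python
def cpus_to_mask(cpus):
    """Return the hex bitmask string for a list of CPU numbers (bit 0 = CPU0)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

# IRQ numbers 20, 22 and 23 are taken from the /proc/interrupts listing above.
plan = {
    20: cpus_to_mask([0]),      # 1e100000.ethernet on CPU0
    22: cpus_to_mask([2, 3]),   # mt76x2e on CPU2 and CPU3
    23: cpus_to_mask([2, 3]),   # mt76x2e on CPU2 and CPU3
}

for irq, mask in plan.items():
    print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```

Moving the wireless interrupts away only frees CPU0 from WLAN work. A single ethernet IRQ still funnels all wired traffic through one core, which is consistent with seeing no measurable improvement.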

If I'm making a thought error somewhere, please let me know, but I'm currently out of ideas. I guess I could buy a faster device to use brute force to get full speeds with this connection, but where's the fun in that if optimizations are still possible ;)

Do you think accel-ppp could be of use in speeding up this connection by properly using all cores/threads for the encapsulation of PPPoE packets? :)

inste commented 6 years ago

@Mushoz Do you have an MT7621A(T)? It should scale linearly under PPPoE, as I've seen in my lab. First check whether XPS/RPS is enabled and working, and then probably check the quality of the ethernet driver. And no, accel-ppp is purely control-plane software; all actual packet processing happens in the kernel, so it will not help you.
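One way to check whether RPS/XPS is configured is to read the per-queue CPU masks that the kernel exposes under /sys/class/net/<iface>/queues/ (rps_cpus for receive queues, xps_cpus for transmit queues). An all-zero rps_cpus mask means RPS is off for that queue. The sketch below decodes those masks; the interface name eth0 is an assumption, and the function names are made up for the example.

```python
from pathlib import Path

def decode_cpu_mask(text):
    """Turn a sysfs hex CPU mask (possibly comma-grouped) into a list of CPU numbers."""
    value = int(text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]

def steering_report(iface, sysfs_root="/sys/class/net"):
    """Map queue name -> CPUs allowed by rps_cpus (rx queues) or xps_cpus (tx queues)."""
    queues = Path(sysfs_root) / iface / "queues"
    report = {}
    for queue in sorted(queues.glob("[rt]x-*")):
        knob = "rps_cpus" if queue.name.startswith("rx-") else "xps_cpus"
        path = queue / knob
        if not path.exists():
            continue
        try:
            report[queue.name] = decode_cpu_mask(path.read_text())
        except (OSError, ValueError):
            report[queue.name] = None   # knob present but not readable on this device
    return report

if __name__ == "__main__":
    # "eth0" is an assumed interface name; substitute the real WAN device.
    for name, cpus in steering_report("eth0").items():
        print(name, "->", cpus if cpus else "steering disabled or unavailable")
```

A mask that lists several CPUs only shows that steering is configured. Whether the load then spreads still depends on the driver and on how many flows the test generates.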

Mushoz commented 6 years ago

@inste Yes it is the MT7621AT. Thank you very much for your insight. I'll try to figure out whether XPS/RPS is enabled. Checking the quality of the ethernet driver is probably beyond my capabilities, but I'll have a look regardless. Might learn something from just studying the code :)

From what I can gather from the following topic, the ethernet driver isn't multi-core / multi-queue capable, which is a bummer: https://forum.lede-project.org/t/mt7621-wg3526-multicore-support/6918/22

inste commented 6 years ago

@Mushoz The hardware is capable of working with multiple queues, so it is a limitation of OpenWrt's driver, and probably of the MediaTek SDK's driver too. In our proprietary products we've rewritten the driver from scratch, and it works like a charm.

Mushoz commented 6 years ago

Good to know. Thank you so much for all the information, and sorry for straying completely away from the accel-ppp subject :)