qdeconinck / mp-quic

Please read https://multipath-quic.org/2017/12/09/artifacts-available.html to figure out how to set up the code.
MIT License

MPQUIC and packet loss in a Mininet environment #31

Closed elhachi closed 1 year ago

elhachi commented 2 years ago

Hello, I'm writing an MPQUIC client-server connection in Golang in a Mininet environment. I want to transfer a file from the server to the client, so I send the whole file over a single stream of the connection and create a buffer on the receiving side to read data from the stream and store it in the received file. Everything worked well until I added loss/delay options to the Mininet environment. Although the configured loss was only 1%, the number of packet losses was much bigger than expected; sometimes I lose more than half of the data. I don't know where exactly the problem is. Maybe the MPQUIC protocol and Mininet's way of simulating packet loss and delay do not go well together? If so, could you please direct me to software that would be more suitable than Mininet?

qdeconinck commented 2 years ago

What is your exact setup? How do you set up losses? What are the exact commands you are running and from where?

elhachi commented 2 years ago

Thank you for your reply, I really appreciate it. The client-server code was written based on this GitHub project: https://github.com/prat-bphc52/VideoStreaming-MPTCP-MPQUIC. I then added loss/delay options to the Mininet environment using the following calls:

net.addLink(router, client, cls=TCLink, bw=10, loss=1, delay='10ms', max_queue_size=25)
net.addLink(router, client, cls=TCLink, bw=10, loss=1, delay='50ms', max_queue_size=100)
net.addLink(router, server, cls=TCLink, bw=10, loss=1, delay='10ms', max_queue_size=25)
net.addLink(router, server, cls=TCLink, bw=10, loss=1, delay='50ms', max_queue_size=100)

qdeconinck commented 2 years ago

You should not limit the max queue size using netem, it won't give you the results you expect (see Section 5.2.2 of my thesis available at https://qdeconinck.github.io/assets/thesis_deconinck.pdf). To model buffer sizes, instead rely on shaping (tbf or htb) or policing. You can have a look at https://github.com/qdeconinck/minitopo.
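
As a rough illustration (not the exact minitopo commands; the host object, interface name and values below are assumptions, and it presumes the link is created without TCLink's own bw/max_queue_size parameters so the qdiscs don't conflict), buffer modelling via shaping could be set up from the Mininet script along these lines, keeping netem for delay/loss only and letting tbf's byte limit play the role of the buffer:

```python
# Hedged sketch: let the shaper (tbf) model the bottleneck buffer via its
# byte 'limit', and use netem only for delay and random loss.
# 'server' is the Mininet host object; 'server-eth0' is an assumed name.
intf = 'server-eth0'
# netem root: 10 ms delay, 1% loss, with a generous packet limit
server.cmd('tc qdisc add dev {} root handle 1:0 netem delay 10ms loss 1% limit 1000'.format(intf))
# tbf child: shape to 10 Mbit/s with a ~30 KB queue acting as the buffer
server.cmd('tc qdisc add dev {} parent 1:1 handle 10:0 tbf rate 10mbit burst 15k limit 30000'.format(intf))
```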

elhachi commented 2 years ago

But even when I remove the max_queue_size option, I'm still getting the same behavior.

qdeconinck commented 2 years ago

What is the actual tc command generated by Mininet?

elhachi commented 2 years ago

I'm not using the tc command directly. Instead, I'm using the mininet.link.TCIntf class in a Python script, following this documentation: http://mininet.org/api/classmininet_1_1link_1_1TCIntf.html. I tried earlier to add the loss/delay options with separate commands and keep only the bandwidth in the Python script, but it gave me an error telling me that I have to declare these options inside the script. The links are now configured as:

net.addLink(router, client, cls=TCLink, bw=10, loss=1, delay='10ms')
net.addLink(router, client, cls=TCLink, bw=10, loss=1, delay='50ms')
net.addLink(router, server, cls=TCLink, bw=10, loss=1, delay='10ms')
net.addLink(router, server, cls=TCLink, bw=10, loss=1, delay='50ms')
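
In case it helps, a stripped-down, self-contained version of such a script could look roughly like the sketch below; node names are assumptions, switches are omitted, and the IP addressing/routing needed for the second paths is left out, so this is an illustration rather than the actual code used here:

```python
#!/usr/bin/env python
# Hedged sketch of the two-links-per-host setup described above.
# Assumptions: node names, no switches, no per-path IP/routing configuration
# (which would be needed for multipath to actually be usable).
from mininet.net import Mininet
from mininet.link import TCLink
from mininet.cli import CLI
from mininet.log import setLogLevel

def run():
    net = Mininet()
    client = net.addHost('client')
    server = net.addHost('server')
    router = net.addHost('router')
    # Two paths on each side, with different one-way delays, no max_queue_size
    net.addLink(router, client, cls=TCLink, bw=10, loss=1, delay='10ms')
    net.addLink(router, client, cls=TCLink, bw=10, loss=1, delay='50ms')
    net.addLink(router, server, cls=TCLink, bw=10, loss=1, delay='10ms')
    net.addLink(router, server, cls=TCLink, bw=10, loss=1, delay='50ms')
    net.start()
    CLI(net)
    net.stop()

if __name__ == '__main__':
    setLogLevel('info')
    run()
```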

qdeconinck commented 2 years ago

Yes, but it uses tc under the hood. In the Mininet setup, how are the links actually configured (what does tc qdisc show report once the network is ready to use)?

elhachi commented 2 years ago

qdisc htb 5: dev server-eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 direct_qlen 1000
qdisc netem 10: dev server-eth0 parent 5:1 limit 1000 delay 10.0ms loss 1%
qdisc htb 5: dev server-eth1 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 direct_qlen 1000
qdisc netem 10: dev server-eth1 parent 5:1 limit 1000 delay 50.0ms loss 1%

qdeconinck commented 2 years ago

And what about tc class show for htb?

It might also be useful to have an example of the loss pattern you observe (e.g., with logs or a pcap trace) versus what you would expect.

elhachi commented 2 years ago

I'm a beginner with Mininet, so excuse me if I take some time to understand. I ran the command tc class show, and it returned nothing...
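
(As an aside, one possible explanation for the empty output: by default Mininet hosts live in their own network namespaces, so tc has to be run from within the host, and tc class show usually wants a dev argument. A hedged sketch of how this could be queried from the Python script, with assumed host/interface names:)

```python
# Hedged sketch: run tc inside the server host's namespace and name the
# interface explicitly ('server' / 'server-eth0' are assumed names).
print(server.cmd('tc class show dev server-eth0'))
print(server.cmd('tc class show dev server-eth1'))
```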

qdeconinck commented 2 years ago

From what I understand, there is no bandwidth limitation being applied, which may give very strange behaviours. You should look at the exact tc commands being generated (this line, https://github.com/mininet/mininet/blob/270a6ba3335301f1e4757c5fb7ee64c1d3580bf2/mininet/link.py#L316, is the one whose output you want to see; have a look at how to enable such logging in Mininet) and paste the output here.
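
(A hedged hint on the logging part: raising Mininet's log level before building the network should make the executed commands, including the tc calls issued by TCIntf, show up on the console. A possible way to do it, as a sketch:)

```python
# Possible way to surface the commands Mininet executes (including tc):
from mininet.log import setLogLevel

setLogLevel('debug')  # call this before creating the Mininet() network
```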

elhachi commented 2 years ago

I'm sorry, I only know the basic tc class show command; it returns:

class htb 5:1 root leaf 10: prio 0 rate 10000kbit ceil 10000kbit burst 15kb cburst 1600b

qdeconinck commented 2 years ago

Sounds OK then. To further understand what's going on, a PCAP trace along with the logs on both the client and server sides would help pinpoint where the packet losses occur.
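
(For illustration, such traces could be collected directly from the Mininet script along the following lines; the host and interface names are assumptions, not the actual setup of this issue.)

```python
# Hedged sketch: capture on both endpoints around the transfer.
client.cmd('tcpdump -i client-eth0 -w /tmp/client.pcap &')
server.cmd('tcpdump -i server-eth0 -w /tmp/server.pcap &')

# ... run the MPQUIC file transfer here ...

# Each Mininet host runs its own shell, so the job spec refers to its own tcpdump
client.cmd('kill %tcpdump')
server.cmd('kill %tcpdump')
```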

elhachi commented 2 years ago

Hello, following your suggestion I took a PCAP capture and observed some strange behaviors. The first is the occurrence of more than one ARP packet; to my knowledge, those should only appear at the beginning of the conversation, when the MAC addresses must be discovered. The second is getting a "Destination Unreachable (Port Unreachable)" error in an ICMP packet.

elhachi commented 2 years ago

[Screenshots attached: Capture1, Capture2, Capture3]

elhachi commented 2 years ago

I found this note when reading the MultipathTester article: "We notably noticed some connectivity issues with QUIC using IPv6, but in America, we observed better performance using IPv6 rather than IPv4". Might this be the reason in my case as well?

qdeconinck commented 2 years ago
elhachi commented 2 years ago

The whole PCAP file: mpquic_trace_client.zip

elhachi commented 2 years ago

I think this loss of connectivity is caused by the huge number of packets coming from the client side (two sources), so the server was too busy to receive all of the data via one path. Note that I create two paths so the server can use MPQUIC, but it keeps working over only one! Thank you for your time, I appreciate it.

qdeconinck commented 2 years ago

There are a few strange elements in your trace:

elhachi commented 2 years ago

The only expected idle period was at the beginning of the connection, because I had to enter some data manually and that takes a few seconds. Otherwise, I did not expect any idle moments. Thank you so much for your help, I will try to do what you recommend.