nokia / ClickNF

Other
30 stars 7 forks source link

push: Resource temporarily unavailable with Socks4Proxy #4

Open tbarbette opened 5 years ago

tbarbette commented 5 years ago

Hi,

I managed to make the Socks4Proxy work using the configuration in conf, but when downloading a 1G file I get a lot of "push: Resource temporarily unavailable" before eventually freezing after around 800MB. Looks like a memory exhaustion, but it would be a leak in this case, I guess 65536 buffers for DPDK should be plenty enough.

It may be unrelated but I had to change the next two lines in Socks4Proxy :

@@ -257,8 +257,8 @@ Socks4Proxy::push(int port, Packet *p)
                                char msg[8] = { 0x00, 0x5a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
                                q = q->push(8);
                                memcpy((void *)q->data(), msg, 8);
-                               SET_TCP_SOCKFD_ANNO(q, fd);
-                               output(SOCKS4PROXY_OUT_SRV_PORT).push(q);
+                               SET_TCP_SOCKFD_ANNO(q, _socketTable[c][fd].pair);
+                               output(SOCKS4PROXY_OUT_CLI_PORT).push(q);

Indeed the answer supposed to be sent to the client was sent to the server.

I built with "./configure --disable-linuxmodule --enable-user-multithread --enable-dpdk --enable-epoll --disable-batch CXXFLAGS="-g -O1", disabling batch improved things (I guess some of those elements are not compatible).

I have an NGINX server runing on one machine and I make a request from another one with "curl http://10.221.0.5/bin-1G --socks4 10.220.0.1:9000 --output /dev/null".

Thanks for the help ! Tom

gallomassimo commented 5 years ago

Hi Tom,

The change you made, is partly correct. There is indeed an issue but you only need to apply the following change to the socks4proxy:

-                               SET_TCP_SOCKFD_ANNO(q, fd);
+                               SET_TCP_SOCKFD_ANNO(q, _socketTable[c][fd].pair);

The output port is correct (Take a look to the ASCII scheme at the beginning of socks4proxy.cc and to the port ID defines in socks4proxy.h). I tested and fixed this issue.

The "push: Resource temporarily unavailable" prints you are getting are because the TCP transmission queue is full hence the click_push(sockfd, p) call here : https://github.com/nokia/ClickNF/blob/51c8ced490f9197d614eece222258b7fb04e934b/elements/tcp/tcpepollserver.cc#L324 fails and packets are discarded (that explains why you are receiving 800M out of 1G ). I'll commit a proper solution soon (probably Friday or early next week).

tbarbette commented 5 years ago

Thanks Massimo ! It indeed leads to the same result with only that change. I'm curious to understand how it "worked" (up to a few hundreds of MB) while I was sending the packet to the wrong port?

For the push, I guess I can increase that queue size somewhere, right? Looking at: https://github.com/nokia/ClickNF/blob/51c8ced490f9197d614eece222258b7fb04e934b/elements/tcp/tcpstate.cc#L191 Could the socket simply be created blocking? I tried that but epollclient complains. Is it unsupported or I can use another building block?

Thanks,

Tom

gallomassimo commented 5 years ago

Hi Tom,

Sorry for the long delay. I fixed the push issue. To temporary solve the issue, as you said, you could increase the TCP TXQ buffer:

https://github.com/nokia/ClickNF/blob/228365b6e4bb770bd2c4047e016dbbff5f2a3d74/include/clicknet/tcp.hh#L254

However, the proper fix would be to buffer packets in tcpepollsever and tcpepollclient when there is no free space in the TCP TXQ buffer. That is what I just committed. In my tests it seems downloading the complete file now.

Max

tbarbette commented 5 years ago

Thanks ! I'll try it out !

gallomassimo commented 5 years ago

I close the issue and let you re-open it if the fix does not work.

tbarbette commented 5 years ago

Mhh now only 1 packet of HTTP payload gets transferred (when fetching a 8K file), then not sure what happens or exactly who is waiting. Both ends do not fire any retransmission. The server has sent all data, but the client does not receive it, and does not send DUP ACKs either. It seems the sock proxy or the application is keeping the packets.

With batching I receive a little more data but some packets are never sent. Should I enable batching or not, using the sample config?

gallomassimo commented 5 years ago

Hi Tom,

I just tested with the latest version of the code on git and data gets transferred (with 8K, 10M 1G 10G files). Here are some details about my configuration:

Server: lighthttpd (that is the server I had at hand. It should not change much)

Proxy: ClickNF

configured and compiled with: make clean; ./configure --disable-linuxmodule --enable-user-multithread --enable-dpdk --enable-epoll --enable-openssl; make -j40

launched with: sudo bin/click --dpdk -c0x1 -n10 -- conf/sock4-proxy-modular.click

(proxy configuration is the one in the git repo)

Client: curl http://10.0.1.2:9001/8K.txt --socks4 10.0.0.1:9000 --output /dev/null

Can you please give me more details about your configuration in order to understand what is going wrong for you? What does the system prints out if you set VERBOSE (TCPEpollSever TCPEpollClient Socks4Proxy) and enable IPPrint ?

tbarbette commented 5 years ago

Hi Massimo !

Sorry I come back only on this now. So actually, I do receive a few packets (when querying for a 8KB file) but then ClickNF gets stuck in some kind of infinite loop : image It does not respond to ARP requests etc. I configured as you said.

Here is the log with verbose, Print here and there and some click_chatter of mine:

EAL: Detected 16 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: PCI device 0000:17:00.0 on NUMA socket 0
EAL:   probe driver: 15b3:1017 net_mlx5
EAL: PCI device 0000:17:00.1 on NUMA socket 0
EAL:   probe driver: 15b3:1017 net_mlx5
-------------------------------------------
DPDK: port 0, MAC address 50:6B:4B:43:88:CA
DPDK: port 1, MAC address 50:6B:4B:43:88:CB
-------------------------------------------
DPDK: delaying start by 5s
net_mlx5: port 0 ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM) failed: Invalid argument
conf/sock4-proxy-modular-print.click:13: While initializing ‘dpdk0 :: DPDK’:
  warning: Flow control set failed
DPDK: port 0, MAC address 50:6B:4B:43:88:CA
DPDK: port 0, link up, speed 100000 Mbps, full-duplex 10
-------------------------------------------
DPDK: port 0, MAC address 50:6B:4B:43:88:CA
DPDK: port 1, MAC address 50:6B:4B:43:88:CB
-------------------------------------------
DPDK: delaying start by 5s
net_mlx5: port 1 ioctl(SIOCETHTOOL, ETHTOOL_SPAUSEPARAM) failed: Invalid argument
conf/sock4-proxy-modular-print.click:14: While initializing ‘dpdk1 :: DPDK’:
  warning: Flow control set failed
DPDK: port 1, MAC address 50:6B:4B:43:88:CB
DPDK: port 1, link up, speed 100000 Mbps, full-duplex 10
TCPEpollServer: listen sockfd 3
TCPEpollServer: bounded to 10.220.0.1, port 9000
TCPEpollServer: listening at 10.220.0.1, port 9000
TCPEpollServer: created epoll fd 1
TCPEpollClient: created epoll fd 2
FROM1:   74 | 506b4b43 88cb9803 9b2c0eb4 08004500 003c09d9 40004006
RX1:   60 | 4500003c 09d94000 40061b26 0adc0005 0adc0001 a8642328
TCPNewRenoSyn: syn, cwnd 14600, ssthresh 0
TX1: 168704.275230482: 10.220.0.1.9000 > 10.220.0.5.43108: S 2791422300:2791422301(1,60,60) ack 3019882817 win 65535
FROM1:   60 | 506b4b43 88cb9803 9b2c0eb4 08060001 08000604 00029803
FROM1:   66 | 506b4b43 88cb9803 9b2c0eb4 08004500 003409da 40004006
RX1:   52 | 45000034 09da4000 40061b2d 0adc0005 0adc0001 a8642328
TCPEstimateRTT: rtt 194 us, srtt 194 us, rttvar 97 us, rto 200 ms
TCPNewRenoAck: ack, cwnd 14601, ssthresh 2147483648, slow start, bytes acked 1
TCPEpollServer: epoll 1 events
TCPEpollServer: accepted fd 4 from 10.220.0.5 port 43108
Socks4Proxy: Incoming packet for fd 4
Socks4Proxy: Inserted S_ESTABLISHED fd 4
FROM1:   75 | 506b4b43 88cb9803 9b2c0eb4 08004500 003d09db 40004006
RX1:   61 | 4500003d 09db4000 40061b23 0adc0005 0adc0001 a8642328
TCPEpollServer: epoll 1 events
TCPEpollServer: event on sockfd = 4
Socks4Proxy: Incoming packet for fd 4
Socks4Proxy: SOCKS4 parsed correctly, CONNECTING to (10.221.0.5,80) through fd 5
TCPEpollClient: adding fd 5 to clients
TCPEpollClient: binding fd 5 to source address 10.221.0.1
TCPEpollClient: connecting fd 5 to  10.221.0.5
TX0: 168704.275535179: 10.221.0.1.45265 > 10.221.0.5.80: S 3117387411:3117387412(1,60,60) win 65535
wait nonblock
FROM0:   60 | 506b4b43 88ca9803 9b0353b1 08060001 08000604 00029803
FROM0:   74 | 506b4b43 88ca9803 9b0353b1 08004500 003c0000 40004006
RX0:   60 | 4500003c 00004000 400624fd 0add0005 0add0001 0050b0d1
TCPNewRenoSyn: syn, cwnd 14600, ssthresh 0
TX0: 168704.275731326: 10.221.0.1.45265 > 10.221.0.5.80: . 3117387412:3117387412(0,52,52) ack 2156333779 win 32768
TCPEpollClient: epoll 1 events
TCPEpollClient: connected 5
Socks4Proxy: Incoming packet for fd 5
Socks4Proxy: SOCKS4, 2 legs connected through fd 5, 4
TCPEpollServer: Fwd a packet of 8 bytes on established cponnection pair
TX1: 0.000000: 10.220.0.1.9000 > 10.220.0.5.43108: . 2791422301:2791422309(8,60,60) ack 3019882826 win 32768
TCPEpollClient: EPOLLOUT event on sockfd = 5
TCPEpollClient: unregistering sockfd 5 ofr  EPOLLOUT event
FROM1:   66 | 506b4b43 88cb9803 9b2c0eb4 08004500 003409dc 40004006
RX1:   52 | 45000034 09dc4000 40061b2b 0adc0005 0adc0001 a8642328
TCPEstimateRTT: rtt 32 us, srtt 173 us, rttvar 113 us, rto 200 ms
TCPNewRenoAck: ack, cwnd 14609, ssthresh 2147483648, slow start, bytes acked 8
FROM1:  146 | 506b4b43 88cb9803 9b2c0eb4 08004500 008409dd 40004006
RX1:  132 | 45000084 09dd4000 40061ada 0adc0005 0adc0001 a8642328
TCPEpollServer: epoll 1 events
TCPEpollServer: event on sockfd = 4
Socks4Proxy: Incoming packet for fd 4
Socks4Proxy: forwarding packet(80B) from fd 4 to fd 5
TCPEpollClient: Fwd a packet of 80 bytes on established cponnection pair
TX0: 0.000000: 10.221.0.1.45265 > 10.221.0.5.80: . 3117387412:3117387492(80,132,132) ack 2156333779 win 32768
wait nonblock
FROM0:   66 | 506b4b43 88ca9803 9b0353b1 08004500 0034d65a 40004006
RX0:   52 | 45000034 d65a4000 40064eaa 0add0005 0add0001 0050b0d1
TCPEstimateRTT: rtt 47 us, srtt 47 us, rttvar 23 us, rto 200 ms
TCPNewRenoAck: ack, cwnd 14680, ssthresh 2147483648, slow start, bytes acked 80
FROM0: 1514 | 506b4b43 88ca9803 9b0353b1 08004500 05dcd65b 40004006
RX0: 1500 | 450005dc d65b4000 40064901 0add0005 0add0001 0050b0d1
FROM0: 1514 | 506b4b43 88ca9803 9b0353b1 08004500 05dcd65c 40004006
RX0: 1500 | 450005dc d65c4000 40064900 0add0005 0add0001 0050b0d1
TX0: 168704.276064180: 10.221.0.1.45265 > 10.221.0.5.80: . 3117387492:3117387492(0,52,52) ack 2156336675 win 32677
FROM0: 1514 | 506b4b43 88ca9803 9b0353b1 08004500 05dcd65d 40004006
RX0: 1500 | 450005dc d65d4000 400648ff 0add0005 0add0001 0050b0d1
FROM0: 1514 | 506b4b43 88ca9803 9b0353b1 08004500 05dcd65e 40004006
RX0: 1500 | 450005dc d65e4000 400648fe 0add0005 0add0001 0050b0d1
TX0: 168704.276064180: 10.221.0.1.45265 > 10.221.0.5.80: . 3117387492:3117387492(0,52,52) ack 2156339571 win 32587
FROM0: 1514 | 506b4b43 88ca9803 9b0353b1 08004500 05dcd65f 40004006
RX0: 1500 | 450005dc d65f4000 400648fd 0add0005 0add0001 0050b0d1
TCPEpollClient: epoll 1 events
TCPEpollClient: event on sockfd = 5
Socks4Proxy: Incoming packet for fd 5
Socks4Proxy: forwarding packet(1448B) from fd 5 to fd 4
TCPEpollServer: Fwd a packet of 1448 bytes on established cponnection pair
TX1: 0.000000: 10.220.0.1.9000 > 10.220.0.5.43108: . 2791422309:2791423757(1448,1500,1500) ack 3019882906 win 32768
Socks4Proxy: Incoming packet for fd 5
Socks4Proxy: forwarding packet(1448B) from fd 5 to fd 4
TCPEpollServer: Fwd a packet of 1448 bytes on established cponnection pair
Socks4Proxy: Incoming packet for fd 5
Socks4Proxy: forwarding packet(1448B) from fd 5 to fd 4
TCPEpollServer: Fwd a packet of 1448 bytes on established cponnection pair
TX1: 0.000000: 10.220.0.1.9000 > 10.220.0.5.43108: . 2791423757:2791425205(1448,1500,1500) ack 3019882906 win 32768
TX1: 0.000000: 10.220.0.1.9000 > 10.220.0.5.43108: . 2791425205:2791426653(1448,1500,1500) ack 3019882906 win 32768
Socks4Proxy: Incoming packet for fd 5
Socks4Proxy: forwarding packet(1448B) from fd 5 to fd 4
TCPEpollServer: Fwd a packet of 1448 bytes on established cponnection pair
Socks4Proxy: Incoming packet for fd 5
Socks4Proxy: forwarding packet(1448B) from fd 5 to fd 4
TCPEpollServer: Fwd a packet of 1448 bytes on established cponnection pair
TX1: 0.000000: 10.220.0.1.9000 > 10.220.0.5.43108: . 2791426653:2791428101(1448,1500,1500) ack 3019882906 win 32768
TX1: 0.000000: 10.220.0.1.9000 > 10.220.0.5.43108: . 2791428101:2791429549(1448,1500,1500) ack 3019882906 win 32768
wait nonblock
FROM0: 1277 | 506b4b43 88ca9803 9b0353b1 08004500 04efd660 40004006
RX0: 1263 | 450004ef d6604000 400649e9 0add0005 0add0001 0050b0d1
TX0: 168704.276173760: 10.221.0.1.45265 > 10.221.0.5.80: . 3117387492:3117387492(0,52,52) ack 2156342231 win 32730
TCPEpollClient: epoll 1 events
TCPEpollClient: event on sockfd = 5
Socks4Proxy: Incoming packet for fd 5
Socks4Proxy: forwarding packet(1211B) from fd 5 to fd 4
TCPEpollServer: Fwd a packet of 1211 bytes on established cponnection pair
>--< stuck

1K file download works though. With 128K I get 117K. So it seems the trouble seems with the end of the connection. The server (running nginx) sends the retransmission, but the loop mentioned above prevents re-trying them.

Thanks, Tom

gallomassimo commented 5 years ago

Hi Tom,

No problems. I am still not able to reproduce your issue , I'll try to use nginx to see if I can reproduce it. I also noticed you are using mellanox cards, I'll try to see if I can use such cards in our testbed.

From what you write, it seems that the code is somehow blocked here:

https://github.com/nokia/ClickNF/blob/873fd4f1c95c63fb69c8cd48c6c4b9fd2517d548/elements/tcp/tcpepollclient.cc#L191-L206

however, it is not clear to me why this is happening. To understand better what is happening, when you write ">--< stuck" does ClickNF continues to print out something? If yes, it should print out something like:

TCPEpollClient: epoll 1 events
TCPEpollClient: event on sockfd = 5
Socks4Proxy: Incoming packet for fd 5
Socks4Proxy: forwarding packet(1211B) from fd 5 to fd 4
TCPEpollServer: Fwd a packet of 1211 bytes on established cponnection pair

one possible solution could be of not draining RX buffers but limiting the amount of packets pulled when an EPOLLIN event is reported.

tbarbette commented 5 years ago

No it does not print anything at stuck. It's in the loop you showed, most probably.

Yes I indeed use Mellanox. Unfortunately I don't have any "in the middle" setup with Intel right now...

To try Mellanox, you will need the ~4..~1 commits of https://github.com/tbarbette/ClickNF/tree/DPDK11andMLX, to compile with DPDK 18.11.

I don't think any of these changes could be the reason of the break but if you could try with that branch, it would ensure this is not the problem.

tbarbette commented 5 years ago

I have no problem letting you access the testbed if it comes to that, btw.

gallomassimo commented 5 years ago

I think I found the problem but I can't test the solution immediately.

In the infinite loop:

https://github.com/nokia/ClickNF/blob/873fd4f1c95c63fb69c8cd48c6c4b9fd2517d548/elements/tcp/tcpepollclient.cc#L191-L206

the while should be:

while(p->length() > 0)

this is a bug I introduced while fixing your first issue "push: Resource temporarily unavailable". I was probably not able to reproduce the bug because in my settings the TCP receive queue becomes empty only at the end of the transfer (and the socket is closed before this condition is met).

The same fix should be applied to tcpepollserver.cc as well.

tbarbette commented 5 years ago

Just changing the break by return worked! Thanks!

gallomassimo commented 5 years ago

Yeah, that works too but I think that the while fix is better because there might be other type of events to handle i.e., EPOLLERR or EPOLLOUT.

I also noticed that there might be other problems in tcpepollclient but hopefully they are not impacting you at the moment.

tbarbette commented 5 years ago

Yes right. But just while(p->length () > 0) does not work because sometimes p is null. And while(p && p->length()) prevents the kill so we loose packets. Maybe a goto to exit the main loop.

Unfortunately, packets are leaking somewhere, I can only maintain while true ; do curl http://10.221.0.5/bin-1G --socks4 10.220.0.1:9000 --output /dev/null ; done for a few loops. I'm trying to corner it.

gallomassimo commented 5 years ago

It is strange sometimes it is null and sometimes not. I'll try to do some testing next week.