PeterCxy closed this issue 7 years ago.
thus I tried to synchronize the time on the server and client and made sure that the current timestamp on the server and client are as close as possible, but the problem remains
Udp2raw doesn't rely on synchronized time at all, so this is not the problem.
I have no idea yet. I will reply as soon as I have one.
Sorry, I can't reproduce this. Have you tried restarting only the client or only the server? Could single-side restarting solve the problem? And what happens if you let the client and server keep retrying for 5 minutes? Will they eventually get a connection?
Could you please give --raw-mode udp a try? UDP mode doesn't have a SYN step, so knowing how it behaves under UDP may help.
Could single-side restarting solve the problem?
Yes. Sometimes it starts working after restarting only the server several times.
And what happens if you let the client and server keep retrying for 5 minutes? Will they eventually get a connection?
I let them keep sending SYNs for ~2 minutes and nothing changed.
Could you please give --raw-mode udp a try? UDP mode doesn't have a SYN step, so knowing how it behaves under UDP may help.
I tried ICMP earlier, but I don't have any screenshots at the moment. I'll post them here after I try ICMP and UDP.
Did you set OpenVPN to modify your routing table automatically? If this isn't done correctly, it can create a routing loop.
Could you stop the OpenVPN client and test with only the udp2raw client and server?
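The routing loop mentioned above can be illustrated with a longest-prefix-match lookup. This is a minimal sketch that assumes OpenVPN is configured with `redirect-gateway def1`, which installs `0.0.0.0/1` and `128.0.0.0/1` via the tunnel. If nothing exempts the udp2raw server's own address, udp2raw's raw packets to that server are routed back into the tunnel they are supposed to carry. The helpers `lookup` and `build_routes`, the interface names, and the server address are all hypothetical. Real kernels also take metrics and policy rules into account.

```python
import ipaddress


def lookup(routes, dst):
    """Longest-prefix match: return the next hop of the most specific route."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda route: route[0].prefixlen)[1]


def build_routes(extra=()):
    # Hypothetical client routing table: the physical default route plus the
    # two /1 routes that OpenVPN's "redirect-gateway def1" uses to pull all
    # traffic into tun0 without deleting the original default route.
    entries = [
        ("0.0.0.0/0", "eth0"),
        ("0.0.0.0/1", "tun0"),
        ("128.0.0.0/1", "tun0"),
        *extra,
    ]
    return [(ipaddress.ip_network(net), hop) for net, hop in entries]


SERVER = "203.0.113.10"  # hypothetical udp2raw server address

# Without an exemption, packets to the server match 128.0.0.0/1 and go
# back into the tunnel: a routing loop.
print(lookup(build_routes(), SERVER))  # → tun0

# A /32 host route via the physical interface is more specific and wins.
print(lookup(build_routes([(SERVER + "/32", "eth0")]), SERVER))  # → eth0
```

A host route for the server is the usual way to break such a loop, because a /32 is always more specific than the VPN's /1 routes.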
No, the two screenshots posted above were not taken while udp2raw was tunneling OpenVPN. I shut down OpenVPN on both sides to test whether udp2raw works on its own, so it's definitely not a routing-table issue.
I will do further tests tomorrow.
@wangyu- Yes, it also happens in UDP mode, which can also get stuck at the handshake stage (2 of 5 trials got stuck, as the following picture shows).
ICMP mode has the same issue.
All of these modes hit this problem randomly. I still cannot figure out under which conditions it is reliably reproducible.
BTW, I am behind a NAT provided by my home router, and there is no ISP-level NAT (China Telecom, with a public IP address on the PPPoE endpoint). The remote server is an Aliyun Singapore instance with 1:1 NAT (one public IP maps to one private IP, and every packet is forwarded to the VPS). This might be useful for anyone trying to reproduce this issue.
I experienced the same problem, though less frequently. It occurs randomly in my setup, and a clean server (one that no client has connected to yet) seems less likely to have the problem than a server (session) that has previously been connected to.
I have done the following experiments on this issue:
1. Start the server, then start the client. The client gets stuck at the handshake stage. I restarted the client several times without luck; it could not connect to the server.
2. Start the server, then start the client. The handshake got stuck. I restarted the server and then the client, after which both sides completed the handshake and reached the ready state. I restarted the client several more times, and every connection succeeded.
3. Start the server, then start the client. The handshake succeeded. I restarted the server, and then the handshake got stuck.
4. Following the same process as (2), I left the server alone for several minutes and then tried to reconnect the client: the client got stuck handshaking.
5. I left the server and the client to "handshake" for several minutes, but the process stayed stuck. They kept sending handshake packets to each other, but neither reached the ready state.
Hi @PeterCxy, thanks for your feedback. That looks abnormal.
Please:
1. Tell me the binary version you used on the client and server sides.
2. Tell me the client's and server's operating systems, and whether they are 32-bit or 64-bit.
3. Tell me the full arguments of your client and server.
4. Tell me how often this problem occurs.
5. Run both sides with --log-level 6 --log-position, redirect the logs to files (they can be very long, so redirecting is necessary), and upload some typical ones.
Many thanks.
If you can use tcpdump, some tcpdump captures would also be great.
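If a capture is available, counting SYNs and SYN-ACKs is a quick way to check whether the client's SYNs ever get an answer in TCP mode. This is a minimal standard-library sketch. It assumes a classic little-endian, microsecond-resolution pcap file written by `tcpdump -w` on an Ethernet interface. It does not handle pcapng, IPv6, or other link types such as PPPoE or Linux cooked captures. `tcp_handshake_counts` is a hypothetical helper, not part of udp2raw.

```python
import struct

ETHERNET = 1           # pcap link type for Ethernet
IPV4 = b"\x08\x00"     # EtherType for IPv4
TCP = 6                # IPv4 protocol number for TCP
SYN, ACK = 0x02, 0x10  # TCP flag bits


def tcp_handshake_counts(data):
    """Count SYN-only and SYN+ACK segments in a classic pcap capture."""
    if len(data) < 24 or struct.unpack_from("<I", data, 0)[0] != 0xA1B2C3D4:
        raise ValueError("expected a little-endian, microsecond-resolution pcap file")
    if struct.unpack_from("<I", data, 20)[0] != ETHERNET:
        raise ValueError("only Ethernet captures are handled")
    counts = {"syn": 0, "syn_ack": 0}
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, incl_len, orig_len
        incl_len = struct.unpack_from("<I", data, offset + 8)[0]
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 34 or frame[12:14] != IPV4:
            continue  # too short for Ethernet + IPv4 headers, or not IPv4
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
        if ip[9] != TCP or len(ip) < ihl + 14:
            continue  # not TCP, or the TCP flags byte was not captured
        flags = ip[ihl + 13]  # byte 13 of the TCP header holds the flags
        if (flags & (SYN | ACK)) == SYN:
            counts["syn"] += 1
        elif (flags & (SYN | ACK)) == (SYN | ACK):
            counts["syn_ack"] += 1
    return counts
```

For example, `tcp_handshake_counts(open("client.pcap", "rb").read())`, where the file name is hypothetical. Comparing the counts from captures taken on both ends can help show whether SYNs are lost in transit or arrive but go unanswered.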
It looks like UDP mode, despite having a similar problem, gets further than TCP mode. I saw "(re)send handshake2", which is the second step of UDP mode (client side), but in the TCP-mode screenshot I only saw "(re)send syn", which is only the first step of TCP mode (client side).
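The diagnosis above reads how far the handshake got from which step the client keeps resending. With --log-level 6 logs redirected to files, a small helper can tally those lines. This is a sketch that assumes the log contains fragments of the form "(re)send <step>", matching the two messages quoted above. The exact udp2raw log format is an assumption, and `resend_counts` and `looks_stuck` are hypothetical names.

```python
import collections
import re

# Matches "(re)send <step>" fragments such as "(re)send syn" and
# "(re)send handshake2". The real log format is an assumption; adjust the
# pattern to your actual log lines.
RESEND = re.compile(r"\(re\)send (\w+)")


def resend_counts(lines):
    """Count how many times each handshake step was (re)sent."""
    counts = collections.Counter()
    for line in lines:
        match = RESEND.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def looks_stuck(counts, threshold=10):
    """Heuristic: a single step was resent many times and no other step appears."""
    return len(counts) == 1 and max(counts.values()) >= threshold
```

For example, open the redirected client log and pass the file object to `resend_counts`, then call `most_common()` on the result. If only "syn" keeps growing, the client never gets past the first step of TCP mode.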
@PeterCxy @BroncoTc After checking and experimenting, I found a serious uninitialized-variable bug. It causes random packet loss on the server side, especially on 32-bit systems.
In your case, the problem happens randomly and restarting the server solves it. This also looks like an uninitialized-variable problem, and I suspect they are the same one.
Please try the latest released version to check whether the problem still exists.
Great thanks!
@wangyu- As far as I can tell from testing so far, the problem vanished after upgrading to the 20170813.1 tag. Thanks for your help. I am closing this issue for now.
Hello, I've been trying to use this project to tunnel my OpenVPN connections. However, udp2raw-tunnel randomly fails at the handshake stage: the client, the server, or both get stuck sending SYN packets to each other and never finish the handshake. For example:
Without any changes to the network configuration, simply restarting the server or client several times lets the client connect without any problem.
At first I suspected a time-synchronization problem, so I synchronized the clocks on the server and client and made sure their current timestamps were as close as possible, but the problem remained. The RTT from my server to the client is ~80 ms. Strangely, once a connection has been set up, reconnecting the client causes no further failures, but restarting the server may bring the problem back.
I'm not sure whether others can reproduce this, so I am posting it here in case someone runs into the same problem. I still suspect it is time-related, but I cannot figure out where exactly the problem lies.