pion / turn

Pion TURN, an API for building TURN clients and servers
MIT License
1.83k stars 312 forks source link

Error when handling datagram #206

Closed jamesand closed 2 years ago

jamesand commented 3 years ago

Your environment.

Windows Server 2019 Pion Turn server 2.0.2

What did you do?

I executed "turn-server-simple.exe" to create a Turn server for a WebRTC project Im developing

What did you expect?

To work ok. It does sometimes. Others it get an error.

What happened?

When haviong a videocall using WebRTC, the console throws a lots of errors. For example:

turn ERROR: 2021/03/09 09:47:34 error when handling datagram: failed to handle Refresh-request from 94.73.44.131:30866: no allocation found for 94.73.44.131:30866:0.0.0.0:3478 turn ERROR: 2021/03/09 09:47:34 error when handling datagram: failed to handle Refresh-request from 94.73.44.131:30858: no allocation found for 94.73.44.131:30858:0.0.0.0:3478 turn ERROR: 2021/03/09 09:47:34 error when handling datagram: failed to handle Refresh-request from 94.73.44.131:30860: no allocation found for 94.73.44.131:30860:0.0.0.0:3478 turn ERROR: 2021/03/09 09:52:12 error when handling datagram: failed to create stun message from packet: unexpected EOF: not enough bytes to read header turn ERROR: 2021/03/09 09:52:12 error when handling datagram: failed to create stun message from packet: unexpected EOF: not enough bytes to read header turn ERROR: 2021/03/09 09:52:25 error when handling datagram: failed to create stun message from packet: unexpected EOF: not enough bytes to read header

Others, I get:

turn ERROR: 2021/03/09 12:44:52 error when handling datagram: failed to handle Allocate-request from 176.83.80.192:7363: relay already allocated for 5-TUPLE turn ERROR: 2021/03/09 12:44:52 error when handling datagram: failed to handle Allocate-request from 176.83.80.192:7543: relay already allocated for 5-TUPLE turn ERROR: 2021/03/09 12:44:52 error when handling datagram: failed to handle Allocate-request from 176.83.80.192:7370: relay already allocated for 5-TUPLE turn ERROR: 2021/03/09 12:44:52 error when handling datagram: failed to handle Allocate-request from 176.83.80.192:7381: relay already allocated for 5-TUPLE

Sometimes the videocall works ok althought the errors; sometimes not.

mohammadVatandoost commented 3 years ago

I have the same issue in CentOS Linux 7.

mehrvarz commented 3 years ago

I am also seeing "relay already allocated for 5-TUPLE". It seems to happen only with some client devices (WebRTC web app on Firefox 91 Ubuntu + Android). With other clients (same app on Chromium 91 Ubuntu) this error never shows up. If the error is thrown, it always happens after the client has hit auth handler 3 times in short order. Client may be able to continue despite the err, sometimes with a delay. But plenty of times service is aborted.

I am seeing this more often lately. Maybe it is related to newer Firefox releases? Hah! I just tried with an older FF (v65) on Android and here the error is not thrown.

mehrvarz commented 3 years ago

This log is showing the "already allocated for 5-TUPLE" issue. The client is FF 91 on Android. The server is running on Linux with Go 1.17. Comments below.

server.go:36: received 20 bytes of udp from s.s.s.s:31505 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Binding request l=0 attrs=0 id=x0airCDAcjULiKrD 20 Binding
stun.go:9: received BindingRequest from s.s.s.s:31505
stun.go:22: received BindingRequest -> buildAndSend
server.go:36: received 20 bytes of udp from c.c.c.c:47289 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Binding request l=0 attrs=0 id=RploSUJXVvEFh58y 20 Binding
stun.go:9: received BindingRequest from c.c.c.c:47289
stun.go:22: received BindingRequest -> buildAndSend
server.go:36: received 44 bytes of udp from c.c.c.c:47289 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Allocate request l=24 attrs=3 id=Qf4vu6nmbZcMyIuu 44 Allocate
turn.go:15: received AllocateRequest from c.c.c.c:47289
util.go:85: authenticateRequest call respondWithNonce
turn.go:24: received AllocateRequest -> authenticateRequest !hasAuth
server.go:36: received 20 bytes of udp from c.c.c.c:47329 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Binding request l=0 attrs=0 id=h3N9rFd+/+m2nZg+ 20 Binding
stun.go:9: received BindingRequest from c.c.c.c:47329
stun.go:22: received BindingRequest -> buildAndSend
server.go:36: received 44 bytes of udp from c.c.c.c:47329 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Allocate request l=24 attrs=3 id=iMVa7J13O6r18InB 44 Allocate
turn.go:15: received AllocateRequest from c.c.c.c:47329
util.go:85: authenticateRequest call respondWithNonce
turn.go:24: received AllocateRequest -> authenticateRequest !hasAuth
server.go:36: received 44 bytes of udp from c.c.c.c:47289 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Allocate request l=24 attrs=3 id=Qf4vu6nmbZcMyIuu 44 Allocate
turn.go:15: received AllocateRequest from c.c.c.c:47289
util.go:85: authenticateRequest call respondWithNonce
turn.go:24: received AllocateRequest -> authenticateRequest !hasAuth
server.go:36: received 20 bytes of udp from c.c.c.c:47329 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Binding request l=0 attrs=0 id=h3N9rFd+/+m2nZg+ 20 Binding
stun.go:9: received BindingRequest from c.c.c.c:47329
stun.go:22: received BindingRequest -> buildAndSend
server.go:36: received 140 bytes of udp from c.c.c.c:47289 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Allocate request l=120 attrs=7 id=Sr/cN8gRXBBMakWo 140 Allocate
turn.go:15: received AllocateRequest from c.c.c.c:47289
util.go:88: authenticateRequest not call respondWithNonce

turn auth for client (c.c.c.c) SUCCESS

turn.go:27: received AllocateRequest -> authenticateRequest hasAuth
turn.go:40: fiveTuple &{0 c.c.c.c:47289 0.0.0.0:3739}
allocation_manager.go:103: listening on relay addr: s.s.s.s:37937
server.go:36: received 140 bytes of udp from c.c.c.c:47329 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Allocate request l=120 attrs=7 id=3AG+CTVPZEao85A9 140 Allocate
turn.go:15: received AllocateRequest from c.c.c.c:47329
util.go:88: authenticateRequest not call respondWithNonce

turn auth for client (c.c.c.c) SUCCESS

turn.go:27: received AllocateRequest -> authenticateRequest hasAuth
turn.go:40: fiveTuple &{0 c.c.c.c:47329 0.0.0.0:3739}
allocation_manager.go:103: listening on relay addr: s.s.s.s:21344
server.go:36: received 140 bytes of udp from c.c.c.c:47289 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Allocate request l=120 attrs=7 id=Sr/cN8gRXBBMakWo 140 Allocate
turn.go:15: received AllocateRequest from c.c.c.c:47289
util.go:88: authenticateRequest not call respondWithNonce

turn auth for client (c.c.c.c) SUCCESS

turn.go:27: received AllocateRequest -> authenticateRequest hasAuth
turn.go:40: fiveTuple &{0 c.c.c.c:47289 0.0.0.0:3739}
turn ERROR: 2021/08/25 10:58:25 error when handling datagram: failed to handle Allocate-request from c.c.c.c:47289: relay already allocated for 5-TUPLE

comments:

There are three such "fiveTuple" lines before the error (why?). The 1st and the 3rd do indeed show the same ip's and ports:

turn.go:40: fiveTuple &{0 c.c.c.c:47289 0.0.0.0:3739} turn.go:40: fiveTuple &{0 c.c.c.c:47329 0.0.0.0:3739} turn.go:40: fiveTuple &{0 c.c.c.c:47289 0.0.0.0:3739}

mehrvarz commented 3 years ago

For comparsion, here a log without the 5-TUPLE issue. The client in this case is Chromium 91 on Ubuntu. There are again 3 calls to AuthHandler (is this expected?). But unlike in the FF 91 case above, you can see CreatePermission being called. There is also less activity before the first call to AuthHandler than in the previous log.

server.go:36: received 20 bytes of udp from s.s.s.s:45996 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Binding request l=0 attrs=0 id=TCAbc2bgBWe7YpE3 20 Binding
stun.go:9: received BindingRequest from s.s.s.s:45996
stun.go:22: received BindingRequest -> buildAndSend
server.go:36: received 20 bytes of udp from c.c.c.c:47282 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Binding request l=0 attrs=0 id=b2ZzRlBTQ1VyZEk0 20 Binding
stun.go:9: received BindingRequest from c.c.c.c:47282
stun.go:22: received BindingRequest -> buildAndSend
server.go:36: received 28 bytes of udp from c.c.c.c:47282 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Allocate request l=8 attrs=1 id=RURhdmRRRVo5MWNX 28 Allocate
turn.go:15: received AllocateRequest from c.c.c.c:47282
util.go:85: authenticateRequest call respondWithNonce
turn.go:24: received AllocateRequest -> authenticateRequest !hasAuth
server.go:36: received 124 bytes of udp from c.c.c.c:47282 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket Allocate request l=104 attrs=5 id=ZzI5UHhuU01VYTdP 124 Allocate
turn.go:15: received AllocateRequest from c.c.c.c:47282
util.go:88: authenticateRequest not call respondWithNonce

turn auth for client (c.c.c.c) SUCCESS

turn.go:27: received AllocateRequest -> authenticateRequest hasAuth
turn.go:40: fiveTuple &{0 c.c.c.c:47282 0.0.0.0:3739}
allocation_manager.go:103: listening on relay addr: s.s.s.s:48310
server.go:36: received 128 bytes of udp from c.c.c.c:47282 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket CreatePermission request l=108 attrs=5 id=OUZFQWJUd3JtdDdJ 128 CreatePermission
turn.go:208: received CreatePermission from c.c.c.c:47282
util.go:88: authenticateRequest not call respondWithNonce

turn auth for client (c.c.c.c) SUCCESS

turn.go:232: adding permission for s.s.s.s:29120
server.go:36: received 128 bytes of udp from c.c.c.c:47282 on 0.0.0.0:3739
server.go:61: handleTURNPacket
server.go:67: handleTURNPacket CreatePermission request l=108 attrs=5 id=eDFSMjN1V2RHdkdX 128 CreatePermission
turn.go:208: received CreatePermission from c.c.c.c:47282
util.go:88: authenticateRequest not call respondWithNonce

turn auth for client (c.c.c.c) SUCCESS

turn.go:232: adding permission for s.s.s.s:45996
mehrvarz commented 3 years ago

Hey @Sean-Der

If you want to ask more questions I am in Slack and always ready to help

Sorry, I am having difficulties getting on to Slack. In any case, there seems to be an issue with Pion Turn and newer Firefox releases. Would you be able to take a look at my logs and share a thought? If you can help me a bit I may be able to fix this and provide a patch. Using FF with Coturn does not appear to throw any errors. Nor does using Chromium with Pion Turn. But when using FF 91 (on Ubuntu and Android) with Pion Turn, I always see the "already allocated for 5-TUPLE" error. And in some cases the WebRTC service can then not be used or there is a seconds long delay (however, sometimes service does work despite the error msg). Crucial question is whether FF or Pion Turn needs fixing. Thank you.

mehrvarz commented 3 years ago

Think I found a solution. It involves swapping step 1 and 2 in the handleAllocateRequest() function of internal/server/turn.go. In other words, I check for 5-tuple before invoking authenticateRequest(). Please let me know if doing this is wrong for obvious reasons. This is the code I am using. It seems to work well with both browsers, including on Android. FF 91 still does this, but now it appears to be harmless and all sessions work without delay or abort.

cnderrauber commented 2 years ago

As RFC said, The server MUST require that the request be authenticated first. So there might be another reason for this.

https://datatracker.ietf.org/doc/html/rfc5766#section-6.2

6.2. Receiving an Allocate Request

When the server receives an Allocate request, it performs the following checks:

  1. The server MUST require that the request be authenticated. This authentication MUST be done using the long-term credential mechanism of [RFC5389] unless the client and server agree to use another mechanism through some procedure outside the scope of this document.

  2. The server checks if the 5-tuple is currently in use by an existing allocation. If yes, the server rejects the request with a 437 (Allocation Mismatch) error.

mehrvarz commented 2 years ago

As RFC said, The server MUST require that the request be authenticated first. So there might be another reason for this.

No, the word "MUST" relates to "be authenticated". It does not relate to the order of steps. Five things need to be done. RFC does not explicitly say that a specific order is required.

Swapping step 1 and 2 seems to work well for me in practice and for 6 month now. Checking "is allocations in use" before doing "authenticateRequest()" also make full sense to me. With the modified order, no authentication takes place if the allocation is already in use.

The original order was causing issues with some UA's. Not with all. The modified order works with all UA's. Can you please do a code analysis?

cnderrauber commented 2 years ago

As RFC said, The server MUST require that the request be authenticated first. So there might be another reason for this.

No, the word "MUST" relates to "be authenticated". It does not relate to the order of steps. Five things need to be done. RFC does not explicitly say that a specific order is required.

Swapping step 1 and 2 seems to work well for me in practice and for 6 month now. Checking "is allocations in use" before doing "authenticateRequest()" also make full sense to me. With the modified order, no authentication takes place if the allocation is already in use.

The original order was causing issues with some UA's. Not with all. The modified order works with all UA's. Can you please do a code analysis?

We should follow the RFC's step to check these. I don't know the detail of your problem, but I can reproduce the already allocated case. That's because turn server's response sent to client may lost, and the behavior of client's retry and pion process the retry request cause the problem.

  1. client sent allocate request
  2. server received request, process it correctly and response to client
  3. response lost, and client retry and got 437 allocation mismatch result because server found 5 tuple already exist a. some client retry with same 5 tuple, firefox does this, it use resiprocate's nICEr lib, and I checked the code b. some client like chrome retry allocation with other transport when received 437 , so it success.

client's behavior like chrome is follow RFC, and nICEr not implement this.

The client SHOULD try three different client transport addresses before giving up on this server

RFC also has mentioned server could implement 'stateless stack approach' to process allocation retransmit, but pion not implement this.

So the '5 tuple already exist' problem is caused by both ff and pion. I can reproduce it if I always drop first allocation response, and ff can't establish connection. Then I add the retransmit process, the problem is gone, ff works fine even allocation response is lost/dropped. I will create a PR tomorrow for this

mehrvarz commented 2 years ago

Wow. This is a fairly extensive patch. When do you think will a release tag become available?

cnderrauber commented 2 years ago

@mehrvarz Have another test case for packet lost, if all things working, will release today

mehrvarz commented 2 years ago

I am now testing v2.0.8. As before, I am seeing "No Permission or Channel exists..." log entries. About 4-6 per client connect, even for pure p2p (none-TURN) connections. These msgs come from /internal/allocation/allocation.go, very close to code you just looked at. Did you not notice them fill the log?

Note: If something is wrong (say, turn auth fail), I see more of these msgs. But if everything is working right, I still get about 4-6 of these msgs.

mehrvarz commented 2 years ago

@cnderrauber I am not asking you to solve a problem. I am only asking if you have seen unexpected "No Permission or Channel exists..." log msgs.

cnderrauber commented 2 years ago

I think they are exist but not sure.

mehrvarz commented 2 years ago

Findings:

Just let me mute this log statement.

Ed: I previously reported this issue. It sometimes goes away. But it keeps coming back.

hy-net commented 2 years ago

I setup a new pion server, put up for a few days (not even using it yet), and this error appears from no where.

Seems like someone is sending random datagram to my server.

dadagov125 commented 2 years ago

I have the same problem when connecting from ios 15.0