shinyoshiaki / werift-webrtc

WebRTC Implementation for TypeScript (Node.js), includes ICE/DTLS/SCTP/RTP/SRTP/WEBM/MP4
MIT License

Error set remote description when use stun #304

Closed kolserdav closed 1 year ago

kolserdav commented 1 year ago

When working through a STUN server, a critical error often occurs that prevents the connection from being established:

 error  Error set remote description  {
  e: "Cannot read properties of undefined (reading 'length')",
  stack: "TypeError: Cannot read properties of undefined (reading 'length')\n" +
    '    at Message.get bytes [as bytes] (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/stun/message.js:78:21)\n' +
    '    at Message.messageIntegrity (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/stun/message.js:99:46)\n' +
    '    at Message.addMessageIntegrity (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/stun/message.js:95:53)\n' +
    '    at TurnClient.request (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/turn/protocol.js:181:18)\n' +
    '    at TurnClient.connect (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/turn/protocol.js:154:39)\n' +
    '    at async createTurnEndpoint (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/turn/protocol.js:232:5)\n' +
    '    at async Connection.getComponentCandidates (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/ice.js:286:30)\n' +
    '    at async Connection.gatherCandidates (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/ice.js:234:36)\n' +
    '    at async RTCIceGatherer.gather (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/webrtc/src/transport/ice.js:114:13)\n' +
    '    at async Promise.all (index 1)',
  is: 'new',
  cs: 'new',
  ss: 'stable'
} 

The top line of the stack refers to this line: https://github.com/shinyoshiaki/werift-webrtc/blob/f916e893ac895945ab899edf24c88481634a1e53/packages/ice/src/stun/message.ts#L120

I figured out that the value of the NONCE attribute is undefined here, while at other times it is a normal numeric value.

I have a hunch that getAttributeValue is being called there with this.attributes already changed, but I don't have enough experience with werift yet to check that.

kolserdav commented 1 year ago

Actually the error is really related to this fix:

[0] [1]  error  Error set remote description  {
[0] [1]   e: 'The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received undefined',
[0] [1]   stack: 'TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received undefined\n' +
[0] [1]     '    at new NodeError (node:internal/errors:372:5)\n' +
[0] [1]     '    at Function.from (node:buffer:323:9)\n' +
[0] [1]     '    at packString (/home/user/Projects/werift-sfu-react/node_modules/werift/lib/ice/src/stun/attributes.js:111:38)\n' +
[0] [1]     '    at Message.get bytes [as bytes] (/home/user/Projects/werift-sfu-react/node_modules/werift/lib/ice/src/stun/message.js:77:19)\n' +
[0] [1]     '    at Message.messageIntegrity (/home/user/Projects/werift-sfu-react/node_modules/werift/lib/ice/src/stun/message.js:99:46)\n' +
[0] [1]     '    at Message.addMessageIntegrity (/home/user/Projects/werift-sfu-react/node_modules/werift/lib/ice/src/stun/message.js:95:53)\n' +
[0] [1]     '    at TurnClient.request (/home/user/Projects/werift-sfu-react/node_modules/werift/lib/ice/src/turn/protocol.js:181:18)\n' +
[0] [1]     '    at TurnClient.connect (/home/user/Projects/werift-sfu-react/node_modules/werift/lib/ice/src/turn/protocol.js:154:39)\n' +
[0] [1]     '    at async createTurnEndpoint (/home/user/Projects/werift-sfu-react/node_modules/werift/lib/ice/src/turn/protocol.js:232:5)\n' +
[0] [1]     '    at async Connection.getComponentCandidates (/home/user/Projects/werift-sfu-react/node_modules/werift/lib/ice/src/ice.js:284:30)',
[0] [1] } 

That is, with the fix from the PR applied, the error shown above appears.

kolserdav commented 1 year ago

Things are starting to clear up. I added console output in this file, lib/ice/src/stun/attributes.js:

getAttributeValue(key) {
    const attribute = this.attributes.find((a) => a[0] === key);
    if (!attribute) {
        console.log(key, this.attributes);
        return undefined;
    }
    return attribute[1];
}

And I see that an exception is thrown due to an error in the message:

DATA [
  [
    'ERROR-CODE',
    [ 437, 'Mismatched allocation: wrong transaction ID\x00' ]
  ],
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [ 'FINGERPRINT', 4171264430 ]
]
NONCE [
  [
    'ERROR-CODE',
    [ 437, 'Mismatched allocation: wrong transaction ID\x00' ]
  ],
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [ 'FINGERPRINT', 4171264430 ]
]
REALM [
  [
    'ERROR-CODE',
    [ 437, 'Mismatched allocation: wrong transaction ID\x00' ]
  ],
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [ 'FINGERPRINT', 4171264430 ]
]
  error  Error set remote description  {
  e: 'The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received undefined',
  stack: 'TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received undefined\n' +
    '    at new NodeError (node:internal/errors:387:5)\n' +
    '    at Function.from (node:buffer:328:9)\n' +
    '    at packString (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/stun/attributes.js:111:38)\n' +
    '    at Message.get bytes [as bytes] (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/stun/message.js:77:19)\n' +
    '    at Message.messageIntegrity (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/stun/message.js:99:46)\n' +
    '    at Message.addMessageIntegrity (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/stun/message.js:95:53)\n' +
    '    at TurnClient.request (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/turn/protocol.js:181:18)\n' +
    '    at TurnClient.connect (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/turn/protocol.js:154:39)\n' +
    '    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n' +
    '    at async createTurnEndpoint (/usr/local/share/applications/werift-sfu-react/node_modules/werift/lib/ice/src/turn/protocol.js:232:5)',
  roomId: '1667350895708',
  userId: '1',
  target: 0,
  connId: 'b0aa26d5-9f5e-493c-a5e5-257bf7b2b6a1',
  peerId: '1_0_b0aa26d5-9f5e-493c-a5e5-257bf7b2b6a1',
  peers: undefined,
  is: 'new',
  cs: 'new',
  ss: 'stable'
}  

  warn  Skipping set local description for answer  {
  error: true,
  roomId: '1667350895708',
  userId: '1',
  target: 0,
  connId: 'b0aa26d5-9f5e-493c-a5e5-257bf7b2b6a1',
  peerId: '1_0_b0aa26d5-9f5e-493c-a5e5-257bf7b2b6a1',
  peers: undefined,
  is: 'new',
  cs: 'new',
  ss: 'stable',
  sendersLength: 2
}  

DATA [
  [
    'ERROR-CODE',
    [ 437, 'Mismatched allocation: wrong transaction ID\x00' ]
  ],
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [ 'FINGERPRINT', 1518249910 ]
]
NONCE [
  [
    'ERROR-CODE',
    [ 437, 'Mismatched allocation: wrong transaction ID\x00' ]
  ],
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [ 'FINGERPRINT', 1518249910 ]
]
REALM [
  [
    'ERROR-CODE',
    [ 437, 'Mismatched allocation: wrong transaction ID\x00' ]
  ],
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [ 'FINGERPRINT', 1518249910 ]
]

Most likely, werift needs to handle this error, and its cause should be found and, if possible, eliminated. @shinyoshiaki Do you have any thoughts on this?
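
A minimal sketch of how such an error response could be detected before NONCE and REALM are read. The helper names getAttribute and readChallenge are hypothetical and are not werift's API; the attribute layout ([name, value] pairs) is taken from the log output above.

```typescript
// A STUN attribute as it appears in the logs above: [name, value].
type StunAttribute = [string, unknown];

// Hypothetical helper (not part of werift): look up one attribute by name.
function getAttribute(attributes: StunAttribute[], key: string): unknown {
  for (const [name, value] of attributes) {
    if (name === key) return value;
  }
  return undefined;
}

// Hypothetical helper: extract NONCE and REALM from a 401 challenge, and
// throw a descriptive error for any other response (for example 437).
function readChallenge(attributes: StunAttribute[]): { nonce: unknown; realm: unknown } {
  const errorCode = getAttribute(attributes, "ERROR-CODE") as [number, string] | undefined;
  const code = errorCode ? errorCode[0] : undefined;
  if (code !== 401) {
    // Coturn's reason phrase in the log ends with a NUL byte; strip it.
    const reason = errorCode ? errorCode[1].replace(/\0+$/, "") : "no ERROR-CODE attribute";
    throw new Error(`TURN server rejected the request: ${code ?? "?"} ${reason}`);
  }
  const nonce = getAttribute(attributes, "NONCE");
  const realm = getAttribute(attributes, "REALM");
  if (nonce === undefined || realm === undefined) {
    throw new Error("401 response is missing NONCE or REALM");
  }
  return { nonce, realm };
}
```

With a check like this, a 437 response would surface as a readable error instead of the TypeError raised later while the message bytes are being built.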

kolserdav commented 1 year ago

I checked this issue https://github.com/coturn/coturn/issues/267 , but it does not apply, since my error happens even when the firewall is turned off.

kolserdav commented 1 year ago

Here's what I was able to find out. First, when everything is working fine, i.e. there is no 437 error and the connections work as expected, getAttributeValue still fails to find a value for the DATA field in some messages on each connection:

DATA [
  [ 'ERROR-CODE', [ 401, 'Unauthorized' ] ],
  [ 'NONCE', <Buffer 39 32 66 35 37 35 62 37 61 38 64 35 30 39 39 62> ],
  [ 'REALM', 'uyem.ru' ],
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [ 'FINGERPRINT', 3705723979 ]
] 92f575b7a8d5099b
DATA [
  [ 'ERROR-CODE', [ 401, 'Unauthorized' ] ],
  [ 'NONCE', <Buffer 39 66 32 37 37 62 65 31 62 31 64 31 37 32 34 35> ],
  [ 'REALM', 'uyem.ru' ],
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [ 'FINGERPRINT', 3961840218 ]
] 9f277be1b1d17245
DATA [
  [ 'XOR-RELAYED-ADDRESS', [ '176.124.206.161', 57030 ] ],
  [ 'XOR-MAPPED-ADDRESS', [ '127.0.0.1', 20007 ] ],
  [ 'LIFETIME', 600 ],
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [
    'MESSAGE-INTEGRITY',
    <Buffer e2 44 b8 7a c2 b7 99 58 4c 90 5b 8c bc d7 7f 63 d6 d4 63 d8>
  ],
  [ 'FINGERPRINT', 3672353448 ]
] 127.0.0.1,20007
DATA [
  [ 'XOR-RELAYED-ADDRESS', [ '176.124.206.161', 54211 ] ],
  [ 'XOR-MAPPED-ADDRESS', [ '127.0.0.1', 20008 ] ],
  [ 'LIFETIME', 600 ],
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [
    'MESSAGE-INTEGRITY',
    <Buffer 67 7b b2 33 4d 13 44 fd 1e f4 38 fb e1 91 81 75 99 21 93 9d>
  ],
  [ 'FINGERPRINT', 729170023 ]
] 127.0.0.1,20008
DATA [
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [
    'MESSAGE-INTEGRITY',
    <Buffer 46 9a 8d 7b 6f 89 0f eb 03 05 b5 92 b3 f9 d3 a3 bb c2 9e e8>
  ],
  [ 'FINGERPRINT', 3316151783 ]
] (raw MESSAGE-INTEGRITY bytes, not printable)
DATA [
  [ 'SOFTWARE', "Coturn-4.5.2 'dan Eider'" ],
  [
    'MESSAGE-INTEGRITY',
    <Buffer 85 c3 3b e4 bd 62 44 19 13 9d 7d 83 6c c7 a1 5a c3 31 cf af>
  ],
  [ 'FINGERPRINT', 3522850246 ]
] (raw MESSAGE-INTEGRITY bytes, not printable)

It doesn't bother me too much, since the connections still work fine. But I'm sharing it here anyway just in case this behavior could be indicative of werift being misused.
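
For reading the log above: the value printed after each closing bracket appears to be an attribute value converted to a string. A small sketch, using plain number arrays instead of Buffer so it runs anywhere, shows why the NONCE prints as readable text while MESSAGE-INTEGRITY does not. The helper names are made up for this illustration.

```typescript
// Bytes of the first NONCE attribute shown in the log above.
const nonceBytes = [
  0x39, 0x32, 0x66, 0x35, 0x37, 0x35, 0x62, 0x37,
  0x61, 0x38, 0x64, 0x35, 0x30, 0x39, 0x39, 0x62,
];

// Interpret each byte as one ASCII character.
function bytesToAscii(bytes: number[]): string {
  return String.fromCharCode(...bytes);
}

// True when every byte is a printable ASCII character (0x20 to 0x7e).
function isPrintableAscii(bytes: number[]): boolean {
  return bytes.every((b) => b >= 0x20 && b <= 0x7e);
}

const nonceText = bytesToAscii(nonceBytes); // "92f575b7a8d5099b"

// First bytes of a MESSAGE-INTEGRITY value from the same log. It is an HMAC
// digest, so many of its bytes fall outside the printable range.
const integrityBytes = [0x46, 0x9a, 0x8d, 0x7b, 0x6f, 0x89, 0x0f, 0xeb];
const integrityPrintable = isPrintableAscii(integrityBytes); // false
```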

The main thing I noticed today is that after rebooting coturn everything works fine. But as soon as I restart my app and run the tests, I get a 437 error every time until I restart coturn again. On a freshly restarted coturn everything works fine until I restart my application again.

I would appreciate any help.

kolserdav commented 1 year ago

The reason for the behavior described above (437 errors after every application restart until coturn is restarted) seems to be that coturn keeps the state of ICE candidates without clearing it after the application disconnects. I tested without the icePortRange parameter and got the error only once (apparently one of the ICE ports from the previous application launch matched); all other launches without icePortRange showed no errors. In contrast, with icePortRange set, the error occurs on almost every connection on the second and subsequent launches of the application.

Logically, I assume the problem can only be in how coturn works. @shinyoshiaki, if you have nothing to add on this issue, you can close it.

I don't have a simple way to reproduce the problem. I can only describe how to launch my application, where I observe it: https://github.com/kolserdav/werift-sfu-react . That requires configuring a database connection, and I think you will not have time for that. However, if your configuration is similar to https://github.com/kolserdav/werift-sfu-react/blob/master/docs/resources/coturn.conf , you can try passing icePortRange in your own example and arrange the tests so that, after restarting your application, ICE ports are allocated in the same way as during the previous launch.

kolserdav commented 1 year ago

Based on this answer https://github.com/coturn/coturn/issues/1138#issuecomment-1375098904 , it can be concluded that the TURN server can return error 437 for up to 600 seconds after restarting applications running on werift, specifically when an ICE port that is still in use is allocated. This is especially noticeable when using icePortRange. It seems to me that werift itself should take this into account: when a message with code 437 arrives, a new connection should be created with different ICE ports (I'm not sure I put this correctly). Otherwise, whenever we update and restart our applications, coturn also has to be restarted, which is not possible if multiple applications share the TURN server, or else the application may malfunction for up to 600 seconds after a restart.
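
One possible shape for that idea, as a rough sketch only: remember which local ports were rejected with 437 and skip them until the old allocation should have expired. PortCooldown is a hypothetical helper and not something werift provides; the 600-second default is taken from the LIFETIME value in the logs above.

```typescript
// Hypothetical helper (not part of werift): remembers local ports whose TURN
// allocation was rejected with 437 and skips them until the old allocation
// should have expired on the server.
class PortCooldown {
  private rejectedAt: { [port: number]: number } = {};

  // cooldownMs defaults to the 600-second allocation lifetime seen in the logs.
  constructor(private cooldownMs: number = 600 * 1000) {}

  // Record that `port` received a 437 response at time `now` (milliseconds).
  markRejected(port: number, now: number): void {
    this.rejectedAt[port] = now;
  }

  // Return the first port in [min, max] that is not cooling down, if any.
  pick(min: number, max: number, now: number): number | undefined {
    for (let port = min; port <= max; port++) {
      const at = this.rejectedAt[port];
      if (at === undefined || now - at >= this.cooldownMs) return port;
    }
    return undefined;
  }
}
```

The caller would pass the bounds of its icePortRange as min and max, call markRejected whenever a 437 arrives, and retry the allocation on the port returned by pick. Passing the current time in explicitly keeps the helper easy to test.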

Codefa commented 1 year ago

@kolserdav I have the same issue with werift. For now my temporary fix is to go to /ice/src/turn/protocol.ts, around line 201: inside the refresh function there is a while loop that refreshes the TURN client before the nonce expires (600). I simply commented it out for now, and it works perfectly: no more 438 stale-nonce errors from the coturn side.

// comment this while loop
while (run) {
  // refresh before expire
  await setTimeout((5 / 6) * this.lifetime * 1000);

  const request = new Message(methods.REFRESH, classes.REQUEST);
  request.setAttribute("LIFETIME", this.lifetime);

  await this.request(request, this.server);
}
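
For reference, the delay in that loop works out as follows when the lifetime is 600 seconds. The function name refreshDelayMs is made up for this illustration; only the expression comes from the loop above.

```typescript
// Delay before the next REFRESH, as a fraction of the allocation lifetime.
// Mirrors the expression in the loop above: (5 / 6) * lifetime * 1000.
function refreshDelayMs(lifetimeSeconds: number): number {
  return Math.round((5 / 6) * lifetimeSeconds * 1000);
}

const delay = refreshDelayMs(600); // 500000 ms, i.e. 500 seconds
```

So each REFRESH request is sent 100 seconds before the 600-second lifetime runs out.
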
Codefa commented 1 year ago

@kolserdav UPDATE on the issue: from version 0.17.7 we don't need a TURN server on the server side of the SFU; only the client needs STUN and TURN servers. If you want to use your own STUN server on the SFU server side you can; by default it uses Google's. I tried to avoid using TURN on the SFU server in previous versions, but it works for me after upgrading to 0.17.7.

So if we don't use our own coturn TURN server on the SFU server side, there are no more issues.
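
A sketch of that setup, assuming the ICE server entries have the usual WebRTC shape (urls, username, credential). The host names are placeholders and stunOnly is a hypothetical helper, not part of werift.

```typescript
// Shape of one ICE server entry (assumed to match the usual WebRTC config).
interface IceServer {
  urls: string;
  username?: string;
  credential?: string;
}

// Hypothetical helper: keep only STUN entries for the SFU server side,
// leaving TURN to the browser clients.
function stunOnly(servers: IceServer[]): IceServer[] {
  return servers.filter((s) => s.urls.indexOf("stun:") === 0);
}

// Full list handed to browser clients (placeholder hosts and credentials).
const clientIceServers: IceServer[] = [
  { urls: "stun:stun.example.org:3478" },
  { urls: "turn:turn.example.org:3478", username: "user", credential: "secret" },
];

// List for the server-side peer connection: STUN only, no TURN.
const serverIceServers = stunOnly(clientIceServers);
```

The filtered list would then be passed as iceServers when creating the peer connection on the SFU side, while clients keep the full list.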

kolserdav commented 1 year ago

@Codefa you are right, thank you.