twilio / twilio-voice.js

Twilio's JavaScript Voice SDK
Other
53 stars 53 forks source link

[BUG] Randomly inconsistent Call SIDs for incoming WebSocket events #139

Closed kamilbonna closed 1 year ago

kamilbonna commented 1 year ago

Code to reproduce the issue:

Hi,

I'm experiencing an issue with twilio/voice-sdk. I'm testing outbound calls from a web application. The test call is supposed to return ConnectionError (31005). During the setup phase I'm creating a new call object and setting appropriate handlers:

const call = await device.connect({...})
call.on('accept', handleCallAccept)
call.on('disconnect', handleCallDisconnect)

I figured out that behind the scenes the WebSocket connection is opened and emits one invite event and then receives three events:

{"payload":{"callsid":"CA..."},"type":"ringing","version":""}
{"payload":{"callsid":"CA...","error":{"code":31000,"message":"General Error"}},"type":"hangup","version":""}
{"payload":{"callsid":"CA..."},"type":"hangup","version":""}

These hangup events then trigger error and disconnect callbacks on the call object. This is the expected behavior.

However, once every 20-30 calls (it happens randomly) the callsid for ringing event is different than the callsid for hangup events:

{"payload":{"callsid":"TJ..."},"type":"ringing","version":""}
{"payload":{"callsid":"CA...","error":{"code":31000,"message":"General Error"}},"type":"hangup","version":""}
{"payload":{"callsid":"CA..."},"type":"hangup","version":""}

In that case these events are not recognized as belonging to the initial call object and callbacks are not fired. This ultimately cause the application UI to not reflect actual call state.

Expected behavior:

Actual behavior:

Software versions:

charliesantos commented 1 year ago

Hi @kamilbonna , can you please provide some callsids that are different but on the same call?

kamilbonna commented 1 year ago

@charliesantos Thanks for quick reply. Do they have to be exact or can they be truncated? I'm just wondering if it's safe to share them publicly here. In my description I redacted SIDs leaving only 2 first characters as they were distinct between different WebSocket events.

charliesantos commented 1 year ago

Hi @kamilbonna , they need to be exact so we can look them up internally. It's safe to share here. If you're not comfortable, please submit a support ticket to Twilio. Also, can you please enable debug mode and provide client logs? Are you able to see this on different OS and Browsers?

kamilbonna commented 1 year ago

@charliesantos I was able to reproduce the issue on Firefox. Here are WebSocket events for a faulty call:

{"type":"invite","version":"1.6","payload":{"callsid":"TJSfa2a97ac-6f98-4e8a-b433-717350381382","sdp":"<...>","preflight":false,"twilio":{"params":"callerId=%2B48<...>"}}}
{"payload":{"callsid":"TJSfa2a97ac-6f98-4e8a-b433-717350381382"},"type":"ringing","version":""}
{"payload":{"callsid":"CAb4f8fe788e9ce21dd5240e0a0cc03d78","error":{"code":31000,"message":"General Error"}},"type":"hangup","version":""}
{"payload":{"callsid":"CAb4f8fe788e9ce21dd5240e0a0cc03d78"},"type":"hangup","version":""}

and corresponding events for one of the working calls:

{"type":"invite","version":"1.6","payload":{"callsid":"TJS7dad1376-15f7-410f-8a7c-dfce62bbc760","sdp":"<...>","preflight":false,"twilio":{"params":"callerId=%2B48<...>"}}}
{"payload":{"callsid":"CA83157d64636a7682cfcdde2970426e15"},"type":"ringing","version":""}
{"payload":{"callsid":"CA83157d64636a7682cfcdde2970426e15","error":{"code":31000,"message":"General Error"}},"type":"hangup","version":""}
{"payload":{"callsid":"CA83157d64636a7682cfcdde2970426e15"},"type":"hangup","version":""}

I also captured client logs for a faulty call:

Setting up VSP
Initializing preferred transport backoff using config:  {factor: 2, maxDelay: 1000, randomisationFactor: 0.4}
Initializing primary transport backoff using config:  {factor: 2, initialDelay: 100, maxDelay: 20000, randomisationFactor: 0.4}
WSTransport.open() called..
Attempting to connect...
Closing and cleaning up WebSocket...
No WebSocket to clean up.
WebSocket opened successfully.
Setting token and publishing listen
signalingState is "have-local-offer"
dtlsTransportState is "new"
pc.iceGatheringState is "gathering"
ICE Candidate: {"candidate":"candidate:3003816638 1 udp 2122260223 172.18.0.1 34520 typ host generation 0 ufrag jD6E network-id 1","sdpMid":"0","sdpMLineIndex":0}
ICE Candidate: {"candidate":"candidate:3047211145 1 udp 2122194687 100.121.101.33 47695 typ host generation 0 ufrag jD6E network-id 2","sdpMid":"0","sdpMLineIndex":0}
ICE Candidate: {"candidate":"candidate:3236682033 1 udp 2122129151 192.168.0.105 40752 typ host generation 0 ufrag jD6E network-id 3 network-cost 10","sdpMid":"0","sdpMLineIndex":0}
ICE Candidate: {"candidate":"candidate:1302351402 1 tcp 1518280447 172.18.0.1 9 typ host tcptype active generation 0 ufrag jD6E network-id 1","sdpMid":"0","sdpMLineIndex":0}
ICE Candidate: {"candidate":"candidate:1258951709 1 tcp 1518214911 100.121.101.33 9 typ host tcptype active generation 0 ufrag jD6E network-id 2","sdpMid":"0","sdpMLineIndex":0}
ICE Candidate: {"candidate":"candidate:1044450725 1 tcp 1518149375 192.168.0.105 9 typ host tcptype active generation 0 ufrag jD6E network-id 3 network-cost 10","sdpMid":"0","sdpMLineIndex":0}
pc.iceGatheringState is "complete"
ICE Candidate: null

For working call ICE Candidate: null log was followed by Received HANGUP from gateway and appropriate error:

Received an error from the gateway: ConnectionError: ConnectionError (31005): Error sent from gateway in HANGUP
    at ConnectionError.TwilioError [as constructor] (twilioError.js?4ce7:25:1)
    at new ConnectionError (generated.js?bd0e:159:1)
    at Call._this._onHangup (call.js?b8a5:299:1)
    at PStream.emit (events.js?5a73:153:1)
    at PStream._handleTransportMessage (pstream.js?1108:179:1)
    at WSTransport.emit (events.js?5a73:153:1)
    at WSTransport._this._onSocketMessage (wstransport.js?f1fb:173:1)

which was not the case for a faulty call.

Just to be clear. I consider call to be "faulty" when an expected error is not triggering error and disconnect callbacks and "working" when callbacks are fired.

charliesantos commented 1 year ago

Hi @kamilbonna, apologies for the late reply. The logs and example websocket events you provided really helps. I'm checking with our signaling team to get more information regarding this behavior. In the meantime, are you still seeing this issue?

charliesantos commented 1 year ago

Hi @kamilbonna , can you also provide example callsids for faulty calls in the last 7 days?

kamilbonna commented 1 year ago

Hi @charliesantos, I think we are not seeing that issue anymore because it was related to specific settings that we used only during feature development and testing. This means that I don't have error logs for faulty calls in the last 7 days. If old logs are not enough, please let me know so I could try to revert these settings and reproduce the issue when I have some extra time.

charliesantos commented 1 year ago

@kamilbonna I'm glad you're not able to reproduce this with your new settings. What were the differences in your settings that caused the issue?

Feel free to submit new logs and callsids when you get a chance. But in the meantime, I will close this issue since you got it working now. We can reopen it once we get a repro and new logs/callsids. I'm still interested in what you've changed in your settings though :)