jvdrhoof / WebRTCSFU

0 stars 0 forks source link

We need better error messages, and flushing stdout per line #1

Closed jackjansen closed 2 months ago

jackjansen commented 4 months ago

At the moment, when I'm running the SFU under the orchestrator, and when I'm trying to debug a connection problem the output isn't at its most useful:-)

Here is an example:

web-1  | [2024-04-26 15:33:41.834] debug: SFU 705b81f6-1ce5-4d54-b062-6980af6d68c3 stdout: WebRTCSFU: webSocketHandler: ReadMessage: error %!w(*websocket.CloseError=&{1006 unexpected EOF})
web-1  | WebRTCSFU: webSocketHandler: Closing a peer connection
web-1  | 
web-1  | [2024-04-26 15:33:41.836] debug: SFU 705b81f6-1ce5-4d54-b062-6980af6d68c3 stdout: WebRTCSFU: OnTrack: error during read: EOF
web-1  | WebRTCSFU: OnTrack: removing track %!w(func() string=0x89c340)
web-1  | WebRTCSFU: removeTrack: t.ID video_1_0
web-1  | WebRTCSFU: removeTrack: Calling signalPeerConnections
web-1  | WebRTCSFU: signalPeerConnections
web-1  | WebRTCSFU: signalPeerConnections: attempting sync
web-1  | WebRTCSFU: webSocketHandler: Closing a ThreadSafeWriter
web-1  | WebRTCSFU: webSocketHandler: OnConnectionStateChange: Peer connection state has changed to closed
web-1  | WebRTCSFU: webSocketHandler: OnConnectionStateChange: Closed
web-1  | WebRTCSFU: signalPeerConnections
web-1  | WebRTCSFU: signalPeerConnections: attempting sync
web-1  | 
web-1  | [2024-04-26 15:33:46.839] debug: SFU 705b81f6-1ce5-4d54-b062-6980af6d68c3 stdout: WebRTCSFU: signalPeerConnections
web-1  | WebRTCSFU: signalPeerConnections: attempting sync
web-1  | WebRTCSFU: signalPeerConnections
web-1  | WebRTCSFU: signalPeerConnections: attempting sync
web-1  | 

The web-1 | bit is added by docker logs. The [2024-04-26 15:33:46.839] debug: SFU 705b81f6-1ce5-4d54-b062-6980af6d68c3 stdout: bit is added by the orchestrator, handy because it gives us a timestamp and an indication of where the message comes from (the webrtc SFU, in this case).

If the SFU did a stdout.flush() after every log message that would be very good: we would then get one orchestrator header per SFU log line.

Also for many of the messages it would be good if there was some indication about which connection this message pertains to.

And possibly also printing some telemetry data at the beginning of a connection. For example, I think there's something like a connectionId, if you would print socket.getpeername() together with the connectionId whenever a new connection was opened that would already greatly help.

MatthiasDeFre commented 2 months ago

The SFU logs are now updated to also show the client responsible for the message. The ID that is shown is an internal one and not the one used in peer.exe (as we probably want to eliminate that one further down the line). At the start of each connection the IP of the connecting user is also shown together with this internal client ID, this way we can know which machine belongs to which client ID.

We can also consider using the username instead of a numerical ID in case that would further clarify the logs.

For the stdout.flush() problem. We are 99% sure that the problem is not related to this. We tried many different ways of flushing the stdout and the same problem kept occuring. This was also the case with a simple C++ application that uses std::cout (also with different methods of stdout flushing). For this reason we made some changes to the orchestrator (https://github.com/cwi-dis/vr2gather-orchestrator-v2/commit/d175bf5e484bd279a83054d50a4b1c7304558302) which splits the received output based on the newline character. That way everything is shown nicely under each other, with each line having the orchestrator error.