Closed stronk-dev closed 1 year ago
Thanks @eliteprox - would you be able to work with @cyberj0g please @stronk-dev to help him reproduce the issue?
I haven't encountered #2605 yet so I can't speak as to reproducing that. but it does seem like that issue is caused by the bug as described in #2706.
Reproducing #2706 is as simple as having more than 2 remote transcoders and disconnecting the T with and index in rtm.remoteTranscoders
higher than 1
. All T's with an index lower than i-1
will get dropped when the disconnected T gets selected so I can imagine this can cause the set to get emptied under specific conditions
Might just be easiest to merge #2707 which fixes #2706 first to see if that also fixes #2605 as a side effect
Thanks @eliteprox - would you be able to work with @cyberj0g please @stronk-dev to help him reproduce the issue?
I've opened #2747 for this issue. I did some troubleshooting with @0xb79 and @Titan-Node this evening and reproduced it in 0.5.37 under scenarios involving multiple streams. This function cleaned up quite a bit.
Describe the bug
Whenever a T reconnects, they get a new entry in the
remoteTranscoders
list of transcoder objects held by theRemoteTranscoderManager
Their 'expired' entry in theremoteTranscoders
list only gets removed when they later get selected for a session: https://github.com/livepeer/go-livepeer/blob/e2c46a1e30ca3c5aafcc1a4f11f55523235da871/core/orchestrator.go#L968-L976Up to this point everything looks alright (although I wonder why does the Transcoder get removed from
liveTranscoders
immediately, but only gets removed fromremoteTranscoders
on selection)Anyway, the error happens in the
removeFromRemoteTranscoders
function, where it splices the expired entry out of the list https://github.com/livepeer/go-livepeer/blob/e2c46a1e30ca3c5aafcc1a4f11f55523235da871/core/orchestrator.go#L914-L938It creates the new list as
So if
i
is higher than 1, any transcoder in this list with an index lower thani-1
is going to get dropped from the list of remote transcoders!In all regards the T is still considered a live transcoder connected to the Orchestrator, so no error pops up on the O or T side. They just won't receive any more streams
To Reproduce
Have multiple remote T's connected in a split O/T setup. Have one (or more) T's reconnect and wait for a few streams to come in. At some point you will see one T not receiving any more streams