opentok / opentok-ios-sdk-samples-swift

Sample applications using the OpenTok iOS SDK in Swift
https://tokbox.com/
MIT License
137 stars 65 forks source link

OPENTOK-48681: Add range to estimated playout delay #178

Closed goncalocostamendes closed 1 year ago

goncalocostamendes commented 1 year ago

What is this PR doing?

Add range to audio playout delay, in case of incorrect inputs.

Ticket: https://jira.vonage.com/browse/OPENTOK-48681

IGitGotIt commented 1 year ago

The original JIRA issue was a type cast crash , so why are we adding range check? What happened (before) if we go over the range ?

goncalocostamendes commented 1 year ago

The original JIRA issue was a type cast crash , so why are we adding range check? What happened (before) if we go over the range ?

@IGitGotIt When programming in swift, if the value goes over the variable size the program crashes. So there is an implicit range from 0 to 65535.

We do not have any traces of the exact scenarion when the error occurs, so it looks that maybe we get some [mySession outputLatency] nonsense value in some situations, which goes over the max value. Maybe when switching mute state.

By definition, playout delay is the total amount of time experienced by an audio packet from the time instant it is generated at the source and the time instant it is played out at the destination. So 65 seconds is a extremely high value. That being said, we can shorten this value. My idea was 1s, but we can increase it to something else.

FYI, I have tested several scenarios, including default audio driver embedded in multi-party session and the value audioDevice.playoutDelay is always assigned to 15. Of course, it should vary a bit

IGitGotIt commented 1 year ago

The original JIRA issue was a type cast crash , so why are we adding range check? What happened (before) if we go over the range ?

@IGitGotIt When programming in swift, if the value goes over the variable size the program crashes. So there is an implicit range from 0 to 65535.

We do not have any traces of the exact scenarion when the error occurs, so it looks that maybe we get some [mySession outputLatency] nonsense value in some situations, which goes over the max value. Maybe when switching mute state.

By definition, playout delay is the total amount of time experienced by an audio packet from the time instant it is generated at the source and the time instant it is played out at the destination. So 65 seconds is a extremely high value. That being said, we can shorten this value. My idea was 1s, but we can increase it to something else.

FYI, I have tested several scenarios, including default audio driver embedded in multi-party session and the value audioDevice.playoutDelay is always assigned to 15. Of course, it should vary a bi

The original JIRA issue was a type cast crash , so why are we adding range check? What happened (before) if we go over the range ?

@IGitGotIt When programming in swift, if the value goes over the variable size the program crashes. So there is an implicit range from 0 to 65535.

We do not have any traces of the exact scenarion when the error occurs, so it looks that maybe we get some [mySession outputLatency] nonsense value in some situations, which goes over the max value. Maybe when switching mute state.

By definition, playout delay is the total amount of time experienced by an audio packet from the time instant it is generated at the source and the time instant it is played out at the destination. So 65 seconds is a extremely high value. That being said, we can shorten this value. My idea was 1s, but we can increase it to something else.

FYI, I have tested several scenarios, including default audio driver embedded in multi-party session and the value audioDevice.playoutDelay is always assigned to 15. Of course, it should vary a bit

First what happens if the playout delay is 65 or 10 seconds? Does webrtc choke? or it self -corrects? or does nothing? What is the audio output in such case - lagging , jerky or ok?

Also let assume webrtc self corrects when values are out of line. Do we know the range of those values? What is the guarantee that your arbitrary range will kick in that self correction or not? It may never self correct and problem will worsen instead of getting fixed. In fact this is the concern the partner had in the original ticket, which we never addressed specifically. This are just my assumptions w/o looking at webrtc code, so please test with some sample values and/or webrtc code check.

Now if after all this investigation you feel we need to do your change, you could just keep the old code and add the following (pseudo-almost correct)code: Basically control the logic where you set playoutDelay in one place for every assignment.

    var _playoutDelay: UInt32 = 15 //some default
    var playoutDelay: UInt32 {
            get {
                return _playoutDelay
            }
            set(newValue) {
                if newValue > 1000 {
                    playoutDelayMeasurementCounter = 100
                }
              _playoutDelay = min(newValue,1000)
            }
    }

In fact you can add the whole updatePlayoutDelay in the methods above, if you want...

goncalocostamendes commented 1 year ago

The original JIRA issue was a type cast crash , so why are we adding range check? What happened (before) if we go over the range ?

@IGitGotIt When programming in swift, if the value goes over the variable size the program crashes. So there is an implicit range from 0 to 65535. We do not have any traces of the exact scenarion when the error occurs, so it looks that maybe we get some [mySession outputLatency] nonsense value in some situations, which goes over the max value. Maybe when switching mute state. By definition, playout delay is the total amount of time experienced by an audio packet from the time instant it is generated at the source and the time instant it is played out at the destination. So 65 seconds is a extremely high value. That being said, we can shorten this value. My idea was 1s, but we can increase it to something else. FYI, I have tested several scenarios, including default audio driver embedded in multi-party session and the value audioDevice.playoutDelay is always assigned to 15. Of course, it should vary a bi

The original JIRA issue was a type cast crash , so why are we adding range check? What happened (before) if we go over the range ?

@IGitGotIt When programming in swift, if the value goes over the variable size the program crashes. So there is an implicit range from 0 to 65535. We do not have any traces of the exact scenarion when the error occurs, so it looks that maybe we get some [mySession outputLatency] nonsense value in some situations, which goes over the max value. Maybe when switching mute state. By definition, playout delay is the total amount of time experienced by an audio packet from the time instant it is generated at the source and the time instant it is played out at the destination. So 65 seconds is a extremely high value. That being said, we can shorten this value. My idea was 1s, but we can increase it to something else. FYI, I have tested several scenarios, including default audio driver embedded in multi-party session and the value audioDevice.playoutDelay is always assigned to 15. Of course, it should vary a bit

First what happens if the playout delay is 65 or 10 seconds? Does webrtc choke? or it self -corrects? or does nothing? What is the audio output in such case - lagging , jerky or ok?

Also let assume webrtc self corrects when values are out of line. Do we know the range of those values? What is the guarantee that your arbitrary range will kick in that self correction or not? It may never self correct and problem will worsen instead of getting fixed. In fact this is the concern the partner had in the original ticket, which we never addressed specifically. This are just my assumptions w/o looking at webrtc code, so please test with some sample values and/or webrtc code check.

Now if after all this investigation you feel we need to do your change, you could just keep the old code and add the following (pseudo-almost correct)code: Basically control the logic where you set playoutDelay in one place for every assignment.

    var _playoutDelay: UInt32 = 15 //some default
    var playoutDelay: UInt32 {
            get {
                return _playoutDelay
            }
            set(newValue) {
                if newValue > 1000 {
                    playoutDelayMeasurementCounter = 100
                }
              _playoutDelay = min(newValue,1000)
            }
    }

In fact you can add the whole updatePlayoutDelay in the methods above, if you want...

In webrtc I do not see any validation of the playout delay values. But I found that one other video conferencing company logs a warning when the values surpasses 300ms.

image

Since the value should never reach such I values, and the issue was not reproducible, I never ended up testing it forcing the delay to be so high. It is an interesting test. I will get back with the results.

Regarding the specific question the client asked. Will the following code raise unexpected results?

public func estimatedRenderDelay() -> UInt16 { let value = min(self.playoutDelay, UInt32(UInt16.max)) return UInt16(value) }

Not necessarily because the value of the delay is calculated periodically every couple of ms, so it is assumed that eventually the "garbage" value will be corrected and a proper delay value will be used. Even though the initial questions is regarding a casting, it seems to be a red herring. The delay itself should be small since it is related to the estimated playout delay of an audio. It should never be bigger than a few milliseconds.

This is a unexpected scenario. Several tests were done and the issue was never caught. In order to not change drasticaly the current implemenation we have, the max value of the range is, in my opinion, already a extremely high value.

Regarding the psedo code, yes, it seems a valid option.

goncalocostamendes commented 1 year ago

@IGitGotIt I added the results of the tests to the jira ticekt https://jira.vonage.com/browse/OPENTOK-48681. Briefly, with a 65000ms playout delay, the loss of audio packets increases considerable.

IGitGotIt commented 1 year ago

@IGitGotIt I added the results of the tests to the jira ticekt https://jira.vonage.com/browse/OPENTOK-48681. Briefly, with a 65000ms playout delay, the loss of audio packets increases considerable.

Thanks @goncalocostamendes .
Here is a good article on what playout delay does - https://flylib.com/books/en/4.245.1.52/1/

The delay typically decides how long the frame buffer holds onto the data and thus the longer it is , the more incoming (new) frames will be dropped. Now here I think a 65 seconds or even 1 seconds will behave "almost" the same since what we are providing is an "estimatedCaptureDelay" and RTP will hopefully take over if the value we provides goes out of line. (a. guess). Maybe you can try out this experiment. If my assumption is correct.. then...

So if the delay value recovers by itself having a value of 65 or 1 will not matter. You can say let's do 150 ms or 300 ms. That will make sense. But if it recovers itself quickly in the next second (the value of 100) why over-engineer it.

I would be ok leaving the code as it is and communicate the findings to the user. Even if we change the delay value, it is sort of a recommendation to RTP layer in extreme case. Also we providing a value which is different from what the Audio Unit is telling us now, does not mean Audio Unit will adjust. It will do what it wants to do. It is just a value for the RTP layer.

Let say we make a mistake programmatically/something messed up, in giving a max value for estimatedCaptureDelay then we can just add a return min(playoutDelay,300) (ms) like statement to https://github.com/opentok/opentok-ios-sdk-samples-swift/blob/main/Custom-Audio-Driver/Custom-Audio-Driver/DefaultAudioDevice.swift#L295.

Anyway try the experiment with 1 second vs 65 seconds vs 300 ms vs 100 ms and the "always/correct" present value of 15 ms from your observations.

As a side note a value of 0 is also good..let RTP decide wit its internal default.. (maybe try it out)

Thanks

goncalocostamendes commented 1 year ago

@IGitGotIt this online book is great!! I was looking for something like that.

I have on the PR, on top of the range, forcing that when a "wrong" value is set to the playout delay, instead of waiting for the next calculation (each one 1s), it will do it right on the next iteration (with the assumption that a correct value will be received in one of the next iterations). This being an assumption, maybe having only a validation between the value calculated and the max will be enough, and reduces the calls to several functions.

I found on the book:

The trade-off in playout buffer operation is between fidelity and delay: An application must decide the maximum playout delay it can accept, and this in turn determines the fraction of packets that arrive in time to be played out. A system designed for interactive usefor example, video conferencing or telephonymust try to keep the playout delay as small as possible because it cannot afford the latency incurred by the buffering. Studies of human perception point to a limit in round-trip time of about 300 milliseconds as the maximum tolerable for interactive use; this limit implies an end-to-end delay of only 150 milliseconds including network transit time and buffering delay if the system is symmetric. However, a noninteractive system, such as streaming video, television, or radio, may allow the playout buffer to grow up to several seconds, thereby enabling noninteractive systems to handle variation in packet arrival times better.

Since the 150ms was reached by studies on human perception, I believe we should stick with it. What do you think?

jvalli commented 1 year ago

Do we know why the CI is failing?

goncalocostamendes commented 1 year ago

@jvalli yes, Custom-Video-Driver, not Custom-Audio-Driver has a couple of compile errors. I created a ticket to solve that on tech-debt day https://jira.vonage.com/browse/OPENTOK-49279