quarnster / boxeebox-xbmc

Aiming to get xbmc up and running on the boxee box
Other
118 stars 44 forks source link

PTS questions from KeyserSoze1 #122

Open quarnster opened 10 years ago

quarnster commented 10 years ago

I prefer to be summoned via @quarnster rather than a PM in the forums. You can ask in this issue or open up a new one if there are followups or any other questions about something else.

@KeyserSoze1 wrote:

I've been making changes to some of the audio interface and attempting to add some features that were originally in the boxee code. However I have been finding myself breaking the video in many cases.

The renderer seems to be taking the passed in pts value (which is modified by the audio delay and the display latency?). And then determine how much to add to the pts value. All of this seems to be predicated on the current clock time. I'm trying to understand how all of this is setup and intended to work so I can then understand how I'm affecting it and then make changes if I need to (on the audio or video end).

quarnster commented 10 years ago

@KeyserSoze1 wrote:

I've been making changes to some of the audio interface and attempting to add some features that were originally in the boxee code. However I have been finding myself breaking the video in many cases.

The renderer seems to be taking the passed in pts value (which is modified by the audio delay and the display latency?). And then determine how much to add to the pts value. All of this seems to be predicated on the current clock time. I'm trying to understand how all of this is setup and intended to work so I can then understand how I'm affecting it and then make changes if I need to (on the audio or video end).

The original xbmc code was clearly written without hardware acceleration in mind, so I had to hack to counteract these bad expectations made by xbmc's dvdplayer component.

I recommend reading this post and references for some additional info: https://github.com/quarnster/boxeebox-xbmc/issues/5#issuecomment-24811382.

Don't know your background, but hardware does a lot of stuff in parallel; it has a pipeline of some depth where giving it input data it will take a number of cycles until we see the result. This delay is one form of latency.

If we think of this in a software kind of way applied to real life we'd have one guy with a bucket of water running from the lake to the fire and back again. Each step is a clock cycle, the total number of steps from filling the bucket until our guy is back and ready to fill it again is the latency. With software the throughput is the latency, in other words there's no way our guy can do this work any more efficient than he already is.

Hardware in this analogy is to instead have a person and two buckets at each step. Each cycle, each person forwards one bucket full of water to the next person in line, and passes an empty bucket back to the previous person. The latency in this example is still the same; there's no way that the water will get from the lake to the fire any faster, the throughput however has been radically improved; we are able to throw one bucket of water on the fire each cycle.

You quickly realize that what matters here is throughput not latency, and this is typically also the case with hardware where each cycle each "person" doesn't do that much and instead you make up for it by having a lot of people helping out.

Getting back to xbmc, it presumes that the video latency is the throughput and that there is only a single guy doing all the work from reading the data from file to finally displaying the decoded image on screen. In the hardware case this means that while each person is still passing the two buckets to the next and the previous person in line, only a single bucket is actually filled with water at any given time in the system. These wasted "cycles" are called pipeline bubbles as the pipeline is not transporting anything other than air when it should have transported water. Wasteful, but not necessarily a problem as long as the latency is low enough to be able to keep up with this 1/latency inefficency. But name one hardware vendor who report their hardware's capabilities when it's used at 1/latency "efficiency".

The hardware is not only capable of decoding video frames, it can scale, rotate, and display them on the screen at the appropriate time as determined by a timestamp, completely independent of the 3D graphics hardware. In other words, we are in charge of feeding the hardware with data to decode, but for the rest of the pipeline it takes care of itself just fine. It knows when the appropriate time to display a frame is by having the input buffers marked with a timestamp and comparing this timestamp to a hardware clock running at IIRC 90kHz.

The original boxee code ignored xbmc's time sync code and wrote its own code to handle it instead. That introduced a lot of code just dealing with time sync for boxee box specifically and would have to be "re-ported" back to the more recent xbmc code this effort was initially based on. Alternatively it could be thrown out and simplified by making use of XBMC timestamp synchronization code. I chose, perhaps wrong, to do the latter.

Unfortunately again, as xbmc thinks the 3D hardware is in charge of displaying video frames on the screen at the right time and that the guy who's responsible for this task is the same guy who is in charge of bringing the bucket with input video data to the decoder, we then have in psuedo code this situation:

while {
    dataToDecode = data.nextPacket();
    codecStatus = codec.decode(dataToDecode);
    if (codecStatus == WantMoreData) {
        continue;
    } else if (codecStatus == newOutputAvailable) {
        picture = codec.picture();
        picture.time = adjustAccordingToUserSettings(picture.time);
        waitUntilAppropriate(picture.time);
        display(picture);
    }
}

Remember that we don't see the different stages of the data as it is passed through the hardware, or differently worded; all we do is feed the codec with new input data. But as we want to make use of xbmc's existing code to adjust the timestamp, this means that the input buffers need to have their timestamps modified before they are sent to the video codec. XBMC adjusts the timestamp of "Picture" objects, which it thinks are decoded video frames ready to be forwarded to the "VideoRenderer".

No biggie, hack 1; I introduced a new "SMD" video data format and we can pretend that this still encoded data is just a "SMD" specific rendering format.

But to send this "picture" to the renderer, we need to report that a picture is available, and as xbmc thinks it needs to wait for vsync for each "picture" reported available this will introduce pipeline bubbles and stall the pipeline. To avoid these bubbles, we want to feed the codec data as fast as it can accept, and drivers usually help a little bit by allowing us to give it more data than it can handle so that there's a queue of data waiting to be submitted.

Introduce hack 2; we intentionally report "wantmoredata" from the codec in which case xbmc does not try to display anything, but it goes back into reading the next data packet available to feed the codec as soon as possible.

This gives us pretty good use of the video hardware so that it is able to decode videos of these high resolution and bitrates it is supposed to be able to decode, and we get to make use of xbmc's existing av-sync mechanisms.

Unfortunate side effect: when video is playing, the ui (not the video, but buttons, lists, subtitles, etc) will render slowly as we only report "picture available" for a fraction of the frames so it is rendering the ui elements just a few times per second.

We also need to ensure that the HW clock is in sync with XBMC's clock, which is done in some code referenced in that post I referenced earlier.

So that would be a brain dump of that, hope it made sense :)

KeyserSoze1 commented 10 years ago

Thank you! This is exactly the information I needed. What the intentions of the current code is doing. You did an excellent job explaining it. Have you ever considered teaching as a profession :). It gives me a good understanding. Now to figure out why I'm getting a lot of frame pts is before the start time issues.

quarnster commented 10 years ago

If you post a log, perhaps I could offer some insight based on the debug output

KeyserSoze1 commented 10 years ago

Thanks for the offer. http://xbmclogs.com/show.php?id=159902

Just to let you know the specifics of what I am trying to do.. Currently the xbmc passthrough code packages up the data in IEC 61937 format. Passing this out as PCM via the driver seems to cause issues with some systems (mine included). I've changed it to passing it in as PCM to AC3 and that fixes the Dolby Digital issue I have seen (not sure but the driver seems to read the 61937 packets ok). I am speculating that some systems require the channel status flag of the IEC 60958 stream to indicate non-pcm, which the driver is probably not setting since we are indicating PCM input. What I would like to do is skip the packetization aspect if possible and pass the data raw into the driver. I've increased the buffer length from 4k->8k and introduced a new Intel Audio Codec which is similar to the passthrough codec but w/o the pack step. However that has introduced other issues that I am trying to sort out now.

Any help you can offer would be greatly appreciated.

quarnster commented 10 years ago

I've increased the buffer length from 4k->8k ... 18:43:41 T:2991586192 WARNING: CIntelSMDRenderer::FlushAndSync - 0335 Frame pts is waaay behind the start time and wasn't marked as a discontinuity. Forcing Flush. start: 3.567000, pts: 0.459000 ... 18:43:41 T:2991586192 INFO: CIntelSMDRenderer::FlushAndSync - 0363: flushing start: 3.61, base: 525.50

Increasing the buffer length means increased latency, it's possible this situation is solved by relaxing the "behindness" allowance in the video player. Try simply doubling it from 2 to 4 to match your 4k -> 8k change.

quarnster commented 10 years ago

Though reading the code now, I think the comparison should really be (firstpts+add+2*DVD_TIME_BASE)/(double)DVD_TIME_BASE as "start" will in the end be reset to the post adjusted firstpts. I'd try this first before doubling the allowance

KeyserSoze1 commented 10 years ago

Well while changing it as you mentioned in your last post does remove those errors, the end result is the same. I also went back to a 4k buffer to eliminate that as a possible issue. However the latency (or cache time) calculation at the IntelAudioSink has been modified by me to try to determine the cache time with raw data (it's possible this isn't 100% accurate yet, it's based on the frame size of the raw data and the final size that it will become once expanded into IEC 61937). Here's some interesting parts of the log...

17:32:54 T:2985753488 WARNING: CIntelSMDRenderer::FlushAndSync - 0380 - start is 3.150000, firstpts is 0.042000 but wasn't marked for discontinuity! Expected: 0.000000, actual: 204963823041216.750000, diff: -204963823041216.750000 17:32:54 T:2985753488 WARNING: CIntelSMDRenderer::FlushAndSync - 0380 - start is 3.150000, firstpts is 0.083000 but wasn't marked for discontinuity! Expected: 0.000000, actual: 204963823041216.750000, diff: -204963823041216.750000 17:32:55 T:2985753488 WARNING: CIntelSMDRenderer::FlushAndSync - 0380 - start is 3.150000, firstpts is 0.167000 but wasn't marked for discontinuity! Expected: 0.124103, actual: 204963823041217.218750, diff: -204963823041217.093750 17:32:55 T:2985753488 WARNING: CIntelSMDRenderer::FlushAndSync - 0389 Diff too large, will force a Flush instead of adjusting clock: 0.124103, 204963823041217.218750, -204963823041217.093750

And then later...

17:32:56 T:2985753488 WARNING: CIntelSMDRenderer::FlushAndSync - 0380 - start is 4.234000, firstpts is 1.168000 but wasn't marked for discontinuity! Expected: 4.801813, actual: 4.801689, diff: 0.000124 17:32:56 T:2985753488 WARNING: CIntelSMDRenderer::FlushAndSync - 0380 - start is 4.234000, firstpts is 1.251000 but wasn't marked for discontinuity! Expected: 4.844511, actual: 4.844289, diff: 0.000222 17:32:56 T:2985753488 WARNING: CIntelSMDRenderer::FlushAndSync - 0380 - start is 4.234000, firstpts is 1.543000 but wasn't marked for discontinuity! Expected: 4.885416, actual: 4.885089, diff: 0.000327 17:32:56 T:2985753488 WARNING: CIntelSMDRenderer::FlushAndSync - 0380 - start is 4.234000, firstpts is 1.418000 but wasn't marked for discontinuity! Expected: 4.926959, actual: 4.926533, diff: 0.000426

The first chunk looks to me like some kind of size_t/unsigned int that was put into "negative values" then converted to double. Then a lot of "minor" differences that seem to be big enough to cause the frame not to be displayed. Or so it seems.

quarnster commented 10 years ago

The if statement resulting in the "wasn't marked for discontinuity" print is wrong too and also needs to be adjusted to test with (firstpts+add). That one is however harmless.

17:32:55 T:2985753488 WARNING: CIntelSMDRenderer::FlushAndSync - 0389 Diff too large, will force a Flush instead of adjusting clock: 0.124103, 204963823041217.218750, -204963823041217.093750

The ismd api has time as an unsigned value so negatives will look like it's waaaaay in the future. The base time and start packet must not ever be negative!

IIRC the renderer won't process frames with a timestamp less than the start packet so if it was set with a negative start time you won't see anything on the screen. IIRC when that happened I sometimes had to restart to box to be able to get it to render video again

KeyserSoze1 commented 10 years ago

The rabbit hole deepens...

15:43:17 T:2952813456 INFO: CIntelSMDRenderer::FlushAndSync - 0363: flushing start: 3.15, base: 76.11 15:43:17 T:2952813456 INFO: IntelSMDGlobals::SetVideoRenderBaseTime - 1354 base time 76108 ms 15:43:17 T:2952813456 DEBUG: IntelSMDGlobals::SendStartPacket - 0523 15:43:17 T:2952813456 DEBUG: IntelSMDGlobals::SetVideoDecoderState - 1186 3 15:43:17 T:2952813456 DEBUG: IntelSMDGlobals::SetVideoDecoderState - 1207 state 3 ret 0 15:43:17 T:2952813456 DEBUG: IntelSMDGlobals::SetVideoRenderState - 1214 3 15:43:17 T:2952813456 DEBUG: IntelSMDGlobals::SetVideoRenderState - 1257 state 3 ret 0 15:43:17 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0381 - current is less than base + start, current: 6521357, base: 0, start: 6849761 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0381 - current is less than base + start, current: 6736352, base: 0, start: 6849761 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 74.848356, 4.051857, 76.108456 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0381 - current is less than base + start, current: 6739847, base: 0, start: 6849761 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 74.887189, 4.090685, 76.108456 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0381 - current is less than base + start, current: 6743680, base: 0, start: 6849761 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 74.929778, 4.133276, 76.108456 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0381 - current is less than base + start, current: 6747386, base: 0, start: 6849761 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 74.970956, 4.174456, 76.108456 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0381 - current is less than base + start, current: 6751150, base: 0, start: 6849761 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 75.012778, 4.216271, 76.108456 15:43:20 T:2952813456 WARNING: CIntelSMDRenderer::FlushAndSync - 0381 - current is less than base + start, current: 6754875, base: 0, start: 6849761

I added some additional logging to indicate when the "negative" values were occurring as you can see. As you can see the base time never changes from 0 the whole time. Is that the expected result? The API is not reporting any errors returning the current time. The current non-static variable is able to retrieve the clock time fine further in the function.

KeyserSoze1 commented 10 years ago

Disregard my previous post, I shouldn't post when I am working on other things. I fixed some of the logic I introduced for checking negative values and now get a lot of these...

16:48:32 T:3002837904 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 61.972933, 0.887940, 62.469067 16:48:32 T:3002837904 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 62.012333, 0.887950, 62.469067 16:48:32 T:3002837904 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 62.152933, 0.887946, 62.469067 16:48:32 T:3002837904 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 62.181044, 0.887944, 62.469067 16:48:32 T:3002837904 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 62.236611, 0.887961, 62.469067 16:48:32 T:3002837904 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 62.263022, 0.887954, 62.469067 16:48:32 T:3002837904 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 62.304378, 0.887945, 62.469067 16:48:32 T:3002837904 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 62.346067, 0.887950, 62.469067 16:48:32 T:3002837904 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 62.387878, 0.887951, 62.469067 16:48:32 T:3002837904 WARNING: CIntelSMDRenderer::FlushAndSync - 0408 Wanted to adjust the clock but couldn't!: 62.430600, 0.887752, 62.469067

Which seems to indicate the base is ahead of current? Any idea how we get in that state?

quarnster commented 10 years ago

Yeah, there's a +90000/2 in there, which would be almost exactly (62.469067-61.972933)*90000=44652.

Don't recall the reason for it, I think it might be from earlier experiments in trying to get it to queue up buffers and it might not be needed now.

The warning used to show up back then as well so isn't something new you introduced. I never bothered to get rid of it as we are talking about a period of about half a second where the condition holds true and it didn't cause any other negative issues as far as I could tell.

KeyserSoze1 commented 10 years ago

Good news. Removing that extra 1/2 second seems to have helped considerably. I also commented out this condition as it also seemed to be causing me some issues. I have some audio sync issues to figure out. But at least the video and audio are both playing again and I'm feeding the raw audio data into the driver.

I want to thank you for your help. You took hours off of my work helping me understand the reason for a lot of what the IntelSMDRenderer class was doing. I still wonder if what I'm doing is ultimately the best option, but I guess that's the point of this exercise is to see what the result will be.

quarnster commented 10 years ago

Glad I could help. Note that quite a bit in this whole process was me trial'n'erroring so if something looks strange it's quite possible it doesn't make sense and is a left-over from some experiment that was never cleaned up. In other words, don't assume it's right or the best way to do things.

cadesalaberry commented 10 years ago

Impressive explanation, thanks a billion for your work guys !