manio / vdr-plugin-dvbapi

VDR dvbapi plugin for use with OSCam
http://www.streamboard.tv/wbb2/thread.php?threadid=40060
GNU General Public License v2.0

Improve descrambler performance #105

Closed glenvt18 closed 8 years ago

glenvt18 commented 8 years ago

Hi @manio.

I've come across the issue described here (about the TBS 5922). In short, some tuners ("good" ones) return from poll() when there are 8-10 packets available to read, while others ("bad" ones) return when there are only 1-3 packets. If the descrambler processes all the data available in the ring buffer ("good" tuner, fast CPU), the ring buffer sleeps for 100 ms and a new (big) chunk of data is available on the next call. But if a small number of packets arrives during descrambling ("bad" tuner, weak CPU), the ring buffer never (or rarely) sleeps and keeps feeding the descrambler small chunks of data. This keeps the batch buffer of a parallel descrambler heavily underfilled (10-20%), causing a huge increase in CPU load (4x-20x). "Bad" tuners are not uncommon: 2 out of the 3 tuners I've tested turned out to be "bad", and ARM and even Atom platforms are affected by this issue. I wouldn't say that those tuners are bad or that their drivers are broken. That's just the way they implement things.
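To make the fill ratio concrete, here is a rough model, not code from the plugin. It assumes each chunk handed to the descrambler is processed in its own batch pass (or passes) and that a pass costs about the same whether the batch is full or nearly empty. The function name `FillRatio` and the batch size of 64 packets in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of batch slots that carry real packets, assuming every chunk
// is descrambled on its own and rounded up to whole batches.
// chunkPackets: number of TS packets in each chunk read from the buffer.
// batchSize:    packets per batch of the parallel descrambler (assumed > 0).
double FillRatio(const std::vector<std::size_t>& chunkPackets,
                 std::size_t batchSize) {
    assert(batchSize > 0);
    std::size_t used = 0;   // packets actually descrambled
    std::size_t slots = 0;  // batch slots paid for
    for (std::size_t n : chunkPackets) {
        if (n == 0) continue;  // an empty read costs no batch pass
        std::size_t batches = (n + batchSize - 1) / batchSize;  // round up
        used += n;
        slots += batches * batchSize;
    }
    return slots ? static_cast<double>(used) / static_cast<double>(slots) : 0.0;
}
```

With a hypothetical batch of 64 packets, chunks of 8 packets give `FillRatio({8, 8, 8}, 64) == 0.125`, which is in the 10-20% range mentioned above, while one chunk of 64 packets gives 1.0.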

The solution you proposed in #43 does not always work well. The delay value depends on a number of factors, such as the tuner, the hardware, the descrambler implementation and the batch size, and it can only be found experimentally for a particular combination of them. Even then, the fill ratio is not close to 100%. The delay does help to reduce the CPU load of the TS buffer thread, but it only works for a dvb device (not satip or iptv).

To address this issue I've come up with a simple algorithm which uses a low-water mark to keep the fill ratio high. Processing (descrambling or filtering) is only allowed when there are at least low-water mark bytes in the input buffer. Otherwise, the thread sleeps for Timeout ms waiting for a bigger chunk. The water mark is then updated with the number of bytes received during the sleep. The water mark value is capped with regard to the size of the device's ring buffer. In other words, the algorithm tries to keep as much data buffered as possible, but not more than the limit, and doesn't introduce a zapping lag of more than Timeout ms. Measuring the water mark in terms of time, not bytes, helps to handle several "major" streams (pids) whose bit rates differ a lot. The algorithm assumes that the device uses cRingBufferLinear (which is the case for dvb, satip and iptv devices). It should work with a simple (non-ring) buffer too, but without any performance improvement. The average fill ratio (= efficiency) I measured is 99.6% for streams with bit rates >= 2 Mbit/s with a "bad" tuner. With a "good" tuner this algorithm increases performance by 5-20%.
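The gating logic described above can be sketched roughly as follows. This is an illustration only, not the code from the patch: the struct `LowWaterGate`, its member names and the choice to start with the mark at the limit are all assumptions made for the example. The caller is expected to sleep for Timeout ms whenever `MayProcess()` returns false and then report the buffer levels seen before and after the sleep.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical low-water mark gate for a descrambling/filtering thread.
struct LowWaterGate {
    std::size_t limit;  // cap, chosen with the ring buffer size in mind
    std::size_t mark;   // current low-water mark in bytes

    // Assumption: start at the cap so that the first pass sleeps once
    // and measures how much data arrives in Timeout ms.
    explicit LowWaterGate(std::size_t limitBytes)
        : limit(limitBytes), mark(limitBytes) {}

    // Processing is allowed only with at least `mark` bytes buffered.
    bool MayProcess(std::size_t availableBytes) const {
        return availableBytes > 0 && availableBytes >= mark;
    }

    // Call after sleeping for Timeout ms. The new mark is the number of
    // bytes received during the sleep, capped at `limit`.
    void OnSleepFinished(std::size_t bytesBefore, std::size_t bytesAfter) {
        assert(bytesAfter >= bytesBefore);
        mark = std::min(bytesAfter - bytesBefore, limit);
    }
};
```

Because the mark is set to what arrived during one sleep, it tracks "Timeout ms worth of data" for the current stream rather than a fixed byte count. After a sleep the buffer holds at least the bytes that just arrived, so unless nothing arrived or the buffer already exceeds the cap, processing is allowed again straight away and the added latency stays within about Timeout ms.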

This hack simulates "bad" tuner behaviour with a "good" tuner. Tuned for 2-4 Mbit/s.

This patch shows what happens inside and measures the fill ratio (grep for 'Decrypt block').

Please review.

manio commented 8 years ago

Thank you very much for this! :) Merging...