Major slow-down with complex scripts in MKV when not sorted by time

GoogleCodeExporter commented 8 years ago

__________
WinXP SP3 x86
AMD X2 4800+ @2.64Ghz
MPC-HC 3752
madVR 0.73
CoreAVC 3.0.1
Haali Media Splitter 1.11.288.0
xy_vsfilter_test_20111019
__________

Steps to reproduce:

Underclock CPU so performance differences are more visible.

Mimic playback setup with MPC-HC, madVR, and Haali Splitter.

Mux each of the attached scripts into an MKV. Ideally, mux the DiVB scripts into a
1920x1080 (16:9) video and CCS_OP2 into a 1440x1080 (4:3) video.

Play back each MKV and notice that the ones with the unsorted scripts lag
significantly.
__________

As far as I can tell, this issue only happens when scripts are muxed into an 
MKV. It does not happen if the script is loaded externally.

When this bug is triggered, it negates most of the speed improvements you made
in xy-VSFilter: with unsorted scripts it is almost as slow as VSFilter 2.39,
while the same script, sorted, plays significantly faster.

Possible fixes seem to be: A) fix a bug in your caching code; B) enhance your
parser to sort scripts automatically as it receives subtitle packets from the
splitter; C) modify xy-VSFilter to request subtitle packets from further into
the future from the splitter; or D) bypass the splitter and implement an
on-the-fly MKV subtitle parser/demuxer so you can load the entire script and
sort it.
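Option B could look roughly like the following: a container that keeps events ordered by start time no matter what order the splitter delivers packets in. This is a minimal Python sketch, not xy-VSFilter's code; `Event`, `SortedEventList`, `add_packet` and `active_at` are hypothetical names, and times are assumed to be integer milliseconds.

```python
import bisect
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(order=True, frozen=True)
class Event:
    # order=True compares fields in declaration order, so events sort
    # by start time first (ties broken by end time, then text).
    start_ms: int
    end_ms: int
    text: str


class SortedEventList:
    """Hypothetical store that keeps subtitle events sorted by start time,
    whatever order the demuxer hands them over in."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._starts: List[int] = []  # parallel list of start times for bisect

    def add_packet(self, event: Event) -> None:
        # Binary search for the insertion point, then insert into both
        # lists at the same index so they stay aligned.
        i = bisect.bisect_right(self._events, event)
        self._events.insert(i, event)
        self._starts.insert(i, event.start_ms)

    def active_at(self, t_ms: int) -> List[Event]:
        # Only events starting at or before t_ms can be visible, so the
        # scan stops at that boundary instead of walking the whole script.
        hi = bisect.bisect_right(self._starts, t_ms)
        return [e for e in self._events[:hi] if e.end_ms > t_ms]

    def __iter__(self) -> Iterator[Event]:
        return iter(self._events)


events = SortedEventList()
for ev in [Event(5000, 7000, "C"), Event(1000, 4000, "A"), Event(2000, 6000, "B")]:
    events.add_packet(ev)  # packets arrive out of order
print([e.text for e in events.active_at(3000)])  # ['A', 'B']
```

Each insert is O(n) because of the list shift, which should be acceptable for scripts of a few thousand lines; a balanced tree would be the next step if it were not.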

On my system, the difference is very significant. With the unsorted scripts I
get massive numbers of dropped frames and the video is completely unwatchable.
With the sorted script it's still a bit slow, but very playable with only a
couple of dropped frames.

On another note, you may want to consider raising the maximum size of the L1
cache, as the CCS OP2 script completely overwhelms it. Both of these scripts
are extremely slow to render, so they may be good test cases for tuning
xy-VSFilter's performance.
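Why an undersized cache hurts so much can be shown with a toy model: an LRU cache (built on Python's `collections.OrderedDict`) replaying the same access pattern every frame. This is not xy-VSFilter's Cache LV1; `LRUCache`, `simulate` and the sizes used are hypothetical. When the per-frame working set is even slightly larger than the capacity, LRU evicts each entry just before it is needed again, so the hit rate drops to zero.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Tuple


class LRUCache:
    """Toy least-recently-used cache with a fixed entry limit."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: Hashable, compute: Callable[[Hashable], object]) -> object:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = compute(key)  # stands in for an expensive rasterize/blur step
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value


def simulate(capacity: int, working_set: int, frames: int) -> Tuple[int, int]:
    """Each frame touches the same `working_set` keys in the same order."""
    cache = LRUCache(capacity)
    for _ in range(frames):
        for key in range(working_set):
            cache.get_or_compute(key, lambda k: k)
    return cache.hits, cache.misses


print(simulate(capacity=100, working_set=120, frames=10))  # (0, 1200)
print(simulate(capacity=128, working_set=120, frames=10))  # (1080, 120)
```

The cliff is the point: going from capacity 100 to 128 takes the hit count from 0 to 1080 for the same workload, which is why a modestly higher limit can matter a lot for heavy scripts.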

Original issue reported on code.google.com by cyber.sp...@gmail.com on 19 Oct 2011 at 9:31

Attachments:

DiVB_(unsorted).ass
DiVB_(sorted).ass
CCSOP2(unsorted).ass
[CCS_OP2 (sorted).ass](https://storage.googleapis.com/google-code-attachments/xy-vsfilter/issue-28/comment-0/CCS_OP2%20%28sorted%29.ass)

GoogleCodeExporter commented 8 years ago

I haven't found the cause of the difference yet. But after some performance
tuning, even a 10-bit 1080p video with the unsorted scripts above embedded plays
smoothly on my system. Here's a *preview* version. I'll try to make a new
release this weekend (not sure if I can).

Original comment by YuZhuoHu...@gmail.com on 28 Oct 2011 at 3:20

Attachments:

xy_vsfilter_test_20111028.7z

GoogleCodeExporter commented 8 years ago

Original comment by YuZhuoHu...@gmail.com on 28 Oct 2011 at 3:36

Changed state: Started

GoogleCodeExporter commented 8 years ago

Unfortunately, the xy_vsfilter_test_20111028 preview doesn't seem to help at
all with the unsorted-script problem.

The MKVs with the sorted scripts play smoothly in madVR, with minimal
slowdown.

The MKVs with the unsorted scripts have 100+ dropped frames in madVR, and are 
unplayable.

Another thing I forgot to mention: this problem is much more noticeable with
madVR. With VMR9 it still happens, but for whatever reason VMR9 sometimes
renders in slow motion at a lower frame rate instead of dropping frames.

If you can't reproduce it on your Intel system, it's possible this is a problem
specific to the AMD Athlon 64 architecture (I'll test on my secondary Intel i5
system later). In the past there was a similar AMD-only slowdown in VSFilter
2.39 involving multiple lines, blur, and be. Nobody ever figured out which code
caused it, but somehow enabling all compiler optimizations fixed it. This seems
completely unrelated to that past issue, but it's quite possible there is still
some lingering code in VSFilter that runs great on Intel and horribly on AMD.

Maybe I should upload the original MKV samples. What file hosting sites are 
good for you in China?

Original comment by cyber.sp...@gmail.com on 28 Oct 2011 at 7:03

GoogleCodeExporter commented 8 years ago

The samples attached to Issue 37 crash with xy_vsfilter_test_20111028.

For example,
http://xy-vsfilter.googlecode.com/issues/attachment?aid=370007001&name=MH-01-OP.mkv&token=288e09a1b5f35204a17c0fbc5c9f52f4
crashes after 22 seconds.

Original comment by cyber.sp...@gmail.com on 28 Oct 2011 at 7:17

GoogleCodeExporter commented 8 years ago

Hotmail's SkyDrive is the only one I know that probably both you and I can use.

Original comment by YuZhuoHu...@gmail.com on 29 Oct 2011 at 12:03

GoogleCodeExporter commented 8 years ago

Try the "8x8 fast" sub-pixel positioning option to see if it helps with
performance.

Original comment by YuZhuoHu...@gmail.com on 29 Oct 2011 at 12:30

GoogleCodeExporter commented 8 years ago

I emailed you a SkyDrive link.

Below are some benchmarks with AVSMeter using DirectShowSource. Because of the
overhead they are not representative of real playback fps, but they give a
clear picture of how severe the slowdown is.

| Script | Order | Sub-pixel | Frames | Min fps | Avg fps |
|---|---|---|---|---|---|
| CCS OP2 | Sorted | 8x8fast | 200 | 7.11 | 13.53 |
| CCS OP2 | Unsorted | 8x8fast | 200 | 0.69 (~10x slower) | 1.35 (~10x slower) |
| CCS OP2 | Sorted | 8x8 | 200 | 7.15 | 13.54 |
| CCS OP2 | Unsorted | 8x8 | 200 | 0.69 (~10x slower) | 1.45 (~10x slower) |
| DiVB | Sorted | 8x8fast | 1000 | 7.07 | 16.24 |
| DiVB | Unsorted | 8x8fast | 1000 | 1.55 (~4.5x slower) | 9.20 (~1.75x slower) |
| DiVB | Sorted | 8x8 | 1000 | 7.11 | 16.30 |
| DiVB | Unsorted | 8x8 | 1000 | 1.54 (~4.5x slower) | 9.24 (~1.75x slower) |

The "8x8 fast" option seems to make no difference in performance compared to 
"8x8". If anything, the normal "8x8" may be ever so slightly faster than "8x8 
fast".

Original comment by cyber.sp...@gmail.com on 29 Oct 2011 at 2:34

GoogleCodeExporter commented 8 years ago

Try xy_vsfilter_test_20111030.7z.
The crash in the preview version is fixed too.
As for this issue, the slowdown from the sorted to the unsorted versions is
caused by a bug in the script parser. The unsorted versions trigger the bug, and
the consequence is much like duplicating many lines in the sorted versions. Mux
the scripts with any video (not necessarily 1080p) and watch the OSD
information during playback: you can see a huge difference between the sorted
and unsorted versions in Cache LV1 query_count, which corresponds to the number
of alpha-blending operations performed. To show the OSD information, go to
properties->misc and check "Show OSD statistics".
The gap between my smooth playback and your unplayable result with the preview
version may be related to CPU architecture.
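A toy model of "just like duplicating many lines": if every visible event costs one alpha-blend per frame, each duplicated event adds its full blend cost again, which is the kind of growth a larger Cache LV1 query_count reflects. The event tuples, `blend_ops` and `dedupe` are hypothetical, the one-blend-per-event rule is a simplification (real renderers blend per bitmap, not per script line), and the sketch does not reproduce the actual parser bug.

```python
from typing import Iterable, List, Tuple

Event = Tuple[int, int, str]  # (start_ms, end_ms, text); hypothetical layout


def blend_ops(events: Iterable[Event], frame_times_ms: Iterable[int]) -> int:
    """Count one alpha-blend per visible event per frame (a simplification)."""
    events = list(events)  # iterated once per frame below
    return sum(
        1
        for t in frame_times_ms
        for start, end, _text in events
        if start <= t < end
    )


def dedupe(events: Iterable[Event]) -> List[Event]:
    """Drop exact duplicates while keeping first-occurrence order."""
    seen = set()
    out = []
    for ev in events:
        if ev not in seen:
            seen.add(ev)
            out.append(ev)
    return out


script = [(0, 1000, "A"), (500, 1500, "B")]
buggy = script * 3            # every line present three times
frames = range(0, 2000, 40)   # 25 fps over two seconds

print(blend_ops(script, frames))         # 50
print(blend_ops(buggy, frames))          # 150
print(blend_ops(dedupe(buggy), frames))  # 50
```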

Original comment by YuZhuoHu...@gmail.com on 30 Oct 2011 at 1:44

GoogleCodeExporter commented 8 years ago

There is still a measurable slowdown with the CCS OP2 unsorted sample using 
xy_vsfilter_test_20111030. The good news is your speed-up in that build appears 
to have completely compensated for the smaller DiVB unsorted slowdown 
(benchmark results were near-identical).

| Script | Order | Sub-pixel | Frames | Min fps | Avg fps |
|---|---|---|---|---|---|
| CCS OP2 | Sorted | 8x8_normal | 1940 | 44.21 | 52.33 |
| CCS OP2 | Sorted | 8x8_fast | 1940 | 42.56 | 51.28 |
| CCS OP2 | Unsorted | 8x8_normal | 1940 | 30.53 (~1.45x slower) | 37.39 (~1.4x slower) |
| CCS OP2 | Unsorted | 8x8_fast | 1940 | 29.45 (~1.5x slower) | 36.86 (~1.42x slower) |
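Recomputing the CCS OP2 slowdown factors from the figures above suggests that both unsorted rows were compared against the sorted 8x8_normal row rather than against the sorted row with the same sub-pixel mode (for example, 52.33 / 36.86 ≈ 1.42, whereas 51.28 / 36.86 ≈ 1.39). The snippet below simply redoes that arithmetic; `slowdown` is a hypothetical helper.

```python
from typing import Tuple

FpsPair = Tuple[float, float]  # (min fps, avg fps)


def slowdown(baseline: FpsPair, measured: FpsPair) -> Tuple[float, float]:
    """How many times slower `measured` is than `baseline`, rounded to 2 places."""
    return (
        round(baseline[0] / measured[0], 2),
        round(baseline[1] / measured[1], 2),
    )


# Figures copied from the CCS OP2 benchmark lines above (1940 frames each).
sorted_normal = (44.21, 52.33)
unsorted = {"8x8_normal": (30.53, 37.39), "8x8_fast": (29.45, 36.86)}

for mode, fps in unsorted.items():
    print(mode, slowdown(sorted_normal, fps))
# 8x8_normal (1.45, 1.4)
# 8x8_fast (1.5, 1.42)
```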

Both samples are very playable now, but the remaining slowdown is a bit of a 
mystery. At least with the CCS OP2 sample, the normal 8x8 subpixel positioning 
continues to be slightly faster than 8x8fast...

Original comment by cyber.sp...@gmail.com on 30 Oct 2011 at 1:49

GoogleCodeExporter commented 8 years ago

I forgot to mention that the 8x8_fast option uses an additional cache whose
statistics are not yet shown in the OSD. This cache sits between Cache LV1 and
the subsequent alpha-blending step, so with 8x8_fast enabled, Cache LV1's
query_count no longer equals the number of alpha-blending operations.
Also, the bilinear interpolation that 8x8_fast uses is not yet SSE2-optimized.
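Shifting an already-rendered alpha mask by a fraction of a pixel with bilinear interpolation means spreading each source pixel over the 2x2 block it overlaps after the shift, weighted by the overlap area. The plain-Python sketch below shows that arithmetic for offsets in 1/8-pixel steps. It assumes this is roughly what such a step does; `shift_subpixel` is a hypothetical name and this is not the 8x8_fast code. An SSE2 version would compute the same weighted sums for many pixels at once.

```python
from typing import List, Sequence


def shift_subpixel(mask: Sequence[Sequence[float]], dx8: int, dy8: int) -> List[List[float]]:
    """Shift a 2-D alpha mask right/down by (dx8/8, dy8/8) pixels using
    bilinear weights. The output is one pixel larger in each dimension to
    hold the spill-over, so total coverage is preserved."""
    if not (0 <= dx8 < 8 and 0 <= dy8 < 8):
        raise ValueError("offsets must be in 1/8-pixel units within [0, 8)")
    fx, fy = dx8 / 8.0, dy8 / 8.0
    h, w = len(mask), len(mask[0])
    out = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            v = mask[y][x]
            # Each source pixel lands on up to four destination pixels,
            # weighted by how much of it overlaps each one after the shift.
            out[y][x] += v * (1 - fx) * (1 - fy)
            out[y][x + 1] += v * fx * (1 - fy)
            out[y + 1][x] += v * (1 - fx) * fy
            out[y + 1][x + 1] += v * fx * fy
    return out


print(shift_subpixel([[64]], 4, 0))  # [[32.0, 32.0], [0.0, 0.0]]
```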

Original comment by YuZhuoHu...@gmail.com on 30 Oct 2011 at 2:20

GoogleCodeExporter commented 8 years ago

Original comment by cyber.sp...@gmail.com on 16 Dec 2011 at 7:47

Changed state: Fixed

sidaddi / xy-vsfilter

Major slow-down with complex scripts in MKV when not sorted by time #28