Closed lcksk closed 3 years ago
Here is a perf flame diagram for 2x playback:
I don't understand what "too much" means. Compared to what? and by who's standards?
Also, why did you fail to follow the report template? the log file is required, and the "expected behavior" is needed too.
If you want to compare it to the older scaletempo, then just use --af=scaletempo
. If you want to compare it to without audio at all, just use --no-audio
.
I'm closing this for now.
Feel free to request to open it once there's actual info/data which suggests an issue.
scaletempo2 isn't even showing in that perf flamegraph. The decoder is though, because you are decoding twice as much data in the same amount of time. Maybe that's what you're seeing.
I'm sorry that I didn't collect logs because I posted them on my mobile phone for some reasons. Now I've done some further work. I hope it's useful,
./mpv --msg-level=all=trace --log-file=~/Desktop/scaletempo.log --hwdec=vaapi --vo=gpu --af=scaletempo --speed=2.0 /home/luckych/Videos/test_1080P.mov top -d 1 |grep mpv 2>&1 |tee ~/Desktop/scaletempo.log
./mpv --msg-level=all=trace --log-file=~/Desktop/scaletempo2.log --hwdec=vaapi --vo=gpu --af=scaletempo2 --speed=2.0 /home/luckych/Videos/test_1080P.mov top -d 1 |grep mpv 2>&1 |tee ~/Desktop/scaletempo2.log
cvlc --rate 2.0 --file-logging --logfile ~/Desktop/vlc_2x.log --log-verbose 3 /home/luckych/Videos/test_1080P.mov top -d 1 |grep vlc 2>&1 |tee ~/Desktop/vlc_cpu.log logs & CPU data of scaletempo, scaletempo2 and VLC are here: trace_cpu.zip
VLC is just for comparison. Maybe the decoder and VO used by VLC are different, so the CPU of VLC is very low compared with MPV. I think this may be related to the command line... Here I don't want to discuss configuration, but I just want to compare the performance differences between scaletempo and scaletempo2. Under the same conditions, the differences are obvious from the data results.
scaletempo2 isn't even showing in that perf flamegraph.
No, it's here, and if I understand it correctly, its flame graph frequency is almost the same as the soft decoding function call stack.…╯﹏╰
I see it now, decode_slice is about 3 times bigger than scaletempo2 though.
I can look into SIMDing it, but I'm only familiar with AArch64 assembly
Found an easy optimisation, didn't bother to profile it but feel free to if you have the tooling all set up.
@rolandjon can you attach a log of both configurations?
Wow, it’s amazing, with @haasn's patch,there is definitely performance improvement, about 2~3 times improvement in my case.
I don’t have an aarch64 device, maybe someone else can help verify @CounterPillow' patch and post feedback here.🙂
Currently I'm trying to get a gcc bug reporting account (they have automatic signups disabled, I needed to e-mail them) so that I can tell them they are not very good at generating vector code on aarch64. If they fix it we can just use haasn's patch and throw mine away, otherwise we can merge both together I guess and use mine on aarch64.
Glad to know this works well for you, and thank you for nerdsniping us into writing some SIMD.
@rolandjon can you attach a log of both configurations?
@haasn Sorry I might have missed this comment,you required log should be here https://github.com/mpv-player/mpv/files/6526698/trace_cpu.zip
BTW, I tested it (https://github.com/mpv-player/mpv/commit/0d75b7a192abc1c36e5842fc7c64fcdcd2655114) with an Android device today. If drop HAVE_VECTOR macro, your code will actually have an immediate effect on the Android platform I tested. (with https://github.com/mpv-android/mpv-android project ). So, I'm thinking whether there is something wrong with wscript detecting the vector instruction in waf, in fact, there should be more platforms that benefit. -)
Currently I'm trying to get a gcc bug reporting account (they have automatic signups disabled, I needed to e-mail them) so that I can tell them they are not very good at generating vector code on aarch64. If they fix it we can just use haasn's patch and throw mine away, otherwise we can merge both together I guess and use mine on aarch64.
Glad to know this works well for you, and thank you for nerdsniping us into writing some SIMD.
Loving community.You are our superheros. ^-^
Another advantage of running a few more platforms is that I almost found another function (multi_channel_moving_block_energies) with performance issures,
Flamegraph of an Android device:
Important Information
Provide following Information:
If you're not using git master or the latest release, update. Releases are listed here: https://github.com/mpv-player/mpv/releases
Reproduction steps
Recently, when I use 2x speed playback, the CPU is more than twice as high as when I use 1x speed playback. When 1x , cpu= 18.2 % ./mpv --no-config /home/test/Videos/1080-test.mp4 then if 2x,cpu = 48.2 % ./mpv --no-config --speed=2.0 /home/test/Videos/1080-test.mp4
Expected behavior
There shouldn't be such a high CPU usage. Compared with scaletempo, my impression is close to 10% less than scaletempo 2 (but I'm not sure, it's just an inaccurate recollection of the previous impression -( )
Actual behavior
Log file
The issue will be closed for ignoring the issue template.
Sample files
Sample files needed to reproduce this issue can be uploaded to https://0x0.st/ or similar sites. (Only needed if the issue cannot be reproduced without it.) Do not use garbage like "cloud storage", especially not Google Drive.