lisamelton / other_video_transcoding

Other tools to transcode videos.
MIT License

Apple M1 Pro capping at 60fps #123

Closed thudfactor closed 2 years ago

thudfactor commented 2 years ago

I'm driving a 14" 2021 M1 Pro and other-transcode seems to be converting 1080p content at about 2.6x, or 60fps. The 2017 Intel Mac I was using earlier would run at around 5x, or around 115fps. I see in another thread that other people are getting upwards of 200 fps. Am I missing something relatively basic?

Generally all I pass at the command line is:

other-transcode --debug --add-audio all --add-subtitle all /path/to/video/file

Happy to share more information but I don't know what's relevant.

khaosx commented 2 years ago

@thudfactor,

We've just started testing out the tools and their underlying dependencies on the M1 Pro and Max, as we've only got 1 test box so far. Your experience matches what we are seeing in our tests. At the moment, the prevailing thought is that FFmpeg or something else in the tool chain isn't updated to take advantage of the new architecture. I'll let @donmelton comment further, but I encourage you to stay tuned and if you're so inclined, help us test when we have more of an idea what the fix is.

thudfactor commented 2 years ago

Happy to help however I can, @khaosx, especially with testing. Somewhat gratified to know it's not an RTFM issue, because I've been poring over those.

mattjcraig commented 2 years ago

Confirming on a 14" M1 Max (32GB unified memory, 24 GPU cores): I was getting ~77fps, whereas with the same config my 13" M1 MBP was getting ~240.

Simply using "other-transcode --mp4 file"

khaosx commented 2 years ago

@thudfactor I feel your pain. The internal testing thread last Tuesday was both insanely long and intensely disappointing for all involved :)

@mattjcraig Thanks for confirming!

Just for tracking, I was able to get as much as 65fps on an x.264 transcode, but that's as good as it got. HEVC was a dumpster fire at ~6fps.

lisamelton commented 2 years ago

@thudfactor @mattjcraig Thanks for confirming what @khaosx was seeing last week! And he's not kidding, that was one loooong internal testing thread.

I'm labeling this as a "bug" even though it can't be an actual problem in other-transcode. It's probably not even a bug in FFmpeg. I've read the VideoToolbox code in the FFmpeg source and, as near as I can tell, they're not doing anything stupid like rate limiting the encoding. What's more likely is that this is an Apple VideoToolbox problem. It's either a bug or some new API which FFmpeg is not calling.

And if that's the case, it's likely we'll be waiting a significant amount of time for a fix.

thudfactor commented 2 years ago

Well, if there's one thing I've learned about transcoding video it's how to wait a significant amount of time. I'll keep my eye on this thread and be sure to ping me if you need any assistance with tests or debug logs.

mattjcraig commented 2 years ago

Hey @donmelton - is there any useful bug I can file with OpenRadar/Radar?

lisamelton commented 2 years ago

@mattjcraig I suppose you could file a bug but Apple hasn't been stellar on paying attention to those. There may already be a bug in the system anyway since this issue is now widely known. But I have no idea how to confirm that.

NetRanger1967 commented 2 years ago

I have the same problem on my new MacBook Pro (14-inch, 2021) with M1 Max 10C CPU/32C GPU/64GB/4TB. Using Don Melton's "other-transcode" script with Apple's VideoToolbox and the latest version of ffmpeg, 4K transcodes at 15-20 fps and 1080p at 50-60 fps. H265 is much slower. I can confirm that I got better performance with a MacBook Air (M1, 2020): 4K at 30-35 fps and 1080p at 190-200 fps. I hope that Apple will provide a fix soon because it is disappointing!

TraiGuzie commented 2 years ago

I also find this limitation very frustrating as I know my M1 Max MacBook is capable of way more than 2.6x at 1080p.

I am a videographer and use this program to back up a large number of files at once using a batch script, and every file capped at right around 2.6x. As my use case is different than most, I was able to test in a different way.

I tested converting 27 1080p files and that took 4 minutes using about 17% CPU power. The sweet spot I found was running the same script in 4 windows at once, bringing those same 27 files down to 39 seconds. As expected, the machines are capable of much more.

Not very useful to the target use case but the reason I am posting in here is because the super odd thing is that some of the other windows were hitting speeds of 7x, 5x, 3x. Makes absolutely no sense but I thought it curious enough to add to the discussion.
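
The "several windows at once" approach described above can also be scripted from a single shell. What follows is only a sketch under assumptions: the helper name run_parallel and the *.MOV pattern are made up for illustration, it relies on the -0 and -P options that both the BSD xargs shipped with macOS and GNU xargs provide, and the best number of concurrent jobs will likely differ from machine to machine.

```shell
# run_parallel N CMD [ARGS...]
# Hypothetical helper (not part of other-transcode): reads NUL-separated
# file names on stdin and runs CMD once per file, at most N at a time.
run_parallel() {
  max_jobs="$1"
  shift
  xargs -0 -n 1 -P "$max_jobs" "$@"
}

# Example (assumes the camera files sit in the current directory as *.MOV):
#   find . -maxdepth 1 -name '*.MOV' -print0 | run_parallel 4 other-transcode
```

Passing one file per invocation mirrors how the tool is called elsewhere in this thread; whether 3 or 4 jobs is the sweet spot would have to be measured.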

samhutchins commented 2 years ago

HandBrake fixed it recently: https://github.com/HandBrake/HandBrake/commit/ce52b4d755a2799a4801f256622b8e8191e71220

Current snapshots go a lot faster on M1 Pro/Max, from what I gather. Hopefully ffmpeg will catch up

thudfactor commented 2 years ago

Downloaded the most recent snapshot of HandBrake, and yes -- I'm getting ~190fps instead of 60fps.

mattjcraig commented 2 years ago

Looks like the same fix is already in ffmpeg too!

https://git.ffmpeg.org/gitweb/ffmpeg.git/commitdiff/4778ab2b1fa993457bb3657de56a12dc9a55f3a0?hp=802c0515067fa3f5a67feb56241789dfcfb1ad09

mattjcraig commented 2 years ago

Might be building me a ToT ffmpeg tonight... or at least over the weekend.

TraiGuzie commented 2 years ago

Interesting. All my testing was done with what I thought to be the most current version of ffmpeg. Am I missing something as a terminal noob?

ffmpeg version 4.4.1 Copyright (c) 2000-2021 the FFmpeg developers
built with Apple clang version 13.0.0 (clang-1300.0.29.3)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/4.4.1_5 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-avresample --enable-videotoolbox
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100

lisamelton commented 2 years ago

@TraiGuzie You're using the current released version of ffmpeg. Normally that's a good thing. However, it doesn't have the fix that @mattjcraig mentioned. It's likely a nightly build might have that fix soon but we have no idea when that will make it into an official release. Sometimes that can take months.

TraiGuzie commented 2 years ago

Ah, got it. I installed ffmpeg-105153-g8abc192236.7z and saw it did get above 2.6x, but only to 7.8x, which should be well below what this machine is capable of. No combination of concurrent instances could get under 1 minute with those 27 files, so I reverted to 4.4.1 and was able to hit 35 seconds. Very odd 🤷🏼‍♂️

lisamelton commented 2 years ago

@TraiGuzie That is odd. 🤔

mattjcraig commented 2 years ago

@TraiGuzie The snapshot you grabbed is an Intel-only binary:

[Matts-M1MaxBookPro:~/Downloads] matt% lipo -info ffmpeg
Non-fat file: ffmpeg is architecture: x86_64

Based on my preliminary look - there are no ARM snapshots available. I took a quick try tonight building from source but after a long day at work gave up - lots of dependencies and I'm just not familiar with building ffmpeg. So it remains a weekend goal - but if anyone knows how to build ffmpeg optimally for ARM from source - please do :)
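
The architecture check above can be wrapped in a small helper so it is easy to repeat after each install. This is a sketch rather than anything posted in the thread: the function name is_arm64_build is hypothetical, and it only looks for the word arm64 in whatever `lipo -info` printed, so a file path that happens to contain that word would fool it.

```shell
# is_arm64_build: hypothetical helper. Reads `lipo -info` output on stdin and
# succeeds (exit status 0) only if arm64 appears as a whole word in it.
is_arm64_build() {
  grep -qw 'arm64'
}

# Example on macOS (assumes ffmpeg is on the PATH):
#   lipo -info "$(command -v ffmpeg)" | is_arm64_build && echo "native arm64 build"
```

Fed the x86_64 line shown above, the helper returns a non-zero status, which is the signal that the binary is not a native ARM build.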

skj-dev commented 2 years ago

@mattjcraig

I build a software encoder container image for other-transcode that will build on ARM64. I've only used it on AWS Graviton ARM64 systems, and though the Apple Silicon is a peculiar flavor of ARM, I would expect the build steps to also work there.

As you've noticed, there are a lot of dependencies and options to muck around with for building ffmpeg. Going through the Dockerfile might be useful from the standpoint of seeing which pieces I'm building, and the options I'm using, to get a functional ffmpeg with software encoders.

lisamelton commented 2 years ago

@ttyS0 Thanks! Once again your Docker builds come to the rescue.

mattjcraig commented 2 years ago

I find it interesting that with my previous M1 13" MBP I got ~8x speedup, so basically linear with cores. GPUs of course scale that way, but it was striking to see such a close match at 1 core == ~30fps. Which is why I was very excited to see something like 24x with my M1 Max. Hopefully I can get ffmpeg built this weekend, though clearly it's gonna be a big hack job. I worry that this isn't really the right fix, as the problem seemingly didn't affect normal plain M1s. Though the difference could certainly be in VideoToolbox having different paths for M1 Max/Pro.

Mostly I'm hoping before end of work day someone else has managed to build and measure :) Building complex projects is not my domain but if nothing else surfaces I'll give it a go this weekend - thanks @ttyS0 for the pointer.

wintervaler commented 2 years ago

While building ffmpeg from source is no doubt an experience everyone should enjoy (endure?) at some point, I think you may not actually have to do this here — Homebrew, thankfully, should do the work for you.

If you use brew install ffmpeg --HEAD instead of the normal brew install command, brew will build directly from the ffmpeg Git master rather than the release branch. Assuming the fix that improves performance on these machines has been checked into Git, you'll end up with a version that should do what you want it to do.

And brew will build an ARM-compatible binary no problem — that's what I run natively on my M1 Air.

Good luck!

lisamelton commented 2 years ago

@wintervaler Great idea and thanks!

cliss commented 2 years ago

Just used @wintervaler's tip to install ffmpeg from HEAD; it does seem to me like it includes the fix. I think.

Anyway, I used yt-dlp to download this, which is something I've been meaning to watch. The resulting file was a webm file, which I ran through other-transcode.

H265 gave me about 3.2×
H264 gave me about 7.2×

This feels like a victory, but I didn't think to do tests on the release build of ffmpeg first. 🤦‍♂️

Edit: I should add this is on a M1 Max on a 14" MBP.

mattjcraig commented 2 years ago

My world has forever changed thanks to @wintervaler. I did not know you could do that and that is AMAZING! Thank you - can't wait to get home and see what's what...

TraiGuzie commented 2 years ago

Very useful info from @wintervaler. I can confirm an ARM build of HEAD:

MaxBook-Pro:test MacbookPrime$ lipo /opt/homebrew/Cellar/ffmpeg/HEAD-24b9302_5/bin/ffmpeg -info
Non-fat file: /opt/homebrew/Cellar/ffmpeg/HEAD-24b9302_5/bin/ffmpeg is architecture: arm64

I still think Casey's 7.2x seems low for this hardware. Now this is where it gets interesting, testing on my maxed out 16" M1 Max:

Max single window other-transcode with 5D Mark IV 1080p MOV files:
4.4.1 (arm64): 2.6x
105153-g8abc192236.7z (intel): 7.8x
HEAD-24b9302_5 (arm64): 9.5x

But when processing a batch of 27 1080p files totaling 2 GB using the optimal number of concurrent other-transcode windows:
4.4.1 (arm64): 4 instances, 35 seconds
105153-g8abc192236.7z (intel): 3 instances, 1 minute
HEAD-24b9302_5 (arm64): 3 instances, 50 seconds

While the HEAD version clearly has the fix and the faster single-instance speeds, oddly enough for my purposes the non-fixed 4.4.1 version wins out 🤷🏼‍♂️ The 4.4.1 version is the only one I can get to 99% CPU; the others stick around 65%.

mattjcraig commented 2 years ago

I'm currently extracting a full blu so I can do some benchmarks (Ragnarok) and have yet to try BUT - I don't think this is the full fix. Seems like, based on what I'm seeing, this only gets us back to M1 levels (8 cores). Which would make me believe either there are new APIs to allow > 8 cores or there's some faulty logic somewhere so only 8 are allowed.

Like I said - I have no data yet but from what others have said I suspect I'll only see perf ~ to my old M1 13" MBP. But stay tuned - film at 11 :)

mattjcraig commented 2 years ago

OK, so I ripped a blu-ray of Ragnarok and tried default (brew) ffmpeg using other_transcoding (latest version) and I got 77fps. Moving to head (as described above and tested by Casey) I got ~238fps - which is what I got on my 13" M1 MBP:

Verifying "ffprobe" availability...
Verifying "ffmpeg" availability...
Verifying "mkvpropedit" availability...
Finding encoders...
Trying "h264_videotoolbox" video encoder...
Scanning media...
Detecting crop...
...
crop = 1920:804:0:138
duration = 02:10:30.72
Stream mapping:
0 = h264_videotoolbox / 6000 Kbps
1 = ac3
Command line:
ffmpeg -loglevel error -stats -i MARVEL\'S\ THOR-\ RAGNAROK\ -\ BLU-RAY_t00.mkv -map 0:0 -filter:v crop\=1920:804:0:138 -c:v h264_videotoolbox -b:v 6000k -color_primaries:v bt709 -color_trc:v bt709 -colorspace:v bt709 -metadata:s:v title\= -disposition:v default -map 0:1 -c:a:0 ac3 -metadata:s:a:0 title\= -disposition:a:0 default -sn -metadata:g title\= -movflags disable_chpl MARVEL\'S\ THOR-\ RAGNAROK\ -\ BLU-RAY_t00.mp4
Transcoding...
frame=39058 fps=238 q=-0.0 size= 1283072kB time=00:27:08.73 bitrate=6453.4kbits/s speed=9.94x

So this supports my theory that with this fix we just get to M1 performance. It seems only 8 GPU cores are available, so whether or not that's due to a missing API call in VideoToolbox or some hard-coded limit elsewhere remains to be seen. BUT - progress! Thanks to everyone who's been helping out on this.
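
The fps and speed figures in the log are two views of the same measurement: speed is the encoding rate divided by the source frame rate. A minimal sketch of that arithmetic follows. The helper name speed_from_fps is made up, and the 23.976 fps source rate is an assumption (common for film Blu-rays) rather than something stated in the thread.

```shell
# speed_from_fps ENCODE_FPS SOURCE_FPS
# Hypothetical helper: prints the realtime multiplier implied by an encoding
# rate and a source frame rate, rounded to two decimals.
speed_from_fps() {
  awk -v enc="$1" -v src="$2" 'BEGIN { printf "%.2f\n", enc / src }'
}

# Source rate of 23.976 fps is assumed, not taken from the log.
speed_from_fps 238 23.976   # rate seen with the HEAD build
speed_from_fps 77 23.976    # rate seen with the release build
```

Under that assumption the first figure should land close to the speed=9.94x that ffmpeg reported, which also helps when comparing posts that quote fps against posts that quote a multiplier.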

NetRanger1967 commented 2 years ago

Many thanks to @wintervaler for the valuable tip. For info, I have a MacBook Pro (14-inch, 2021) with M1 Max 10C CPU/32C GPU/64GB/4TB. I managed to get x12.8 for transcoding H264 1080p to H265 1080p with Apple's HEVC_VideoToolBox.

In the same way I'm getting x2.7 when transcoding H265 4K content using Apple's HEVC_VideoToolBox.

I believe performance can be improved with the future versions of ffmpeg, especially for 4K content.

martinpickett commented 2 years ago

I feel like I need to add this warning, especially for anyone who is unclear what "git head" or "git master" means.

Installing FFmpeg using brew install ffmpeg --HEAD means you get the very latest version of FFmpeg. This means you get all the latest features (improved M1 Pro/Max support) but also all the latest bugs. This version of FFmpeg is experimental. It does not go through the same level of testing as a standard release version. This means there are bugs, sometimes serious bugs.

Having used this version of FFmpeg on and off for a couple of years, I can tell you firsthand that sometimes whole features which work in the release version do not work in the experimental version. Sometimes there are significant performance degradations. Sometimes it works great. Your mileage will vary.

With that said, I am happy to see it currently working so well for so many of you. Looking at the FFmpeg repository the key update seems destined to be included in release version 5.0. If you can, I would recommend waiting for this release.

thudfactor commented 2 years ago

I have a baby M1 Pro compared to a lot of you and with ffmpeg --HEAD I am seeing encoding speeds approaching 8x, which is a huge improvement over the 2.5x I was getting in November. I might wait until it hits stable before trusting any of those encodes, just as @martinpickett says.

mattjcraig commented 2 years ago

Totally agree with @martinpickett that dev builds are to be taken with a grain of salt. This has mostly been a debugging exercise for me. And while it's certainly nice to have finally achieved my old M1 level of performance, I do believe there is a lot of performance to be gained. I suspect, however, that whereas the issue we've been discussing here was a bug, getting more gains will require some actual enhancements to take advantage of > 8 GPU cores.

samhutchins commented 2 years ago

I'm not sure the GPU cores affect encoding speed in any meaningful way. If the hardware encoder is anything like QSV or NVENC (and I'd guess that it is), it's just a separate IP block/circuit that's on the same die as the GPU, but it doesn't actually use much of the graphics processing. With NVENC, at least, any GPU in a generation encodes video at pretty much the same speed, assuming no other bottlenecks, so my GTX 1660 goes at the same pace as a 2080 Ti (and possibly the 30 series, I can't remember if the encoding block changed at all between those generations).

That's not to say there aren't optimisations to be made, of course, just that I wouldn't expect there to be any differences in encoding speed between the lowest end M1 Pro and the highest end M1 Max, and I wouldn't be that surprised if it applied to the original M1 in the Air as well.

lisamelton commented 2 years ago

@samhutchins Yeah, I suspect you are correct, sir.

mattjcraig commented 2 years ago

Thanks @samhutchins - good to know. I was getting all excited with my naive/ignorant speculation about GPU cores :) Also, embarrassed to be complaining about ~240 fps encoding :)

martinpickett commented 2 years ago

In case you have not seen, FFmpeg 5.0 was released last week and it includes the improvements/bug fixes necessary to speed up M1 Max/Pro hardware encoding. Static binaries are available from the FFmpeg website, but I am not sure if the Homebrew recipe has been updated yet.

cliss commented 2 years ago

Thanks @martinpickett. Looks to me like Homebrew hasn't yet updated:

https://github.com/Homebrew/homebrew-core/blob/master/Formula/ffmpeg.rb#L4

Edit: For historical reference, this is the version that was active at the time the above link was cited.

cliss commented 2 years ago

Update: it seems that Homebrew did indeed update yesterday:

https://github.com/Homebrew/homebrew-core/blob/master/Formula/ffmpeg.rb#L4

(For historical reference, here is the specific version of the above.)

I did a test by transcoding the same file twice with other-transcode: once with the HEAD version I pulled over a week ago, and once with the currently released version of ffmpeg that Homebrew is pointing to. Both encoded at about 10×, and their elapsed times were within 2 seconds of each other. 🎉

@donmelton, I suspect at this point we could close this issue?

lisamelton commented 2 years ago

@cliss Thanks for testing! And, yeah, I think we can close this now.