vapoursynth / vsrepo

A simple package repository for VapourSynth
MIT License

Can a GitHub release without binaries cause problems? #225

Closed Stefan-Olt closed 4 months ago

Stefan-Olt commented 4 months ago

Hi, I committed a patch to nnedi3 to fix building on aarch64 / Apple Silicon. This does not change anything for Windows (until there is a Windows ARM build). Is it a problem for vsrepo if I create a new GitHub release without any new Windows binaries? The current release is v12; I would tag the new one as, for example, v12.1 or v12a.

myrsloik commented 4 months ago

Short answer: no. A release can always be manually blacklisted if it's a real problem.

Btw, why isn't the world using znedi3 now?

Stefan-Olt commented 4 months ago

I wasn't able to compile znedi3 on Apple Silicon back then, and the issues seemed bigger than the simple fix for nnedi3. Also, it does not contain NEON assembly, just SSE/AVX, so I have no idea if it's faster on ARM than traditional nnedi3 (which has NEON assembly). But I will give it another try and use sse2neon to see if it outperforms nnedi3, with the hope that the binary will at some point be distributed by vsrepo.

Stefan-Olt commented 4 months ago

> Btw, why isn't the world using znedi3 now?

I compared nnedi3 to znedi3 (2x upscaling HD to 4K with a real sample, repeated in random order to verify consistency):

macOS 14 on Apple M1 Max:
nnedi3:                 66 fps
znedi3:                 16 fps
znedi3 (with sse2neon): 68 fps

Ubuntu 22.04 on Ryzen 9 5900X:
nnedi3:                 123 fps
znedi3:                 196 fps

While on x86 there is a massive improvement, on ARM it's almost down in the noise. Even though sse2neon gives znedi3 an impressive speedup (more than 4x), it's just barely faster than nnedi3 with its native NEON instructions. I'm not sure how much more potential there is in writing native NEON, because on x86 znedi3 uses AVX instructions that nnedi3 does not. AFAIK the SVE/SVE2 instructions on ARM are not supported on consumer hardware yet (not on Apple Silicon, not on Snapdragon X, not on RPi).
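
For anyone who wants to check the ratios, here is a minimal sketch that turns the fps figures above into relative speedups. The names `results` and `speedup` are made up for illustration; only the numbers come from the measurements in this comment.

```python
# Frame rates (fps) copied from the benchmark figures in this comment.
results = {
    "Apple M1 Max": {
        "nnedi3": 66,
        "znedi3": 16,
        "znedi3 (with sse2neon)": 68,
    },
    "Ryzen 9 5900X": {
        "nnedi3": 123,
        "znedi3": 196,
    },
}


def speedup(platform, variant, baseline="nnedi3"):
    """Return how many times faster `variant` is than `baseline` on `platform`."""
    fps = results[platform]
    return fps[variant] / fps[baseline]


# Print every variant relative to nnedi3 on the same machine.
for platform, fps in results.items():
    for variant in fps:
        ratio = speedup(platform, variant)
        print(f"{platform:14s} {variant:24s} {ratio:.2f}x vs nnedi3")
```

With these numbers, znedi3 with sse2neon comes out at roughly 1.03x nnedi3 on the M1 Max, plain znedi3 at roughly 1.59x nnedi3 on the Ryzen, and `speedup("Apple M1 Max", "znedi3 (with sse2neon)", baseline="znedi3")` gives 4.25x for sse2neon over the unmodified znedi3 build.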