yuygfgg / Macos_vapoursynth_plugins

A repository holding commonly used vapoursynth plugins built for Apple Silicon Macos.
GNU General Public License v3.0
2 stars 0 forks source link

Including your optimizations in my build repo #1

Open Stefan-Olt opened 1 week ago

Stefan-Olt commented 1 week ago

Great to see that you have ported/optimized quite a lot of plugins to macOS! Especially porting VCL2 to Neon is great. Have you tried to open pull requests for the plugins, so that your changes will be part of the original code, so that you don't have to adopt it once a new version comes out? I have one open for znedi3: https://github.com/sekrit-twc/znedi3/ and the ones for nnedi3 and MVTools are already integrated, so they both compile optimization on macOS directly. In case the plugin is not actively maintained or the author does not want to include the ARM optimizations, I think the best is to have the patch included in my build file, so that I still use the same source vsrepo does.

I made some progress with my build repository: https://github.com/Stefan-Olt/vs-plugin-build Most of them are already installable with vsrepo. But as you see not only are a lot of plugins still missing, there are also quite a lot that do not compile correctly on ARM or explicitly do not support ARM. Some issues also result from the build system used not being really platform-independent, my experience meson/ninja works best. I have not made much effort to fix that yet, as I have started with those plugins that just work. It would be great if you could help me fix the Apple Silicon issues. My goal is to build the plugins as compatible as possible, meaning with all dependencies statically linked in, compatible with older macOS and not relying on anything else than VapourSynth itself

yuygfgg commented 1 week ago

I'm glad you're interested in my work. You can find the ported code on my profile, though some ports might not be fully compatible with x86. I'm not very motivated to make PRs because I don't have the time to test on other systems, and I don't own those devices. I usually install plugins as needed for my projects, simply to keep working with VapourSynth scripts. Many plugins (~80%) are no longer maintained, so PRs might not get responses. For active projects, including a large sse2neon.h file might confuse developers, since it's not something familiar to x86 developers and not widely used.

(I have ported like 20+ plugins, and if I open a PR for each of them, I won't have time for my encoding work)

I do have some PRs merged. For example, neo_f3kdb.

My suggestion is to fork those plugins but only PR the hand written neons and compiling failure patches. sse2neon SIMD port can be maintained independently (also include some avx only plugins, like bm3dcuda, which are ported by simde library) Since most plugins don't release frequently and latest version isn't always necessary, maintaining separately seems to be acceptable to me.

Regarding the static link, it's simply too much work for me. Perhaps we can do it in the apple suggested way, having dedicated dylibs for them. In fact, the local-lib folder in this repo is for the dependencies. (install_name_tool is useful)

A sad guess is that you and me are the only 2 persons doing serious encoding work on Macs (Nobody can stand encoding without simd support I think) (By serious I mean encoding >5 episodes a week and have less than 3fps), so from my perspective, getting things work (which means port&optimize as much plugins as possible) is far more important than having elegant codebases.

Sorry for somehow offtrack in the previous paragraph. Anyway, I'm very supportive to linux/macos vsrepo and excited to see and help its develop.

Stefan-Olt commented 1 week ago

Yes, I have seen the ported code. Of course for pull requests you need to ensure that support for other platforms don't break. I'm not sure if sse2neon.h really confuses developers, I find all the #ifdef more confusing, as they make the code harder to read (but of course they are needed). I also think that many plugins are not actively maintained anymore and you probably won't get an response to a pull request, but it's worth a try (for mvtools I became maintainer that way...). I'll see if I can take your work, check if it's still works on x86 and open pull requests. Static linking is not a big deal, you just have to compile dependencies like fftw or libavcodec as static instead of installing them using brew. I thought dynamic linking on Mac is a problem, because usually applications are statically linked. That's way projects like macdylibbundler exist.

I also don't do encoding on a Mac (I don't even own a Mac), but I have a friend that produces Videos on a Mac and uses VapourSynth for some automated deinterlacing/denoising/upscaling of older footage in much better quality than Final Cut. But there have been more 6000 installs of VapourSynth using brew in just the last month, so it's not that unpopular.

Are you sure this works in VCL2?

    // Assume NEON support on ARM
    output[0] = 1; // Indicating support for SSE
    output[1] = 1; // Indicating support for SSE2
    output[2] = 1; // Indicating support for SSE3
    output[3] = 1; // Indicating support for SSSE3, SSE4.1, SSE4.2

The original code copies the 4 registers to output, there are many flags with special meaning. Probably it is needed to figure out where/how the function is used and return the correct flags. I think a value of 1 (only lowest flag set) is indicating that no SIMD instruction set is available. See here for all flags: https://en.wikipedia.org/wiki/CPUID

Edit: I looked how VCL2 uses that function, have you tried setting it like this:

    output[0] = 1;
    output[1] = 1;
    output[2] = (1 <<  0)|(1 <<  9)|(1 << 19)|(1 << 23)|(1 << 20);
    output[3] = (1 <<  0)|(1 << 23)|(1 << 15)|(1 << 24)|(1 << 25)|(1 << 26);

This should indicate everything up to SSE4.2

yuygfgg commented 1 week ago

Yes, I have seen the ported code. Of course for pull requests you need to ensure that support for other platforms don't break. I'm not sure if sse2neon.h really confuses developers, I find all the #ifdef more confusing, as they make the code harder to read (but of course they are needed). I also think that many plugins are not actively maintained anymore and you probably won't get an response to a pull request, but it's worth a try (for mvtools I became maintainer that way...). I'll see if I can take your work, check if it's still works on x86 and open pull requests. Static linking is not a big deal, you just have to compile dependencies like fftw or libavcodec as static instead of installing them using brew. I thought dynamic linking on Mac is a problem, because usually applications are statically linked. That's way projects like macdylibbundler exist.

I also don't do encoding on a Mac (I don't even own a Mac), but I have a friend that produces Videos on a Mac and uses VapourSynth for some automated deinterlacing/denoising/upscaling of older footage in much better quality than Final Cut. But there have been more 6000 installs of VapourSynth using brew in just the last month, so it's not that unpopular.

Are you sure this works in VCL2?

    // Assume NEON support on ARM
    output[0] = 1; // Indicating support for SSE
    output[1] = 1; // Indicating support for SSE2
    output[2] = 1; // Indicating support for SSE3
    output[3] = 1; // Indicating support for SSSE3, SSE4.1, SSE4.2

The original code copies the 4 registers to output, there are many flags with special meaning. Probably it is needed to figure out where/how the function is used and return the correct flags. I think a value of 1 (only lowest flag set) is indicating that no SIMD instruction set is available. See here for all flags: https://en.wikipedia.org/wiki/CPUID

Edit: I looked how VCL2 uses that function, have you tried setting it like this:

    output[0] = 1;
    output[1] = 1;
    output[2] = (1 <<  0)|(1 <<  9)|(1 << 19)|(1 << 23)|(1 << 20);
    output[3] = (1 <<  0)|(1 << 23)|(1 << 15)|(1 << 24)|(1 << 25)|(1 << 26);

This should indicate everything up to SSE4.2

Thank you very much for helping with cpuid. I will take a deeper look.

It's not a serious problem though. If you check instrset_detect.cpp, you will find that cpuid is not important.

截屏2024-10-12 06 40 59
Stefan-Olt commented 1 week ago

Yes, I see you also modified the instrset_detect function, so it should indeed not be relevant. I have copied your TCanny changes so that also works with x86 (and also included a gcc build fix), on a M1 Max this gives more than double the speed. I opened a pull request, we'll see if there is any reaction.

yuygfgg commented 1 week ago

Yes, I see you also modified the instrset_detect function, so it should indeed not be relevant. I have copied your TCanny changes so that also works with x86 (and also included a gcc build fix), on a M1 Max this gives more than double the speed. I opened a pull request, we'll see if there is any reaction.

If holy replies, we can ask him directly if he will accept 20+ similar ports in addition. He should have writing access to a lot of plugin's repos.

Stefan-Olt commented 1 week ago

That would be great. Too bad Agner Fog has rejected to add sse2neon to VCL2 directly, because he has no machines to test it

yuygfgg commented 1 week ago

That would be great. Too bad Agner Fog has rejected to add sse2neon to VCL2 directly, because he has no machines to test it

Also I see VCL2 version updated in your tcanny pr. I think modify base on the original version of the projects will be better because in this way we only port and fix build, not introducing extra change.

Stefan-Olt commented 1 week ago

No, I only changed instrset.h and instrset_detect.cpp. Probably a different version number, because I copied them from another place.

yuygfgg commented 1 day ago

In fact, I don't think adding patches into json file is the best practice. It damages the readability (especially considering the base64 and the 10k lines sse2neon.h)

Maybe we can use similar measure like brew (--HEAD or external taps) to allow users choose optimized versions. :D

PS. Have you tried to replace mmintrins with sse2neon and manually define x86 sse MARCOs and ENV varibles? Can it be a universal solution?

yuygfgg commented 12 hours ago

I just found something great!

https://github.com/kojirou1994/Builder

It seems like a vapoursynth plugin build system using Swift, and probably has good macos support.

Stefan-Olt commented 12 hours ago

In fact, I don't think adding patches into json file is the best practice. It damages the readability (especially considering the base64 and the 10k lines sse2neon.h)

Maybe we can use similar measure like brew (--HEAD or external taps) to allow users choose optimized versions. :D

Hmm, my intention is to provide builds for vsrepo, so allowing the user to select something is difficult. I agree that a patch inside the json is not a nice solution, but I cannot think of something better. I don't want to change the repository itself, because this way I can update the plugin automatically.

PS. Have you tried to replace mmintrins with sse2neon and manually define x86 sse MARCOs and ENV varibles? Can it be a universal solution? No, because sse2neon does not implement cpuid and other non-sse intrinsics, so that won't work. Additionally it cannot convert any AVX instructions (NEON and SSE are both 128 bit, but AVX is 256 bit, so it's also not coming in later version of sse2neon)

I just found something great!

https://github.com/kojirou1994/Builder

It seems like a vapoursynth plugin build system using Swift, and probably has good macos support.

Seems indeed interesting, but I don't know swift at all. Also some plugins are outdated like mvtools. I think that is a common issue when you fork: You don't get new versions. My system is designed for automatic updates.