Closed gsr933 closed 1 year ago
I would suggest that with hardware-specific optimizations existing github worflows should be extended with regression checks for ARM-compiled code. I am using qemu
for that, but whatever works for you.
I would suggest that with hardware-specific optimizations existing github worflows should be extended with regression checks for ARM-compiled code. I am using
qemu
for that, but whatever works for you.
Added in 52fe7fa
Thank you for the PR! I clicked the button that says "Approve and run" the workflow.
I seem to be a zfec maintainer, and I can't speak for other maintainers, but: I do not know how to read ARM assembly, and I am not going to have the time to learn it any time soon. I don't even know what ARM Neon is, and how it is different from regular ARM! But it might be useful if you could explain a few things:
The primary consumer of zfec is Tahoe-LAFS project, which maintains zfec. As you can see there's just enough activity in zfec to keep it going to suit Tahoe-LAFS project's needs, because we do have bandwidth constraints. How could this PR help Tahoe-LAFS project? I do not think we have ever heard from anyone using Tahoe-LAFS on ARM Neon.
Why are you adding this to zfec? Who will it help? Zfec hasn't had any inline assembly up to now, and I am apprehensive about adding it now.
What about ongoing maintenance? Who will help us fix issues with ARM Neon bits if/when they are uncovered in the future?
We have been talking about moving Tahoe-LAFS org to GitLab. So another concern I have is that we will have to do extra work if/when we set up CI there.
Is it important to you that this is merged?
@sajith the code sits behind a feature-flag so it wont have any impact on existing users of the module, unless they want hardware acceleration. It's very common for FEC libraries to use hardware acceleration as they are well suited to benefit from dedicated CPU instructions like SIMD and ADDMUL, much like AES hardware acceleration for cryptography.
@ribbles Since users of Tahoe-LAFS haven't asked that zfec be sped up, speeding up zfec isn't a priority right now. If we absolutely must speed zfec up, we should probably consider something like OpenMP first, and let compilers handle optimizations. But even that ever has not been a priority.
The existing pieces in C itself is hard to understand, and adding assembly would complicate it further. I personally wouldn't be the one to click "the merge PR" button, because I don't know how to review this code and it would be irresponsible of me to merge code that I do not understand. It would be especially irresponsible because I don't spend much time on Tahoe-LAFS these days. There is a maintenance cost to adding any new code, especially assembly, and I don't want to burden other maintainers. Unless they want to burden themselves, of course. Which they seem reluctant to do.
I brought this PR up in Tahoe-LAFS IRC channel. The consensus is that this PR is going to be hard to merge, without spending considerable time and effort in reviewing the code.
Zfec's git history stars in 2007, and it used other version control systems before. No one has added hand-written AMD/Intel SIMD optimizations all those years. If anyone is going to do that, I hope they have a persuasive case, and that they try to persuade before they spend any effort with a PR.
Another power of free and open source repositories is that you can maintain a fork yourself. :-)
When zfec
was added to Pypi it became something much bigger than just the Tahoe-LAFS project. zfec
pip downloads last month were 2,863 vs. 1,481 for Tahoe-LAFS
- however maybe that's not representative.
Even if zfec turns out be far more popular and important than Tahoe-LAFS, and even if zfec maintainers (hypothetically) could be convinced that adding assembly to zfec is a good idea, it remains that somebody has to review and merge this PR. That somebody is not me. I am not qualified to review this PR.
The
simd
branch adds, for now, ARM NEON assembly for_addmul1
, enabled by--with-arm-neon
insetup.py
.Benchmark results on a 1 GHz Cortex-A8 (AM335x, Beaglebone Black) Without Neon:
With Neon: