nzbgetcom / nzbget

Efficient usenet downloader
https://nzbget.com
GNU General Public License v2.0

unrar NEON - linux arm64 #109

Closed woiza closed 10 months ago

woiza commented 1 year ago

Hi,

Currently, the prebuilt unrar packages only support Intel AES-NI and, on Android, arm64 NEON. As a result, unpacking encrypted rar files on Linux ARM devices is very slow because no hardware acceleration is used. However, recent unrar sources support NEON if built with some minor changes and the following project:

https://github.com/DLTcollab/sse2neon

I tested this on a Radxa Rock 5B SBC with a SanDisk SSD. Unpacking a 2.4GB encrypted rar archive was four times faster. Here are the necessary changes to the unrar source code:

https://github.com/pmachapman/unrar/compare/master...woiza:unrar:neon

Do you think you could include unrar with NEON enabled in nzbget-ng?

paul-chambers commented 1 year ago

Actually, I'm thinking of switching nzbget over to using libarchive (https://www.libarchive.org/). Nzbget currently invokes external executables for both unrar and 7zip, and that brings some security problems that are difficult to address completely.

I don't know if libarchive uses the NEON coprocessor instructions on ARM architectures. Could you check for me?

woiza commented 1 year ago

According to their documentation, libarchive can be built against OpenSSL. Do you know of any open source projects / apps which can unpack rar and zip files using libarchive?

paul-chambers commented 1 year ago

libarchive comes with executables, bsdtar and bsdcpio. Their front page has this to say about them:

  • Reads a variety of formats, including tar, pax, cpio, zip, xar, lha, ar, cab, mtree, rar, and ISO images.
  • Writes tar, pax, cpio, zip, xar, ar, ISO, mtree, and shar archives.
  • Automatically handles archives compressed with gzip, bzip2, lzip, xz, lzma, or compress.
  • Unique format conversion feature.
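If anyone wants to try out the format support quickly, here is a minimal sketch that drives bsdtar from Python. It assumes bsdtar is installed and on the PATH; the helper names are made up for illustration, and it still shells out to an external executable, so it is only a way to test what libarchive can read, not the in-process integration discussed above.

```python
import shutil
import subprocess
from pathlib import Path


def bsdtar_command(archive, dest=None):
    """Build the bsdtar argument list: list contents, or extract into dest."""
    if dest is None:
        # -t: list the archive contents, -f: archive file to read
        return ["bsdtar", "-tf", str(archive)]
    # -x: extract, -C: change into the destination directory first
    return ["bsdtar", "-xf", str(archive), "-C", str(dest)]


def run_bsdtar(archive, dest=None):
    """Run bsdtar without a shell and return its stdout as text."""
    if shutil.which("bsdtar") is None:
        raise FileNotFoundError("bsdtar not found on PATH")
    if dest is not None:
        Path(dest).mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        bsdtar_command(archive, dest),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if bsdtar reports a failure
    )
    return result.stdout
```

Calling `run_bsdtar("sample.rar")` lists the entries and `run_bsdtar("sample.rar", "out")` extracts them; whether a given rar variant (encrypted ones in particular) is handled is exactly what would need testing.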

Since I haven't actually tried to use libarchive yet, I don't know how well it'll work in practice. I need to understand the post-processing side better before making significant changes.

Philosophically, I do question whether nzbget should be doing all the post-processing within a monolithic codebase. I suspect things may be cleaner and simpler if nzbget focused on retrieving the files described by the NZB (including the par2 processing), and then handed them off to a second daemon to handle any post-processing of those files (unzip/unrar, etc.)

Bec-de-Xorbin commented 1 year ago

@woiza I guess this NEON stuff won't work on armv7l? At least I can't get it to compile... Unpacking passworded rars is unbelievably slow.

paul-chambers commented 1 year ago

Depends on the specific processor. Some families lack a hardware FPU completely; all the floating-point processing is emulated in software (a.k.a. 'softFPU').

NEON is an optional SIMD coprocessor that supports both integer and floating-point operations. It can massively speed up some kinds of processing by applying the same operation on multiple data items in parallel. If you don't have NEON support on your CPU, there's a decent chance you do not have a hardware FPU.

I'd also check that the compiler is being told correctly what your particular CPU provides. Since NEON is an optional part of the ARM architecture, the compiler needs to be told that it's available on the target it's being told to compile for. Ditto for a hardware FPU.
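One way to check what the compiler believes about the target is to dump its predefined macros with `-dM -E` (supported by GCC and Clang) and look for `__ARM_NEON` and `__ARM_FEATURE_CRYPTO` / `__ARM_FEATURE_AES`. A rough sketch, assuming a compiler reachable as `cc`; the function names are just for illustration:

```python
import subprocess


def parse_defines(text):
    """Turn '#define NAME VALUE' lines into a {name: value} dict."""
    defines = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            defines[parts[1]] = parts[2] if len(parts) == 3 else ""
    return defines


def compiler_defines(flags=(), cc="cc"):
    """Ask the compiler which macros it predefines for the given flags."""
    cmd = [cc, *flags, "-dM", "-E", "-x", "c", "/dev/null"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_defines(out)


def simd_summary(defines):
    """Report whether the compiler is targeting NEON and the AES instructions."""
    return {
        "neon": "__ARM_NEON" in defines,
        "crypto": "__ARM_FEATURE_CRYPTO" in defines
        or "__ARM_FEATURE_AES" in defines,
    }
```

On a 32-bit ARM toolchain, comparing `simd_summary(compiler_defines())` with `simd_summary(compiler_defines(["-mfpu=neon", "-mfloat-abi=hard"]))` should show whether NEON has to be requested explicitly (the hard-float ABI is an assumption here; it has to match your distribution). On arm64, NEON is usually already on by default, and `-march=armv8-a+crypto` is the GCC spelling for additionally enabling the AES instructions.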

There are multiple reasons for CPU-bound processing to be 'unbelievably slow' on embedded devices. For example, cost constraints can limit DRAM's speed and bus bandwidth. The CPU cache may be small. The CPU clock may be relatively low. Thermal constraints may also limit sustained performance. Rarely, the CPU clock is lowered to reduce the amount of EMI emitted (or harmonics interfering with WiFi, BT, etc.).

For networking-oriented SoCs, in particular, a hardware-accelerated networking engine usually handles the vast majority of the performance-critical processing. This is why they can route packets at 'wire speed', yet throughput drops sharply as soon as every packet has to be processed by the CPU (e.g. for VPN support).

woiza commented 1 year ago

@Bec-de-Xorbin Correct, only armv8 supports hardware-accelerated encryption and decryption (part of NEON), except for Raspberry Pis… Here's a great comparison of the most popular SBCs/SoCs ("AES" column):

https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md
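To check your own board directly, the Linux kernel lists the CPU's capabilities in /proc/cpuinfo (a "Features" line on ARM, "flags" on x86). A small sketch that looks for the relevant entries; the function names are made up for illustration:

```python
def cpu_features(cpuinfo_text):
    """Collect the entries of every 'Features' (ARM) or 'flags' (x86) line."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("features", "flags"):
            features.update(value.split())
    return features


def has_neon(features):
    # 32-bit ARM kernels report "neon"; 64-bit ARM kernels report "asimd".
    return "neon" in features or "asimd" in features


def has_hw_aes(features):
    # "aes" is listed when the CPU has AES instructions
    # (ARMv8 crypto extensions, or AES-NI on x86).
    return "aes" in features


def read_cpu_features(path="/proc/cpuinfo"):
    """Read and parse the feature list of the machine this runs on (Linux only)."""
    with open(path) as f:
        return cpu_features(f.read())
```

Running `has_hw_aes(read_cpu_features())` on the device tells you whether hardware AES is there at all; if it is not, no unrar build will make encrypted archives fast.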

virtuallynathan commented 1 year ago

https://github.com/animetosho/par2cmdline-turbo

Might be the easiest change… quite a bit faster.

luckedea commented 10 months ago

Closing this because there are already open issues covering the same problems: https://github.com/nzbgetcom/nzbget/issues/93 (par2cmdline-turbo) and https://github.com/nzbgetcom/nzbget/issues/78 (unrar).