llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[llvm-dis] Disassemble files in parallel #108067

Open AmrDeveloper opened 1 week ago

AmrDeveloper commented 1 week ago

Currently, the LLVM disassembler receives one or more files and disassembles them one by one. I don't think they need to be processed in any specific order, so why not disassemble them in parallel?

https://github.com/llvm/llvm-project/blob/22067a8eb43a7194e65913b47a9c724fde3ed68f/llvm/tools/llvm-dis/llvm-dis.cpp#L190

If this idea is approved, I will be happy to work on it 😄

boomanaiden154 commented 1 week ago

Is there a reason it would have to be parallelized inside llvm-dis rather than doing the parallelization at the process level by invoking llvm-dis multiple times? Every time I have needed parallelization for bitcode disassembly I've just invoked multiple llvm-dis processes.
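For reference, process-level parallelism of this kind can be sketched in a few lines of Python. This is only an illustration, not part of llvm-dis or lit: `disassemble_all` and its `tool` and `jobs` parameters are hypothetical names, and the sketch assumes an `llvm-dis` binary on `PATH` that is invoked as `llvm-dis <input> -o <output>`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def disassemble_all(files, tool=("llvm-dis",), jobs=4):
    """Run one `tool` process per input file, at most `jobs` at a time.

    Returns a dict mapping each input path to its process exit code.
    `tool` is the command prefix (a hypothetical parameter, handy for testing).
    """

    def run_one(path):
        out = str(Path(path).with_suffix(".ll"))
        # Equivalent to: llvm-dis <path> -o <path with a .ll suffix>
        proc = subprocess.run([*tool, str(path), "-o", out])
        return str(path), proc.returncode

    # The threads only wait on child processes; the actual disassembly
    # happens in the separate tool processes, which run concurrently.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(run_one, files))
```

Calling `disassemble_all(["a.bc", "b.bc"], jobs=8)` would start up to eight `llvm-dis` processes at once and leave the choice of parallelism with the caller rather than the tool.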

AmrDeveloper commented 1 week ago
boomanaiden154 commented 6 days ago

> When you have a large number of files, why iterate and invoke it multiple times when you can just run llvm-dis once and pass all the files?

It already supports multiple files, as you mentioned. The question is whether or not it should support parallelism.

> It simplifies lit tests that depend on llvm-dis, since only one invocation is needed.

If we need to disassemble multiple files, we can already do so today. Same as above. I do not think doing it in parallel inside lit is a good idea, though. See below.

> Faster CI without having to care about multiple invocations.

I doubt runtime within llvm-dis has any appreciable impact on total runtime of the test suite. llvm-lit also manually handles threading, similar to a build system. Having llvm-dis take advantage of parallelism has the potential to increase scheduler contention, which would increase test time rather than decreasing it.
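The contention concern can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: `runnable_threads` and `oversubscription` are hypothetical helpers, and the core, worker, and thread counts are assumed values, not measurements.

```python
def runnable_threads(lit_workers, threads_per_tool):
    """Worst-case number of simultaneously runnable threads when every
    lit worker is running a tool that spawns `threads_per_tool` threads."""
    return lit_workers * threads_per_tool


def oversubscription(lit_workers, threads_per_tool, cores):
    """Ratio of runnable threads to cores; above 1.0 the OS must time-slice."""
    return runnable_threads(lit_workers, threads_per_tool) / cores


cores = 16  # assumed machine size
# One lit worker per core, single-threaded tool: 16 threads on 16 cores.
print(oversubscription(cores, 1, cores))  # 1.0
# Same lit configuration, but each tool invocation starts 8 threads of its own.
print(oversubscription(cores, 8, cores))  # 8.0
```

Once the test runner already keeps every core busy, extra threads inside each tool add scheduling work without adding capacity.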

> If we can make it parallel, why not do that instead of delegating it to the user level? It already supports multiple files.

Is there an actual use case for that? If you have numbers for a specific use case where multiple invocations don't work or don't make much sense, and parallelism significantly decreases the total runtime, then I could see the case being slightly stronger.
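One way to collect such numbers is to time a single invocation that receives every file against one process per file run concurrently. This is a rough sketch under stated assumptions: the function names and the `tool` and `jobs` parameters are hypothetical, and it assumes the tool accepts input files as positional arguments and writes its output next to them.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def time_single_invocation(tool, files):
    """Wall-clock seconds for one process that receives every file."""
    start = time.perf_counter()
    subprocess.run([*tool, *files], check=True)
    return time.perf_counter() - start


def time_parallel_invocations(tool, files, jobs):
    """Wall-clock seconds for one process per file, `jobs` at a time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # list() forces every invocation to finish and re-raises any failure.
        list(pool.map(lambda f: subprocess.run([*tool, f], check=True), files))
    return time.perf_counter() - start
```

Comparing `time_single_invocation(("llvm-dis",), files)` with `time_parallel_invocations(("llvm-dis",), files, jobs=8)` on a real bitcode corpus would show how much headroom parallelism offers at all, before any change to the tool itself.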

There is not a strong precedent in compiler tooling for making tools inherently multithreaded. I think the only multithreaded tools in the repo are clangd and lld. Some parallelism in lld related to ThinLTO (local ThinLTO) is even turned off quite often in favor of letting the build system manually handle the details (distributed ThinLTO).

At the end of the day, I don't think there's a strong use case for this. I haven't ever had an issue that made me want llvm-dis to be multithreaded, and I have probably run llvm-dis over multiple TB of bitcode at this point.