Printing large integers causes significant slowdown of compilation and runtime performance under LLVM

ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

https://ziglang.org

MIT License

33.82k stars 2.47k forks source link

Printing large integers causes significant slowdown of compilation and runtime performance under LLVM #20340

Open cryptocode opened 3 months ago

cryptocode commented 3 months ago

Zig Version

0.14.0-dev.32+4aa15440c

Core i9 on macOS 14.5

Steps to Reproduce and Observed Behavior

As integer literals get larger, compile times and runtime performance of printing them becomes very slow.

This happens under LLVM, but not when using native backends, at least not on x86_64.

Here's an extreme case that takes a very long time for LLVM Emit Object to complete (especially when compiling the program with -O Debug): https://zigbin.io/299ef7/raw

It takes around 10 seconds on a Core i9 to print the number at runtime even when compiled with -O ReleaseFast. Printing the number when compiled with the native backend is near instant.

Of course, numbers of that magnitude won't likely be found in the wild, but there are use cases for large integers and the slowdown will add up if they're used a lot.

LLVM produces this: https://clbin.com/0aX1E Native produces this: https://clbin.com/ouDY4

Expected Behavior

Compile speed and runtime performance on par with the native backend

JamesParrott commented 3 months ago

Numbers this large are essentially data. I had to sys.set_int_max_str_digits(5000) in Python to convert it to hex. I got bored waiting for it to compile as it is, if that counts as reproducing the issue. It took ages too using u65535, u32767 types, and a hex literal (the int is under Zig's maximum bit width).

However I noticed your program compiles and runs a lot faster in LLVM, using a string of its digits. So I'd look into using the hex to store it as bytes.

const std = @import("std");

pub fn main() !void {
    const a_big_int = "15132498712934891283498... ";
    std.debug.print("{s}\n", .{a_big_int});
}

mlugg commented 2 months ago

Numbers this large are essentially data.

All numbers are data. I do not understand what your comment is getting at.

The fact that the integer can be printed more efficiently if stored in an arbitrary unhelpful format is not in any way relevant. The performance issue here is an obvious upstream bug.

JamesParrott commented 2 months ago

I'm saying if arithmetic operations aren't going to be carried out on it, store data in a format optimised for data. There's room for great improvement here I agree. But it's a niche. Arbitrary-width integers and extremely wide integers are topics in their own right.

Try to build OP's code, by the way. You'll find the string literal to your suprise too, is an extremely helpful format in this situation. Unless you want to wait 5 minutes for it to compile.

cryptocode commented 2 months ago

There's room for great improvement here I agree. But it's a niche.

I don't think it's that niche as the upstream LLVM issue also affect much smaller numbers (the slowdown grows with the number of digits) which can compound into a problem. The code snippet is just a simple and obvious repro.

JamesParrott commented 2 months ago

It was a really good example. It just didn't pick a type for the numerical constant, it left that decision to Zig.

Is the issue with the LLVM IR that Zig emits, or with the executable LLVM generates from that IR?

cryptocode commented 2 months ago

Is the issue with the LLVM IR that Zig emits, or with the executable LLVM generates from that IR?

I don't see anything that should cause LLVM to spend a very long time in the optimizer/emitting, nor produce poor object code (but it does). So Zig's IR output seems reasonable to me, but the backend guys will have to chime in.

Here's the difference between using a small and big literal in Zig's LLVM IR output: https://www.diffchecker.com/Lwo74zSH/

Here's the difference after LLVM has munched on them under -O ReleaseFast: https://www.diffchecker.com/6oF5rpqZ/

Vexu commented 2 months ago

This regressed in LLVM 16 which moved from using __udivei4/__umodei4 to using a series of operations on ABI sized integers as is done for other arithmetic operations on large integers.

Compiling and running the example on Zig 0.10.1 which uses LLVM 15 takes a little over a second on my machine.

So it should be possible to get the performance back by explicitly calling __udivei4/__umodei4 in the LLVM backend.

mlugg commented 2 months ago

I think this should still keep the "upstream" label - we may be able to work around it in Zig, but it's objectively not our bug.

Vexu commented 2 months ago

Relevant upstream PR https://reviews.llvm.org/D130079

This also reproduces in C:

#define T _BitInt(65536)
T op(T a, T b) {
   // slow
   return a * b;
   // return a / b;
   // return a % b;

   // normal
   // return a + b;
}

mlugg commented 2 months ago

Upstream issue: https://github.com/llvm/llvm-project/issues/96488