odin-lang / Odin

Odin Programming Language
https://odin-lang.org
BSD 3-Clause "New" or "Revised" License
7k stars 621 forks source link

`big.Int` is slower than expected #4311

Open VincentLagerros opened 1 month ago

VincentLagerros commented 1 month ago

Context

    Odin:    dev-2024-09-nightly:dd1f151
    OS:      Windows 10 Home Basic (version: 22H2), build 19045.4894
    CPU:     AMD Ryzen 5 4600G with Radeon Graphics
    RAM:     15716 MiB
    Backend: LLVM 18.1.8

Expected Behavior

Fast big.mul big.div and big.int_itoa_string

Current Behavior

big.mul, big.div and big.int_itoa_string is an order of magnitude slower than they should be. Just comparing the generation of catlan numbers up to 20000 and python3 is 10x faster.

Failure Information (for bugs)

Steps to Reproduce

I wrote a solution to https://open.kattis.com/problems/catalan in both python3 and odin to compare. The logic should be the same, but odin is way slower both on my local machine and on the online judge.

// https://open.kattis.com/problems/catalan
package main

import "core:fmt"
import "core:math/big"
import "core:os"
import "core:strconv"
import "core:strings"

main :: proc() {
    data, ok := os.read_entire_file(os.stdin)
    assert(ok)
    text := strings.trim_space(string(data))
    last := big.Int{}
    big.atoi(&last, "1")
    items := [dynamic]big.Int{last}

    defer delete(items)
    defer delete(text)
    defer big.destroy(&last)

    for x in strings.split_lines(text)[1:] {
        i := strconv.atoi(x)
        for i >= len(items) {
            n := len(items)
            big.mul(&last, &last, big.DIGIT(4 * n - 2))
            big.div(&last, &last, big.DIGIT(n + 1))
            save := big.Int{}
            big.copy(&save, &last)
            append(&items, save)
        }
        str, err := big.int_itoa_string(&items[i])
        defer delete(str)
        assert(err == nil)
        fmt.println(str)
    }
}
# https://open.kattis.com/problems/catalan
last = 1
items = [1]
for x in range(int(input())):
    i = int(input())
    while i >= len(items):
        n = len(items)
        last *= (4 * n - 2)
        last //= (n + 1)
        items.append(last)
    print(items[i])

Here is a basic testcase

7
1
2
3
4
5
50
20000
JesseRMeyer commented 1 month ago

What were your Odin build settings?

VincentLagerros commented 1 month ago

What were your Odin build settings?

Not specified, only odin build . Is there some release flag I am unaware of, because I don't find any in build -help

JesseRMeyer commented 1 month ago

-o:speed will produce an optimized Odin binary. odin help build is the command you were looking for.

gingerBill commented 1 month ago

Just to make sure, try doing sys.stdout.flush() after your print in Python.

Printing in a test isn't a "benchmark" is not a good idea ever.

VincentLagerros commented 1 month ago

Neither printing nor -o:speed changes the result, it is still a magnitude slower. For reference when comparing py and odin I get odin: 14.502 s vs py: 1.206 s on the same test.

gingerBill commented 1 month ago

I am also wondering if the python example is even doing big ints in the first place. Yes integers in Python can be big ints but they do a lot of optimizations to not use them until needed.

VincentLagerros commented 1 month ago

No, python cant optimize this with small integers as catlan numbers grow really large really fast. Just n=37 will use more than bits than can fit into a u64 and the code is calculating for n=60000. I have written my own bigint in rust that is very much inspired by num_bigint, and it does not have this problem either.

VincentLagerros commented 1 month ago

Just as an example how the stdlib of odin does not seam to be optimized the _itoa_raw_full does a a division by radix for every char.

image

This is way slower than dividing by radix^x where radix^x is the max number that can fit into DIGIT. Then you can take that digit and do regular mul/div to get like 10 digits per long division instead of 1.

https://github.com/rust-num/num-bigint/blob/master/src/biguint/convert.rs#L699

gingerBill commented 1 month ago

To be clear, we have not optimized the core:math/big library whatsoever. It's a direct port of libTomMath with numerous bug fixes and improves to the original codebase.

I was just asking basic questions earlier because I wanted to see if you were actually testing the correct thing or not. It's not always obvious to tell.

JesseRMeyer commented 1 month ago

Also, we accept high quality PRs that improve the status quo!

VincentLagerros commented 1 month ago

That makes sense, but I doubt I can do a high quality PR when I have only been programming in odin for a few hours. However I can say that if you guys want to implement a really fast big int, then take a look at rug/gmplib.

Kelimion commented 1 month ago

We can't. They're under the GPL.