bin2c: Optimize with memory mapping and a LUT

F0bes commented 8 months ago

Uses memory mapping for input and output files.

There wasn't much performance increase on using the input file mapping, but there was a large one for the output file.

I optimized further by going away with sprintf for every byte (or every 16 bytes when unrolled) and incorporated a simple LUT. The memcpy is optimized by gcc/clang to around 7 instructions.

Here are some benchmarks tested on files generated with /dev/urandom: (Any lower than 100kB has no measurable difference)

100 kB file

master:
real    0m0.007s
user    0m0.007s
sys 0m0.000s

pr:
real    0m0.003s
user    0m0.000s
sys 0m0.001s

1.0 MB file

master:
real    0m0.064s
user    0m0.060s
sys 0m0.004s

pr:
real    0m0.009s
user    0m0.000s
sys 0m0.005s

10 MB file

master:
real    0m0.602s
user    0m0.567s
sys 0m0.033s

pr:
real    0m0.086s
user    0m0.010s
sys 0m0.052s

67 MB file (Unreasonable, but just wanted to show that it scales)

master:
real    0m8.336s
user    0m5.558s
sys     0m0.662s

pr:
real    0m1.276s
user    0m0.040s
sys     0m0.188s

I have a testing setup that compares the output the masters bin2c and my bin2c. I can for sure say that this version of bin2c produces output 100% identical with our current one.

I'm not sure how I could remedy the page faults. But these are the current hot spots.

F0bes commented 8 months ago

Here is the script I used to test

#! /bin/bash
total_old_time=0
total_new_time=0

for i in {1..100}
do
    size=$((100000 * i))
    dd if=/dev/urandom of=test_files/bin2c_test_file_$i bs=$size count=1
    chmod 777 test_files/bin2c_test_file_$i

    # Run old bin2c and get the real time
    old_time=$( { time $OLD_BIN2C test_files/bin2c_test_file_$i test_files/bin2c_test_file_global_$i.c bin2c_test_file_$i ; } 2>&1 | grep real | awk '{print $2}' )
    # Remove the 'm' character from the time
    old_time=${old_time//m/}
    # Add the time to the total
    total_old_time=$(awk "BEGIN {print $total_old_time + $old_time}")

    chmod 777 test_files/bin2c_test_file_global_$i.c

    # Run new bin2c and get the real time
    new_time=$( { time $NEW_BIN2C test_files/bin2c_test_file_$i test_files/bin2c_test_file_local_$i.c bin2c_test_file_$i ; } 2>&1 | grep real | awk '{print $2}' )
    # Remove the 'm' character from the time
    new_time=${new_time//m/}
    # Add the time to the total
    total_new_time=$(awk "BEGIN {print $total_new_time + $new_time}")

    chmod 777 test_files/bin2c_test_file_local_$i.c
done

# Calculate the average times
average_old_time=$(awk "BEGIN {print $total_old_time / 100}")
average_new_time=$(awk "BEGIN {print $total_new_time / 100}")

for i in {1..100}
do
    echo "Comparing test_files/bin2c_test_file_global_$i.c and test_files/bin2c_test_file_local_$i.c"
    colordiff test_files/bin2c_test_file_global_$i.c test_files/bin2c_test_file_local_$i.c
done

echo "Average time for old bin2c: $average_old_time"
echo "Average time for new bin2c: $average_new_time"

rm -rf test_files/*

Make a directory "test_files" and run it, making sure you set the $OLD_BIN2C and $NEW_BIN2C environment vars

It does 100 runs starting from 20kb to 1MB It also diffs (requires colordiff but you can change it to use diff)

On my machine this is the result Average time for old bin2c: 0.06917 Average time for new bin2c: 0.00949

rickgaiser commented 8 months ago

Do we really need to optimize a tool for speed when it already completes in 0.07s by average? Sometimes the most simple solution is the best one. KISS. Only optimize if needed, but I guess you needed it ;).

Anyway, I think we can merge it since it looks well tested. @fjtrujy I assume this code should also work on osx? On windows I don't think we have testers?

uyjulian commented 8 months ago

The only potential issue I see is SIGSEGV or SIGBUS when device becomes unavailable, but I think it's fine for this use since it is just the bin2c program that will receive the error.

So I think this is fine, lgtm

F0bes commented 8 months ago

Do we really need to optimize a tool for speed when it already completes in 0.07s by average? Sometimes the most simple solution is the best one. KISS. Only optimize if needed, but I guess you needed it ;).

Anyway, I think we can merge it since it looks well tested. @fjtrujy I assume this code should also work on osx? On windows I don't think we have testers?

I know, it's a little overkill for something like this. But it adds up when you have a lot of assets. I might look into writing a new bin2o. It'll probably just be a thin wrapper over objcopy. Having a object files reduces the (very very slow) compile times you can see when using large bin2c files.

fjtrujy commented 8 months ago

We had before a bin2o too, but I think is not worthy to have it, bin2c get cleaners makefile implementation I think

ijacquez commented 8 months ago

Do we really need to optimize a tool for speed when it already completes in 0.07s by average? Sometimes the most simple solution is the best one. KISS. Only optimize if needed, but I guess you needed it ;). Anyway, I think we can merge it since it looks well tested. @fjtrujy I assume this code should also work on osx? On windows I don't think we have testers?

I know, it's a little overkill for something like this. But it adds up when you have a lot of assets. I might look into writing a new bin2o. It'll probably just be a thin wrapper over objcopy. Having a object files reduces the (very very slow) compile times you can see when using large bin2c files.

I'd be interested in seeing a bin2o C implementation using the BFD library. A shell script wrapped around objcopy is certainly slow.

rickgaiser commented 8 months ago

bin2c is simple and safe to use. The generated c file is easy to check, and you can compile the c file in the same way you compile the rest of your project: same toolchain, same compile flags, etc...

bin2o can be a script if we really want it. This script can call bin2c + gcc. Then it would still be simple and safe to use.

Taking the shortcut of generating an object file directly is in my opinion in the category of "Premature Optimization Is the Root of All Evil". What time do you think it will save? At most you'll be embedding 32MiB to fill the entire EE RAM with your ELF + embedded objects. Worst case you'll save a few seconds of compile time.

fjtrujy commented 8 months ago

bin2c is simple and safe to use. The generated c file is easy to check, and you can compile the c file in the same way you compile the rest of your project: same toolchain, same compile flags, etc...

bin2o can be a script if we really want it. This script can call bin2c + gcc. Then it would still be simple and safe to use.

Taking the shortcut of generating an object file directly is in my opinion in the category of "Premature Optimization Is the Root of All Evil". What time do you think it will save? At most you'll be embedding 32MiB to fill the entire EE RAM with your ELF + embedded objects. Worst case you'll save a few seconds of compile time.

Indeed

ps2dev / ps2sdk

bin2c: Optimize with memory mapping and a LUT #538