F0bes closed this 8 months ago
Here is the script I used to test:
#!/bin/bash
total_old_time=0
total_new_time=0
for i in {1..100}
do
    size=$((100000 * i))
    dd if=/dev/urandom of=test_files/bin2c_test_file_$i bs=$size count=1
    chmod 777 test_files/bin2c_test_file_$i
    # Run old bin2c and get the real time
    old_time=$( { time $OLD_BIN2C test_files/bin2c_test_file_$i test_files/bin2c_test_file_global_$i.c bin2c_test_file_$i ; } 2>&1 | grep real | awk '{print $2}' )
    # Strip the 'm' and 's' characters from the time
    old_time=${old_time//[ms]/}
    # Add the time to the total
    total_old_time=$(awk "BEGIN {print $total_old_time + $old_time}")
    chmod 777 test_files/bin2c_test_file_global_$i.c
    # Run new bin2c and get the real time
    new_time=$( { time $NEW_BIN2C test_files/bin2c_test_file_$i test_files/bin2c_test_file_local_$i.c bin2c_test_file_$i ; } 2>&1 | grep real | awk '{print $2}' )
    # Strip the 'm' and 's' characters from the time
    new_time=${new_time//[ms]/}
    # Add the time to the total
    total_new_time=$(awk "BEGIN {print $total_new_time + $new_time}")
    chmod 777 test_files/bin2c_test_file_local_$i.c
done
# Calculate the average times
average_old_time=$(awk "BEGIN {print $total_old_time / 100}")
average_new_time=$(awk "BEGIN {print $total_new_time / 100}")
for i in {1..100}
do
    echo "Comparing test_files/bin2c_test_file_global_$i.c and test_files/bin2c_test_file_local_$i.c"
    colordiff test_files/bin2c_test_file_global_$i.c test_files/bin2c_test_file_local_$i.c
done
echo "Average time for old bin2c: $average_old_time"
echo "Average time for new bin2c: $average_new_time"
rm -rf test_files/*
Make a directory "test_files" and run it, making sure you set the $OLD_BIN2C and $NEW_BIN2C environment vars.
It does 100 runs, with file sizes stepping from 100 kB up to 10 MB.
It also diffs the outputs (requires colordiff, but you can change it to use diff).
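For concreteness, a typical invocation might look like this (`benchmark.sh` is just a placeholder name for wherever you saved the script):

```bash
mkdir -p test_files
OLD_BIN2C=/path/to/master/bin2c NEW_BIN2C=/path/to/new/bin2c ./benchmark.sh
```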
On my machine this is the result:

Average time for old bin2c: 0.06917
Average time for new bin2c: 0.00949
Do we really need to optimize a tool for speed when it already completes in 0.07s on average? Sometimes the simplest solution is the best one. KISS. Only optimize if needed, but I guess you needed it ;).
Anyway, I think we can merge it since it looks well tested. @fjtrujy I assume this code should also work on OSX? On Windows I don't think we have testers?
The only potential issue I see is a SIGSEGV or SIGBUS when the device becomes unavailable mid-run, but I think that's fine for this use, since it is just the bin2c program that will receive the error.
So I think this is fine, lgtm
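For reference, if hardening against that were ever wanted, the usual trick is a sigsetjmp/siglongjmp pair around accesses to the mapping; a minimal sketch of the technique, not part of this PR (`safe_copy` is a made-up helper name):

```c
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf jump_env;

/* Jump back to the sigsetjmp() point when the mapping faults. */
static void bus_handler(int sig) {
    (void)sig;
    siglongjmp(jump_env, 1);
}

/* Copy from a memory-mapped region, surviving a SIGBUS if the backing
 * file is truncated or its device disappears mid-copy. Returns 0 on
 * success, -1 if a fault occurred. */
static int safe_copy(void *dst, const void *mapped_src, size_t len) {
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = bus_handler;
    sigaction(SIGBUS, &sa, &old_sa);

    int rc = 0;
    /* sigsetjmp(..., 1) also saves the signal mask, so the siglongjmp
     * from the handler unblocks SIGBUS again. */
    if (sigsetjmp(jump_env, 1) == 0)
        memcpy(dst, mapped_src, len);
    else
        rc = -1;

    sigaction(SIGBUS, &old_sa, NULL);  /* restore the previous handler */
    return rc;
}
```

As noted, probably not worth the complexity for a build-time tool.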
> Do we really need to optimize a tool for speed when it already completes in 0.07s on average? Sometimes the simplest solution is the best one. KISS. Only optimize if needed, but I guess you needed it ;).
> Anyway, I think we can merge it since it looks well tested. @fjtrujy I assume this code should also work on OSX? On Windows I don't think we have testers?
I know, it's a little overkill for something like this. But it adds up when you have a lot of assets. I might look into writing a new bin2o. It'll probably just be a thin wrapper over objcopy. Having object files reduces the (very, very slow) compile times you can see when using large bin2c files.
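For the record, such a wrapper could be as small as the sketch below; the `ee-objcopy` default and the `elf32-littlemips`/`mips` target/arch pair are my assumptions about the EE toolchain, not something verified here:

```sh
#!/bin/sh
# bin2o.sh <input.bin> <output.o> <symbol>  -- hypothetical wrapper
set -e
in="$1"; out="$2"; sym="$3"
# objcopy derives the generated symbol names
# (_binary_<name>_start/_end/_size) from the input file name,
# so copy the input to the desired symbol name first.
cp "$in" "$sym"
"${OBJCOPY:-ee-objcopy}" -I binary -O elf32-littlemips -B mips "$sym" "$out"
rm -f "$sym"
```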
We had a bin2o before too, but I don't think it's worth having it; bin2c makes for a cleaner Makefile implementation, I think.
> Do we really need to optimize a tool for speed when it already completes in 0.07s on average? Sometimes the simplest solution is the best one. KISS. Only optimize if needed, but I guess you needed it ;). Anyway, I think we can merge it since it looks well tested. @fjtrujy I assume this code should also work on OSX? On Windows I don't think we have testers?

> I know, it's a little overkill for something like this. But it adds up when you have a lot of assets. I might look into writing a new bin2o. It'll probably just be a thin wrapper over objcopy. Having object files reduces the (very, very slow) compile times you can see when using large bin2c files.
I'd be interested in seeing a bin2o C implementation using the BFD library. A shell script wrapped around objcopy is certainly slow.
bin2c is simple and safe to use. The generated C file is easy to check, and you can compile the C file in the same way you compile the rest of your project: same toolchain, same compile flags, etc.
bin2o can be a script if we really want it. This script can call bin2c + gcc. Then it would still be simple and safe to use.
Taking the shortcut of generating an object file directly is in my opinion in the category of "Premature Optimization Is the Root of All Evil". What time do you think it will save? At most you'll be embedding 32MiB to fill the entire EE RAM with your ELF + embedded objects. Worst case you'll save a few seconds of compile time.
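The bin2c + gcc script idea above could look something like this (assuming the `BIN2C`/`CC` environment variables point at the SDK tools; a sketch, not an existing script):

```sh
#!/bin/sh
# bin2o.sh <input.bin> <output.o> <symbol>  -- hypothetical wrapper
# Generate a C file with bin2c, then compile it with the project's own
# toolchain, so flags and ABI always match the rest of the build.
set -e
in="$1"; out="$2"; sym="$3"
tmp="${out%.o}.bin2c.c"
"${BIN2C:-bin2c}" "$in" "$tmp" "$sym"
"${CC:-ee-gcc}" -c "$tmp" -o "$out"
rm -f "$tmp"
```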
> bin2c is simple and safe to use. The generated C file is easy to check, and you can compile the C file in the same way you compile the rest of your project: same toolchain, same compile flags, etc.
> bin2o can be a script if we really want it. This script can call bin2c + gcc. Then it would still be simple and safe to use.
> Taking the shortcut of generating an object file directly is in my opinion in the category of "Premature Optimization Is the Root of All Evil". What time do you think it will save? At most you'll be embedding 32MiB to fill the entire EE RAM with your ELF + embedded objects. Worst case you'll save a few seconds of compile time.
Indeed
Uses memory mapping for input and output files.
There wasn't much of a performance increase from mapping the input file, but there was a large one for the output file.
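Roughly the shape of that approach, as a simplified sketch rather than the PR's exact code (it relies on the fact that bin2c can compute the output size up front from the input size, so the output file can be sized with ftruncate and then mapped):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the input read-only and the output read-write, sized up front,
 * so the generator writes straight into the page cache instead of
 * going through stdio. Returns 0 on success, -1 on failure. */
int map_files(const char *in_path, const char *out_path, size_t out_size,
              void **in_p, size_t *in_size, void **out_p) {
    int in_fd = open(in_path, O_RDONLY);
    if (in_fd < 0) return -1;
    struct stat st;
    if (fstat(in_fd, &st) < 0) { close(in_fd); return -1; }
    *in_size = (size_t)st.st_size;
    *in_p = mmap(NULL, *in_size, PROT_READ, MAP_PRIVATE, in_fd, 0);
    close(in_fd);                      /* the mapping keeps the pages */
    if (*in_p == MAP_FAILED) return -1;

    int out_fd = open(out_path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (out_fd < 0) return -1;
    if (ftruncate(out_fd, (off_t)out_size) < 0) { close(out_fd); return -1; }
    *out_p = mmap(NULL, out_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                  out_fd, 0);
    close(out_fd);
    return *out_p == MAP_FAILED ? -1 : 0;
}
```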
I optimized further by doing away with `sprintf` for every byte (or every 16 bytes when unrolled) and incorporating a simple LUT. The `memcpy` is optimized by gcc/clang to around 7 instructions. Here are some benchmarks, tested on files generated with /dev/urandom; anything below 100kB shows no measurable difference.
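The LUT part boils down to something like this (again a sketch of the technique, not the PR's code): format all 256 byte values once, then emit a fixed-length chunk per input byte, so the per-byte `memcpy` has a compile-time-constant size that gcc/clang can lower to a couple of plain stores:

```c
#include <string.h>

/* "0x00," for every possible byte value: 5 chars each, so no
 * per-byte formatting is needed at run time. */
static char lut[256][5];

static void init_lut(void) {
    static const char hex[] = "0123456789abcdef";
    for (int i = 0; i < 256; i++) {
        lut[i][0] = '0';
        lut[i][1] = 'x';
        lut[i][2] = hex[i >> 4];
        lut[i][3] = hex[i & 0xf];
        lut[i][4] = ',';
    }
}

/* Emit the C array body for len input bytes into out, which must have
 * room for len * 5 chars. Returns the advanced output pointer. */
static char *emit_bytes(char *out, const unsigned char *in, size_t len) {
    for (size_t i = 0; i < len; i++) {
        memcpy(out, lut[in[i]], 5);    /* constant-size copy */
        out += 5;
    }
    return out;
}
```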
I have a testing setup that compares the output of master's bin2c and my bin2c. I can say for sure that this version of bin2c produces output 100% identical to our current one.
I'm not sure how I could remedy the page faults, but those are the current hot spots.