Closed by brson 12 years ago
Thanks @marijnh for pointing it out
Here's what perf says with an optimized runtime and corelib:
10.64% test libcore-d27e4777a53c3e50-0.2.so [.] uint::to_str::_f7f4236dcee4291a::_02
9.34% test libc-2.15.so [.] _int_free
7.09% test libc-2.15.so [.] __memset_x86_64
6.90% test librustrt.so [.] upcall_s_vec_grow
6.01% test librustrt.so [.] upcall_vec_grow
5.07% test librustrt.so [.] exchange_malloc
4.92% test librustrt.so [.] upcall_exchange_free
4.72% test librustrt.so [.] upcall_exchange_malloc_dyn
4.43% test librustrt.so [.] check_stack_canary(stk_seg*)
4.19% test libc-2.15.so [.] malloc
4.11% test libc-2.15.so [.] _int_malloc
3.07% test librustrt.so [.] upcall_s_exchange_free
3.03% test librustrt.so [.] check_stack_alignment
3.00% test librustrt.so [.] upcall_s_exchange_malloc_dyn
2.18% test librustrt.so [.] memory_region::malloc(unsigned long, char const*, bool)
2.15% test libc-2.15.so [.] realloc
1.97% test [kernel.kallsyms] [k] 0xffffffff8103d0ca
1.91% test librustrt.so [.] get_sp_limit
1.82% test librustrt.so [.] memory_region::free(void*)
1.53% test libc-2.15.so [.] _int_realloc
1.33% test librustrt.so [.] __morestack
1.31% test librustrt.so [.] get_sp
1.19% test librustrt.so [.] upcall_call_shim_on_c_stack
0.95% test libc-2.15.so [.] free
0.89% test librustrt.so [.] memory_region::add_alloc()
0.68% test librustrt.so [.] upcall_str_new_uniq
It looks like there's a lot to be gained just in optimizing uint::to_str. It is very inefficient.
Often in such benchmarks the random number generator is the bottleneck. Can someone point out where this shows up in the analysis above?
I recently added the xorshift random number generator. It's not on by default, but it should be significantly faster than the default ISAAC generator.
https://github.com/mozilla/rust/commit/ad292a8c73a0cceddfa9618a4d6eea577897bae8
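For readers unfamiliar with xorshift, here is a minimal sketch of the general technique in present-day Rust. It is not the code from the commit linked above: the type name `XorShift128` is hypothetical, and the shift constants and seed values are assumed from Marsaglia's published xorshift128 example. The point it illustrates is why this family of generators is cheap: each output costs a handful of shifts and XORs with no memory traffic beyond four words of state.

```rust
/// Hypothetical illustration of Marsaglia's xorshift128 generator.
/// Not cryptographically secure; intended only for fast benchmark input.
pub struct XorShift128 {
    x: u32,
    y: u32,
    z: u32,
    w: u32,
}

impl XorShift128 {
    /// Starts from the fixed seed values used in Marsaglia's example
    /// (an assumption of this sketch, not taken from the thread).
    pub fn new() -> Self {
        XorShift128 {
            x: 123_456_789,
            y: 362_436_069,
            z: 521_288_629,
            w: 88_675_123,
        }
    }

    /// Produces the next 32-bit value using only shifts and XORs.
    pub fn next_u32(&mut self) -> u32 {
        let t = self.x ^ (self.x << 11);
        // Rotate the state words down by one position.
        self.x = self.y;
        self.y = self.z;
        self.z = self.w;
        // Mix the oldest word (via t) into the newest one.
        self.w = self.w ^ (self.w >> 19) ^ (t ^ (t >> 8));
        self.w
    }
}
```

Calling `XorShift128::new()` and then `next_u32()` in a loop yields a deterministic sequence, which is convenient when comparing benchmark runs.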
The biggest win is going to be to write a uint::write or something that directly writes to a file instead of allocating.
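The idea of writing digits straight to the output can be sketched as follows in present-day Rust. The function name `write_uint` and its signature are hypothetical and are not the API that was eventually added; the sketch only assumes the standard `std::io::Write` trait. Digits are produced into a fixed-size stack buffer from the least significant end, so no heap allocation or vector growth happens per number.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes `n` in decimal to `out` without touching the heap.
pub fn write_uint<W: Write>(out: &mut W, mut n: u64) -> io::Result<()> {
    // u64::MAX has 20 decimal digits, so 20 bytes always suffice.
    let mut buf = [0u8; 20];
    let mut pos = buf.len();
    loop {
        // Fill the buffer from the right with the lowest remaining digit.
        pos -= 1;
        buf[pos] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    // Emit only the bytes that were actually filled in.
    out.write_all(&buf[pos..])
}
```

Wrapping the destination in a `std::io::BufWriter` before passing it in would also keep the number of system calls low when many numbers are written in a row.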
6e0085210c54150f794d20791b2e9c1fda6049fc makes uint::to_str not allocate so much, though it still does one extra allocation when it creates the initial empty vector.
With that commit, the running time of ./test 10000000 (measured with time) goes from 32s to 6s.
Graydon made another commit to improve it further. I believe we are competitive with the other languages now.
Somewhat related is #2105
http://blog.cdleary.com/2012/06/simple-selfish-and-unscientific-shootout/
We had a particularly poor showing in this comparison.