Closed ta0kira closed 3 years ago
Initial tests look good when creating an array in-place in a dynamically-allocated unsigned char[]
. No memory leaks with valgrind
. But, I haven't gotten into pool-managed memory yet. It's still possible for this to drastically increase compile times, due to templates.
A possible low-cost prototyping option:
Looks like thread_local
only mostly works: There needs to be a thread_local
cleanup so that a thread's pool doesn't leak, but there seems to be a race-condition with destructor-based thread_local
cleanup.
I did some informal testing with PrimesDemo
and there's roughly a 6.5% performance improvement. (Based on the number of primes computed in 0.1s.)
Using a std::atomic_flag
as a spinlock (39dd338c826bde4f4eb8e229ed3666321151e208) had a massive improvement, compared to the initial improvement when I was using the pointer to spinlock.
Test program.
concrete SortingTest {
@type run () -> ()
}
define SortingTest {
run () {
Vector<Int> values <- Vector:create<Int>()
traverse (Counter.zeroIndexed(100000) -> Int i) {
\ values.push((i*1000000009)%10000)
}
\ Sorting:sort<?>(values)
}
}
Results for 100 samples each.
Improvement computed with 1 - (new median) / (baseline median), i.e., reduction in % of the baseline time.
Change | Mean (s) | Median (s) | Std. Dev. (s) | Improvement |
---|---|---|---|---|
0.17.0.0 with tuple-cache limits set to 0 | 2.971 | 2.970 | 0.022 | (baseline) |
0.17.0.0 | 2.671 | 2.670 | 0.021 | 10.1% |
0.17.0.0 + #177 | 2.041 | 2.030 | 0.026 | 31.6% |
0.17.0.0 + #177 + atomic_flag |
1.647 | 1.650 | 0.010 | 44.4% |
Since tuple-caching (this issue) and value-boxing are (mostly) independent, I think we can attribute a 21.5% (31.6%-10.1%) improvement to value-boxing* and a 22.9% (44.4%-21.5%) improvement to tuple-caching.
(I'm disregarding the interaction of both competing for the allocator in the baseline case.)
This will drastically cut down on dynamic allocation, since these are used for every function call.
constexpr
for those.thread_local
. (AFAIK, construction and destruction are always done in the same thread.)