Closed SylvanBrocard closed 1 month ago
Forgot to specify: the benchmarks are run with an input value of INT32_MAX (2147483647).
I haven't tested it, but I'm surprised by the difference on a 64-bit platform. I've added a cirange type that uses int instead of ptrdiff_t in the v50dev branch (for now; I will do some tests myself). The regular crange needs to work with intervals outside ±2³¹ on 64-bit platforms.
Feature
I'd like for crange and c_forrange to take custom types like the STC containers.
Rationale
Too many optimization opportunities are lost by having everything be intptr_t (effectively a 64-bit integer on 64-bit platforms).
Benchmarks
This is my litmus test for functional-style chaining: summing the squares of the even integers in a range.
STC code
Here is the STC code:
ISO C code
And here is the equivalent C code:
A simple look at the generated assembly will show you that the STC code is not vectorized, while the ISO C code is (tried with GCC 13.2).
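For readers who want to reproduce the comparison, here is a minimal sketch of the same litmus test written twice in plain C, once with a 32-bit counter and once with a 64-bit one. It is not the code from this issue: the function names are hypothetical, and the arithmetic is unsigned (an assumption made here so that wraparound on large inputs is well defined). Whether either loop is vectorized depends on the compiler, target, and flags, so check the generated assembly yourself (for example with gcc -O3 -S).

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit counter and accumulator.
   Unsigned arithmetic wraps modulo 2^32, so large n is well defined. */
uint32_t sum_even_squares_u32(uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; ++i) {
        if (i % 2 == 0) {        /* keep even values only */
            sum += i * i;        /* square and accumulate */
        }
    }
    return sum;
}

/* Hypothetical helper: same loop with a 64-bit counter and accumulator,
   standing in for a range whose value type is pointer-sized. */
uint64_t sum_even_squares_u64(uint64_t n)
{
    uint64_t sum = 0;
    for (uint64_t i = 0; i < n; ++i) {
        if (i % 2 == 0) {
            sum += i * i;
        }
    }
    return sum;
}
```

Both functions sum over the half-open range [0, n). Comparing the assembly of the two is a quick way to see how much the element width alone changes what the compiler emits, independently of STC.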
Benchmark results (without the feature)
Here are the benchmarks:
You can cast the values by doing c_flt_map((int)*value * (int)*value), and it does vectorize, but it's still significantly slower than the ISO C version, and probably too tricky an optimization.
Change
However, by applying this change:
We get much better code generation.
Benchmark with the modification