Open julian-klode opened 8 years ago
Hi! I think, this is the kind of optimisation re2c should definitely have (probably in a more general form).
In fact, re2c already has a similar optimisation (enabled with -b
option) that tries to reason about bit patterns of the matched characters. It is not powerful enough to handle this case, though.
I experimented a little bit with your example (I expected gcc -O2
to optimize both cases equally well):
int f(char c)
{
switch (c) {
case 'A'...'Z':
case 'a'...'z':
return 1;
default:
return 0;
}
}
versus
int f(char c)
{
switch (c | 0x20) {
case 'a'...'z':
return 1;
default:
return 0;
}
}
However, in the first case gcc -O2 -c 1.c -o 1.o && objdump -d 1.o
:
0000000000000000 <f>:
0: 8d 4f bf lea -0x41(%rdi),%ecx
3: 31 c0 xor %eax,%eax
5: 80 f9 39 cmp $0x39,%cl
8: 77 1b ja 25 <f+0x25>
a: b8 01 00 00 00 mov $0x1,%eax
f: 48 ba ff ff ff 03 ff movabs $0x3ffffff03ffffff,%rdx
16: ff ff 03
19: 48 d3 e0 shl %cl,%rax
1c: 48 85 d0 test %rdx,%rax
1f: 0f 95 c0 setne %al
22: 0f b6 c0 movzbl %al,%eax
25: f3 c3 repz retq
while in the second case gcc -O2 -c 2.c -o 2.o && objdump -d 2.o
:
0000000000000000 <f>:
0: 83 cf 20 or $0x20,%edi
3: 31 c0 xor %eax,%eax
5: 83 ef 61 sub $0x61,%edi
8: 40 80 ff 19 cmp $0x19,%dil
c: 0f 96 c0 setbe %al
f: c3 retq
Clang does somewhat better in the first case, but still not as good as in the second case.
So it's up to re2c after all.
Technically this optimization should be not hard to implement. It is low-level and doesn't interfere with complex algorithms involved in DFA construction, etc. It can be implemented in a separate pass during code generation (re2c already has the necessary infrastructure).
What we need is a good heuristic. One should probably start simple: handle one special case of case-insensitive all-alphabetical characters. But I think should be a lot research on the topic.
The results can be verified using --skeleton
.
Clang does somewhat better in the first case
After a closer look this is not true: https://llvm.org/bugs/show_bug.cgi?id=30531
Currently re2c generates switches like
I found over the weekend - while writing https://github.com/julian-klode/triehash - that this can be substantially optimised: If all cases are alphabetical characters, we can transform it into:
(which one is chosen does not matter). In fact, that's not only true for alphabetical characters, but for 128 characters in a 256 ANSI like character set. The check I use for that in Perl is:
This can provide a good performance boost, as it now translates to:
rather than
In my case, with triehash, this resulted in my benchmark performance improving from 714 ns to 495 ns, compared to 475 ns for the case-sensitive operation (re2c benchmarks similarly for the unoptimised case, so I think the result will be similar).
I'm not sure how feasible this is to implement for re2c, but it would be nice to have.