Closed ampli closed 9 years ago
By further inspecting the _compilecode()'s source code, I found there is a place for many more code space optimizations, that also reduce the amount of source code. After these optimizations, the result is a smaller code than the current repository one (the separate computation of regexp code size).
As @dpgeorge said, these savings can be used in the original code too. However, as @pfalcon noted, a significant cause of the increase code size is the EMIT macro, which repeats many times and which generates more code in my proposed unification (because it has an added conditional).
So a main saving of code can be done by reducing the number of invocations of these macro. This saves more code in my proposed solution (because the macro generate there a larger code). I found several places in which it can be done, as follows:
case '*':
...
if (re[1] == '?') {
EMIT(term, RSplit);
re++;
} else {
EMIT(term, Split);
}
This can be written as:
EMIT(term, re[1] == '?' ? (re++, RSplit): Split);
This saved 21 bytes.
Similarly, the following:
case '[': {
...
if (*re == '^') {
EMIT(pc++, ClassNot);
re++;
} else {
EMIT(pc++, Class);
}
can be written as:
EMIT(PC++, re[1] == '^' ? (re++, ClassNot): Class);
The following:
case '+':
...
if (re[1] == '?') {
EMIT(pc, Split);
re++;
} else {
EMIT(pc, RSplit);
}
can be written as:
EMIT(PC, re[1] == '?' ? (re++, Split): RSplit);
(BTW, this makes it cheaper now to also implement a non-greedy '?' .)
At this point it may look as if there are no more opportunities to save EMIT's, since there are no more if
conditions with EMIT in their "then" and "else".
However, case
is also a kind of condition:
case '^':
EMIT(pc++, Bol);
prog->len++;
break;
case '$':
EMIT(pc++, Eol);
prog->len++;
break;
This can be reduced to:
case '^':
case '$':
EMIT(PC++, *re == '^' ? Bol : Eol);
prog->len++;
break;
saving further 7 bytes.
As we can see below, most of the prog->len++;
statements can be eliminated too.
Other code space savings (actually implemented and checked for code saving and by run-tests):
default:
, '\\':
and '.':
cases. This saves their common statements.prog->len++;
to the top of the loop, as follows, and removing it from the rest of the code
(changing prog->len += 2;
to prog->len++;
): for (; *re && *re != ')'; re++) {
prog->len++; // <-----
switch (*re) {
if
's need to include prog->len++;
There is an additional opportunity to save an EMIT (in the '.' code) but I hesitate to do that because it doesn't look nice.
BTW, when I added the (?:) support, I didn't include it in the sizecode() computation. This is not fatal, as it just causes 4-byte wasted space per (?:) for each regex. Fixing it in the separately-calculated sizecode() (as in the current code) will increase its size slightly.
So I guess it is better to ignore this pull request again, and I will send a new one, with all the above mentioned code optimizations.
Nice work!
So I guess it is better to ignore this pull request again, and I will send a new one, with all the above mentioned code optimizations.
No, I think @pfalcon would merge this as-is and then you can do a separate commit for space optimisations.
@ampli , Impressive analysis, but as I mentioned, we don't want to burden you with byte-saving stuff, just raise awareness of the issue and see if it's possible with reasonable effort to minimize size increase when doing refactors/adding new features.
There's also well-knows saying that premature optimization is a root of many evils.
For example, I'm not sure I find this code too readable:
EMIT(PC, re[1] == '?' ? (re++, Split): RSplit);
Perhaps, perhaps it would make sense to do such optimization when it's clear that re1.5 supports as much of regex syntax as it could, but I guess it's counter-productive to do it before you're going to add support for more constructs.
So, as @dpgeorge says, let me merge this as is, and further tweaks can be submitted separately, but don't let that sway you from your feature work ;-).
+#define PC (prog->bytelen)
- int pc = prog->bytelen;
Also, this is questionable optimization. It apparently helps x86, but will likely hurt any RISC processor, them having registers in abundance, but not being able to do INC [MEM] operation as one instruction. I'll do separate benchmarking later, and we may need to #ifdef it. But then there will be 2 code configs to test, which again brings us to issue of premature optimization ;-).
Merged, thanks.
+#define PC (prog->bytelen)
- int pc = prog->bytelen;
I tried a SPARC cross-compilation, and it indeed increases code size:
$ make "CC=sparc64-linux-gnu-gcc" "CFLAGS=-Os -I/PATHTO/usr/include/"
$ file compilecode.o
compilecode.o: ELF 64-bit MSB relocatable, SPARC V9, relaxed memory ordering, version 1 (SYSV), not stripped
Before:
text data bss dec hex filename
1892 0 0 1892 764 compilecode.o
After:
text data bss dec hex filename
1952 0 0 1952 7a0 compilecode.o
On the other hand, on X86_64 the effect is the opposite one: Before:
text data bss dec hex filename
1545 0 0 1545 609 compilecode.o
After:
text data bss dec hex filename
1478 0 0 1478 5c6 compilecode.o
So a define (#ifdef RISC?) may be indeed needed.
It is complicated doing size optimisations to target multiple archs. With MicroPython we care a lot about Thumb2 arch and that's what's generally used as a baseline for space optimisations.
I have just added support (in my development code) for non-greedy '?' . This of course adds more code space (~40 for ARM and x86_64).
However, as I preciously indicated, when I learn the code I am able to propose optimizations.
For example, I'm not sure I find this code too readable:
EMIT(PC, re[1] == '?' ? (re++, Split): RSplit);
Maybe the following conditionally-EMIT macro can improve its readability:
// Emit a byte according to cond; Also increment re if cond is true.
#define EMIT_COND(cond, at, cond_true_byte, cond_false_byte) \
(code ? ((code[at] = cond ? (re++, cond_true_byte) : cond_false_byte)) : (void)(at))
Then in the code:
EMIT_COND(re[1] == '?', term, RSplit, Split);
Using this EMIT_COND for greedy implementation (3 invocations) saves: 68 bytes / ARM 55 bytes / x86_64
However, the following:
EMIT_COND(re[1] == '^', PC++, ClassNot, Class);
...
EMIT_COND(re[0] == '^', PC++, Bol, Eol);
saves code space only on ARM, and adds on x86_64.
To sum up: I will send a pull request for the non-greedy '?' addition, not using EMIT_COND.
If you like,I can follow it with a pull request for code saving using EMIT_COND and maybe another one to revert the usage of the PC macro which save 29 bytes on x86_64 but adds 200 bytes on ARM!
If you like,I can follow it with a pull request for code saving using EMIT_COND and maybe another one to revert the usage of the PC macro which save 29 bytes on x86_64 but adds 200 bytes on ARM!
@ampli you could possibly auto-detect the machine architecture and select the macro implementation which is know to reduce code size the most for that arch. But perhaps this is over-engineering, and might depend a lot on the compiler which can change its optimisations in a future version.
Also change re1_5_sizecode() to return -1 on error. Note: Previously it couldn't return error indication, so existing programs must now check its returned value for error.
Edit: It actually could return -1. This is not checked in MicroPython and I have just opened an issue there (micropython/micropython#1369).
Also optimize _compilecode() for size (in this particular case it also saves cycles).
The total change still adds about 45 bytes (x86_64).
Benefits: