Closed ampli closed 9 years ago
- I would like to fix the indentation problem in this function (instead of 4 spaces, the first indentation is 7 spaces and the second one is 8 spaces), as the first commit.
Yes, please (as a separate commit).
2.
Can't say much right away, will need to look into it ;-).
If end is omitted, there is less one variable on the stack, and also the code becomes smaller on both SPARC (my test case for RISC) and x86_64. So should it be eliminated? (In a separate commit of course).
Can't say much right away, will need to look into it ;-).
Currently we have in cleanmarks():
cleanmarks(ByteProg *prog)
{
char *pc = prog->insts;
char *end = pc + prog->bytelen;
while (pc < end) {
The following causes the code space saving mentioned above:
cleanmarks(ByteProg *prog)
{
char *pc = prog->insts;
while (pc < pc + prog->bytelen) {
Of course this particular saving is relatively small. In larger functions such accumulating savings can be relatively big. Hence this is also a general question. (BTW, In any case I don't propose or intend to do optimizations that reduce readability.)
Currently we have in cleanmarks():
I see, thanks for elaboration. Are you sure that performing that source change made any difference for gcc with at least -O2 setting? That's trivial case of expression propagation - if end is used once, then code produces with this rather mainline optimization should be equivalent of source-level change you show.
Let me know if something is not answered yet which blocks (or backlogs) your work on fix(es) mentioned above, sorry for some delay with processing these.
If you mean set O2 in addition to -Os, then please note that gcc's manual says:
-Os enables all -O2 optimizations that do not typically increase code size.
I validated that indeed adding -O2 to -Os doesn't change the code size.
When using only -O2 (with the yet unfixed cleanmarks() code), on x86_64 the difference is negligible:
2960 bytes before this manual optimization, and 2958 bytes with it. It is indeed not expected that there will be a saving. I didn't look in the assembly code for the reason.
However, when compiling with only -O2 for sparc64, this manual optimization increases the code by 8 bytes...
But before that you compared code using -Os. But if you use -O2 without -Os it looks to me totally other considerations are added.
So the question remained: Should I code for speed or for space (both stack and code), supposing the readability is comparable? And an added question: should I assume also using -O2 alone? (As we saw, it is hard to consider code space if both ways can be equally used.)
If you mean set O2 in addition to -Os
Well, I meant "good enough level of optimizations", which is usually -O2, though as you point out, most of those optimizations would be enabled for -Os too.
So the question remained: Should I code for speed or for space (both stack and code), supposing the readability is comparable?
Please write new code, optimizing for space, that would be much appreciated. But otherwise, please follow golden rule of making changes: change as little as possible while working on particular task. Then means that unless the task is "optimize all the code for size", then touching existing code for a minor changes like above isn't much useful, and means side-tracking into deep trenches of analysing stability of such changes' results (across archs, compiler versions and settings), etc.
Btw, if you'd really like to test changes' effect on a RISC arch, please use ARM - that's pretty much a leader of embedded archs, so making changes which affect it negatively would be strange ;-). (Even though ARM itself is not pure RISC arch.)
Hope that clarifies the matter, and sorry for this excursion into embedded code optimization, hope that didn't sidetrack you too much, and found it interesting in some aspects. Thanks!
OK. I am now using an ARM cross-compilation to check the code size (this way I checked the cleanmarks() code).
Please note the last two lines: they indicate a bug. Actually, not only the result of the thompson/pike functions is incorrect here, but there is a silent out-of-bound access, as can be seen when compiled with ASan:
The problem is in cleanmarks():
case
's for classes (Class, ClassNot, NamedClass) are missing. As a result, the mark bit is cleared from command offsets too! This clobbers negative offsets of Jmp commands, and converts them to a relatively large positive number, that later causes an out-of-bound memory access.I would like to send a pull request that fixes that, but I have 2 questions (your answers can guide me in further fixes that I would like to send):
end
is omitted, there is less one variable on the stack, and also the code becomes smaller on both SPARC (my test case for RISC) and x86_64. So should it be eliminated? (In a separate commit of course).