kervinck / gigatron-rom

System, apps and tooling for the Gigatron TTL microcomputer
BSD 2-Clause "Simplified" License

lcc: optimisations #70

Open kervinck opened 5 years ago

kervinck commented 5 years ago

[Note: This issue is an aggregation for optimisations of the emitted code. I want to park all ideas here for future reference. Regarding priorities in LCC the order should be 1. Correctness, 2. Usability, 3. Optimisations.]

Ideas (some are simpler than others, some are realistic, some are nonsense):

  1. POKE is often preceded by ANDI 0xff, but this is almost never needed
  2. entermask/leavemask can sometimes use LDI
  3. many cases of stw(x) + ldw(x) or other way around. Can be optimised
  4. eliminate ldw(vAC) and stw(vAC)
  5. if we have a known value in vAC, use SUBI to get a small negative number
  6. comparisons eq/ne with small negative constants can avoid LDWI
  7. don't 'pusha' each argument, but allocate the argument area in one go
  8. option to use vCPU stack as data stack (lots of work, unclear if it will bring anything)
  9. use DEF for string pointer initialisers
  10. is it feasible to use INC more often?
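
As an illustration of idea 3, here is a minimal peephole sketch. It assumes the emitted code is available as a list of (opcode, operand) tuples, which is a hypothetical representation and not how LCC or rt.py actually store it. It only looks at directly adjacent pairs, so a label (modelled here as a ("label", name) tuple) naturally blocks the rewrite.

```python
def peephole(instrs):
    """Drop a stw/ldw that only repeats its direct neighbour on the same address.

    instrs is a list of (opcode, operand) tuples (hypothetical representation).
    """
    out = []
    for op, arg in instrs:
        if out:
            prev_op, prev_arg = out[-1]
            if arg == prev_arg and {op, prev_op} == {"stw", "ldw"}:
                # stw(x) then ldw(x): vAC already holds the value of x.
                # ldw(x) then stw(x): x already holds the value of vAC.
                continue
        out.append((op, arg))
    return out


code = [
    ("ldwi", 0x1234),
    ("stw", "x"),
    ("ldw", "x"),   # redundant: vAC was just stored to x
    ("addw", "y"),
    ("ldw", "z"),
    ("stw", "z"),   # redundant: z was just loaded into vAC
]
optimised = peephole(code)
```

The sketch is deliberately conservative: it never looks past one instruction, so it misses longer redundant chains but cannot remove a load that a branch target depends on.
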
kervinck commented 5 years ago

Low priority

kervinck commented 5 years ago

Two ideas to reduce the number of thunk functions in rt.py:

  1. Move the start of the pixel lines to offset 0x60:

videoTable[1] = 0x60;

There is no need to do this from code: it can be a 1-byte segment somewhere in the .gt1 file (near the end?). Pixels then run from 0x60 to 0xff, while code can live at offset 0 instead of 0xa0. This eliminates the need for thunk1. Adjust PutChar, Newline and ClearScreen accordingly. These actually become slightly simpler, because the end-of-line test is simpler.

  2. thunk2 hops over from the end of page 4 into page 8. We can do that by placing thunk2 at the beginning of page 5 (C stack area), and jumping in there with a CALL thunk0 from page 4.
0500 2b tt     STW  tt
0502 11 00 08  LDWI $0800
0505 2b 1a     STW  vLR
0507 21 tt     LDW  tt
0509 ff        RET

In essence, the above eliminates four zero page bytes and one helper function.
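
The 1-byte segment from the first idea could look roughly like the sketch below. It rests on two assumptions that should be checked against the GT1 file documentation: that a segment is encoded as high address byte, low address byte, size byte, then the data, and that videoTable starts at 0x0100. The helper name gt1_segment is made up for this example.

```python
VIDEO_TABLE = 0x0100  # assumed start address of videoTable


def gt1_segment(address, data):
    """Encode one segment as: high address, low address, size, payload.

    The layout is an assumption of this sketch, see the text above.
    """
    assert 0 < len(data) <= 256
    header = bytes([address >> 8, address & 0xFF, len(data) & 0xFF])
    return header + bytes(data)


# Equivalent of: videoTable[1] = 0x60;
patch = gt1_segment(VIDEO_TABLE + 1, [0x60])
```

Under those assumptions the patch costs four bytes in the file and no code at run time.
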

kervinck commented 5 years ago
  1. Use XORW instead of SUBW before the == and != operators.

(Sometimes the compiler even juggles the order of SUBW operands...)
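
A quick check of why this is safe, as a sketch on plain Python integers masked to 16 bits: for equality tests only "is the result zero?" matters, and both a - b and a ^ b are zero exactly when a equals b. XOR is also symmetric, so the operand order no longer matters. Whether this actually saves instructions depends on the backend.

```python
MASK = 0xFFFF  # vCPU words are 16 bits wide


def equal_via_sub(a, b):
    """Equality as the compiler does it now: subtract and test for zero."""
    return ((a - b) & MASK) == 0


def equal_via_xor(a, b):
    """Equality with XOR instead: test a ^ b for zero."""
    return (a ^ b) == 0


samples = [0, 1, 2, 0x7FFF, 0x8000, 0xFFFE, 0xFFFF]
for a in samples:
    for b in samples:
        # Both formulations agree with real equality ...
        assert equal_via_sub(a, b) == equal_via_xor(a, b) == (a == b)
        # ... and the XOR form does not care about operand order.
        assert equal_via_xor(a, b) == equal_via_xor(b, a)
```
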

kervinck commented 5 years ago
  1. MULI2(CNSTI2(1), ...

Simplify. Same for CNSTI2(0) and MULU2, DIVXX etc
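
A sketch of what such a simplification pass might look like, on a made-up tree encoding: inner nodes are tuples like ("MULI2", left, right), constants are ("CNSTI2", value), and plain strings stand for variables. This encoding and the helper names are assumptions for illustration only; they are not LCC's real data structures.

```python
def is_const(node, value):
    """True if node is a constant leaf holding the given value."""
    return isinstance(node, tuple) and node[0].startswith("CNST") and node[1] == value


def simplify(node):
    """Fold multiply/divide by the constants 0 and 1, bottom-up."""
    if not isinstance(node, tuple) or node[0].startswith("CNST"):
        return node  # variable or constant leaf
    op, *kids = node
    kids = [simplify(k) for k in kids]
    if op in ("MULI2", "MULU2"):
        for const, other in ((kids[0], kids[1]), (kids[1], kids[0])):
            if is_const(const, 1):
                return other  # 1 * x -> x
            if is_const(const, 0) and not isinstance(other, tuple):
                # 0 * x -> 0, only when x is a plain variable,
                # so no side effect (e.g. a call) gets dropped.
                return const
    if op in ("DIVI2", "DIVU2") and is_const(kids[1], 1):
        return kids[0]  # x / 1 -> x
    return (op, *kids)


tree = ("DIVI2", ("MULI2", ("CNSTI2", 1), "x"), ("CNSTI2", 1))
folded = simplify(tree)
```

Multiplication by zero is only folded for plain variables here, since the other operand may contain a call that still has to run.
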

kervinck commented 5 years ago
  1. Remove 'rv' from rt.py, use 'ha' instead for return values
kervinck commented 5 years ago
  1. More aggressive purging of unused library functions. For example:
int main(void)
{
  return 0;
}

Still gives a .gt1 file of more than 4 KB in size. It seems that references from other functions are not purged (e.g. div in rt.py references divu, and divu is never purged because of this?). Also, some references come from the data space, such as the flush methods in the FILE objects of stdin.c/stdout.c.
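
One way to purge more aggressively is to keep only what is reachable from main, treating references from data objects as edges too. A minimal sketch, assuming the references are available as a dict from symbol name to the names it refers to (a hypothetical structure; the linker does not necessarily expose anything like it):

```python
def reachable(refs, roots):
    """Return every symbol reachable from roots by following references."""
    seen = set()
    todo = list(roots)
    while todo:
        name = todo.pop()
        if name in seen:
            continue
        seen.add(name)
        todo.extend(refs.get(name, ()))
    return seen


# Toy reference graph; the entries other than div/divu are made up.
refs = {
    "main": [],
    "div": ["divu"],       # div refers to divu ...
    "divu": [],
    "stdout": ["flush"],   # ... and a data object refers to a function
    "flush": [],
}
kept = reachable(refs, ["main"])
```

With this approach divu is only kept when something reachable from main actually uses div or divu, instead of being pinned by a reference from an unused function.
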

kervinck commented 5 years ago
  1. LCC inserts explicit return values in places where "don't care" will work. This makes some code larger than needed.

See this comment for an example: https://github.com/kervinck/gigatron-rom/issues/76#issuecomment-497897449

kervinck commented 4 years ago
  1. Some C11 code crept into src/gt1.md. This was worked around with #97. It would be nicer to make it all ANSI C compliant (aka C89).
lb3361 commented 1 year ago

This issue should be closed, since glcc already does such optimizations (when they make sense in the new code).