jserv / amacc

Small C compiler generating ELF executables for the Arm architecture, supporting JIT execution

Fix padding, codegen size issue and refactor code in ELF #29

Closed lecopzer closed 6 years ago

lecopzer commented 6 years ago

Fix padding between each section

Previously, every program header was given a fixed size of 4K, and the end of each header was always padded up to ALIGN (4K). In a small program, this 4K padding wastes a lot of space, so this commit reduces the zero padding.

  1. Remove ALIGN; only a small amount of padding between the .text and .data sections is actually needed.

  2. Add PAGE_SIZE to align offset and v_addr. The ELF loader uses the page size for mmap and also checks that offset and v_addr are aligned with each other.

  3. There are two ways to align offset and v_addr: a) change v_addr according to offset; b) change offset according to v_addr.

    In a), the data address is already used during codegen, and the ELF is generated after codegen, so this is hard to accomplish. Therefore b) is chosen, adding load_bias to align them.

    Note that this may leave some zero padding between .text and .data.

  4. .rwdata is meaningless because it belongs to the .data section, so merging it into .data makes the code cleaner.

  5. Remove some tricky code such as gap and code_size_align.

  6. The ELF size now shrinks by about 7.9% (amacc itself) to 59% (small test programs), from

    -rwxrwxr-x 1 lecopzer lecopzer 88K Apr 6 22:17 amacc
    -rwxrwxr-x 1 lecopzer lecopzer 13K Apr 6 22:17 arginc
    -rwxrwxr-x 1 lecopzer lecopzer 13K Apr 6 22:17 char
    -rwxrwxr-x 1 lecopzer lecopzer 13K Apr 6 22:17 cond
    -rwxrwxr-x 1 lecopzer lecopzer 14K Apr 6 22:17 eq
    -rwxrwxr-x 1 lecopzer lecopzer 13K Apr 6 22:17 fib
    -rwxrwxr-x 1 lecopzer lecopzer 13K Apr 6 22:17 for
    -rwxrwxr-x 1 lecopzer lecopzer 13K Apr 6 22:17 hello
    -rwxrwxr-x 1 lecopzer lecopzer 13K Apr 6 22:17 inc
    -rwxrwxr-x 1 lecopzer lecopzer 13K Apr 6 22:17 jit
    ...

to

    -rwxrwxr-x 1 lecopzer lecopzer 81K Apr 7 02:28 amacc
    -rwxrwxr-x 1 lecopzer lecopzer 5.2K Apr 7 02:28 arginc
    -rwxrwxr-x 1 lecopzer lecopzer 5.4K Apr 7 02:28 char
    -rwxrwxr-x 1 lecopzer lecopzer 5.3K Apr 7 02:28 cond
    -rwxrwxr-x 1 lecopzer lecopzer 6.0K Apr 7 02:28 eq
    -rwxrwxr-x 1 lecopzer lecopzer 5.3K Apr 7 02:28 fib
    -rwxrwxr-x 1 lecopzer lecopzer 5.3K Apr 7 02:28 for
    -rwxrwxr-x 1 lecopzer lecopzer 5.3K Apr 7 02:28 hello
    -rwxrwxr-x 1 lecopzer lecopzer 5.2K Apr 7 02:28 inc
    -rwxrwxr-x 1 lecopzer lecopzer 5.3K Apr 7 02:28 jit
    ...
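
For reviewers, here is a minimal sketch of approach b) from item 3: keep v_addr fixed and move the file offset forward until both are congruent modulo the page size. This is not the code in the commit. The function names `load_bias()` and `align_offset_to_vaddr()`, their signatures, and the 4 KiB `PAGE_SIZE` value are assumptions made for illustration; only the names load_bias and PAGE_SIZE themselves come from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u /* assumed 4 KiB page size */

/* Hypothetical helper: number of zero bytes to insert after `offset` so that
 * the resulting file offset is congruent to `v_addr` modulo PAGE_SIZE. */
static uint32_t load_bias(uint32_t offset, uint32_t v_addr)
{
    /* The mask below only works when PAGE_SIZE is a power of two. */
    assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0);
    /* Unsigned subtraction wraps, so this is also correct when
     * v_addr is numerically smaller than offset. */
    return (v_addr - offset) & (PAGE_SIZE - 1);
}

/* Hypothetical helper: file offset at which the segment should start. */
static uint32_t align_offset_to_vaddr(uint32_t offset, uint32_t v_addr)
{
    return offset + load_bias(offset, v_addr);
}
```

With offset 0x1234 and v_addr 0x20010, the bias is 0xDDC and the segment starts at file offset 0x2010, which matches v_addr in its low 12 bits. The padding is always less than one page, which is consistent with the remaining zero padding between .text and .data mentioned above.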

Make code readable in elf32

  1. Replace hardcoded values with enums.
  2. Export the strlen() symbol to replace the loop in append_strtab().
  3. Make naming consistent, e.g. rename gen_SH() to gen_shdr().
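
As a rough illustration of items 1 and 2, the sketch below shows named enum constants in place of magic numbers and a string-table append that relies on strlen() instead of a hand-written loop. The enum members, the `append_strtab()` signature, and the capacity check are all hypothetical; the actual function in amacc may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical section indices, replacing hardcoded numbers. */
enum { SEC_NULL = 0, SEC_TEXT, SEC_DATA, SEC_SHSTRTAB, SEC_COUNT };

/* Hypothetical signature: copy `name` (including its NUL terminator) into
 * `table` at *cursor, advance *cursor, and return the offset at which the
 * name was stored. `cap` is the total size of `table`. */
static size_t append_strtab(char *table, size_t cap, size_t *cursor,
                            const char *name)
{
    size_t start = *cursor;
    size_t len = strlen(name) + 1; /* strlen() replaces the manual loop */
    assert(start + len <= cap);    /* caller must provide enough room */
    memcpy(table + start, name, len);
    *cursor += len;
    return start;
}
```

The returned offset is what a section header would store as its name index, assuming the table starts with the conventional leading NUL byte.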

Fix wrong code_size in first codegen

By simulating how .plt function entries are actually laid out, plt_func_addr[i] and plt_func_addr[i-1] differ by an offset of 16 (4 instructions * 4 bytes). After adding this offset of 16 during plt_func_addr initialization, the first and second codegen passes now produce a consistent code_size.
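
A minimal sketch of the initialization described above, assuming each .plt entry occupies 4 instructions of 4 bytes. The function name `init_plt_func_addr()`, its parameters, and the `plt_base` placeholder are assumptions for illustration; only plt_func_addr and the 16-byte stride come from the description.

```c
#include <assert.h>
#include <stdint.h>

enum {
    PLT_ENTRY_INSNS = 4, /* instructions per .plt entry */
    INSN_SIZE = 4,       /* bytes per Arm instruction */
    PLT_ENTRY_SIZE = PLT_ENTRY_INSNS * INSN_SIZE /* 16 bytes */
};

/* Hypothetical helper: give every entry a distinct address spaced like the
 * real .plt, so the first codegen pass sees the same address pattern as the
 * second pass. */
static void init_plt_func_addr(uint32_t *plt_func_addr, int count,
                               uint32_t plt_base)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++)
        plt_func_addr[i] = plt_base + (uint32_t) i * PLT_ENTRY_SIZE;
}
```

With a placeholder base of 0x10000 and three entries, the addresses are 0x10000, 0x10010, and 0x10020.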

lecopzer commented 6 years ago

I revised some comments. Please review, thanks!

lecopzer commented 6 years ago

I have now fixed a lot of hardcoded values. Any further advice is welcome, thanks.

lecopzer commented 6 years ago

Updated item 1, because there is still some zero padding (much less than before) between .data and .text.

lecopzer commented 6 years ago

Added another commit to fix the wrong code_size in the first codegen.

lecopzer commented 6 years ago

The PR now contains many different types of commits. I'm not sure whether the title is readable, understandable, and precise while still being brief enough.