tsoding / porth

It's like Forth but in Python
627 stars 50 forks source link

Produce less jump points to enable optimization #21

Closed ap29600 closed 2 years ago

ap29600 commented 2 years ago

It would be nice to only produce jump points in the assembly when that jump point is actually used.

why?

This is useful because it makes it clear where an instance like:

addr_79:
    ;; -- plus --
    pop rax
    pop rbx
    add rax, rbx
    push rax
addr_80:
    ;; -- load --
    pop rax
    xor rbx, rbx
    mov bl, [rax]
    push rbx

can be safely reduced to:

addr_79:
    ;; -- plus --
    pop rax
    pop rbx
    add rax, rbx
    ;; push rax
addr_80:
    ;; -- load --
    ;; pop rax
    xor rbx, rbx
    mov bl, [rax]
    push rbx

while this one can't:

addr_11:
    ;; -- push int 0 --
    mov rax, 0
    push rax
addr_12:
    ;; -- while --
addr_13:
    ;; -- dup -- 
    pop rax
    push rax
    push rax

because a jump might occur to addr_12.

Doing this would enable to make optimizations with a separate tool instead of making the compiler more complex.

is this worth it?

A quick and dirty test removing some of the redundant pushes and pops manually from the generated assembly takes the runtime of rule110 on a board of size 1000 from 0.069s to 0.065s (consistently over a batch of 10 runs), and I definitely missed at least half the pops.

So, probably not worth much but it's still something.

EDIT I removed a few more and now it's 0.069s to 0.046s, by also replacing instances of:

    push rax
    pop rbx

by:

mov rbx, rax
rexim commented 2 years ago

@ap29600 optimization is currently outside of the scope of the project. Once the language is mature enough it's gonna be a whole series of streams.

ap29600 commented 2 years ago

thanks, I'm looking forward to it.

Just as a curiosity, do you want to do this after the compiler is rewritten in porth, or just once the language is more usable?