Speed up SYS_SetMemory_v2_54 by 300%

kervinck commented 4 years ago

Without breaking the interface, we can go to 8 bytes per dispatch (instead of 4) and 3 dispatches per 148-cycle time slice (instead of 2). Together that is a 2 × 1.5 = 3 times speedup, i.e. 300% of the original throughput.
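
The arithmetic behind that figure can be sketched roughly as follows. The per-dispatch cycle costs are assumptions: 54 is read off the `_54` suffix of the function name, and 48 is the cycle count returned by the slower exit path of the draft posted later in this thread. The plain floor division also ignores any per-slice overhead, so treat this as an illustration rather than a measurement.

```python
# Rough throughput model; cycle costs are assumptions, not measured values.
SLICE_CYCLES = 148  # vCPU time slice length quoted in this issue


def bytes_per_slice(bytes_per_dispatch, cycles_per_dispatch):
    """Bytes set in one time slice if dispatches simply run back to back."""
    dispatches = SLICE_CYCLES // cycles_per_dispatch
    return dispatches * bytes_per_dispatch


old = bytes_per_slice(4, 54)  # 2 dispatches x 4 bytes = 8
new = bytes_per_slice(8, 48)  # 3 dispatches x 8 bytes = 24
print(new / old)              # 3.0
```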

kervinck commented 4 years ago

Draft and untested proposal:

#-----------------------------------------------------------------------
# Extension SYS_SetMemory_v2_54
#-----------------------------------------------------------------------

# SYS function for setting 1..256 bytes
#
# sysArgs[0]   Copy count (destructive)
# sysArgs[1]   Copy value
# sysArgs[2:3] Destination address (destructive)
#
# Sets up to 8 bytes per invocation before restarting itself through vCPU.
# Doesn't wrap around page boundary. Can run 3 times per 148-cycle time slice.
# Combined, that gives a 3 times speedup over ROMv4 and earlier.

label('SYS_SetMemory_v2_54')
ld([sysArgs+0])                 #15
bra('sys_SetMemory#18')         #16
ld([sysArgs+2],X)               #17

# SYS_SetMemory_v2_54 implementation
label('sys_SetMemory#18')
ld([sysArgs+3],Y)               #18
ble('.sysSm#21')                #19
suba(8)                         #20
bge('.sysSm#23')                #21
st([sysArgs+0])                 #22
anda(4)                         #23 4 bytes
bne('.sysSm#26')                #24
ld([sysArgs+1])                 #25
st([Y,Xpp])                     #26
st([Y,Xpp])                     #27
st([Y,Xpp])                     #28
bra('.sysSm#31')                #29
st([Y,Xpp])                     #30
label('.sysSm#26')
wait(31-26)                     #26
label('.sysSm#31')
ld([sysArgs+0])                 #31
anda(2)                         #32 2 bytes
bne('.sysSm#35')                #33
ld([sysArgs+1])                 #34
st([Y,Xpp])                     #35
bra('.sysSm#38')                #36
st([Y,Xpp])                     #37
label('.sysSm#35')
wait(38-35)                     #35
label('.sysSm#38')
ld([sysArgs+0])                 #38 1 byte
anda(1)                         #39
bne(pc()+3)                     #40
bra(pc()+3)                     #41
ld([sysArgs+1])                 #42
ld([Y,X])                       #42(!)
st([Y,X])                       #43
ld(hi('NEXTY'),Y)               #44
jmp(Y,'NEXTY')                  #45
ld(-48/2)                       #46

label('.sysSm#21')
suba(8)                         #21
st([sysArgs+0])                 #22
label('.sysSm#23')
ld([sysArgs+1])                 #23 8 bytes
st([Y,Xpp])                     #24
st([Y,Xpp])                     #25
st([Y,Xpp])                     #26
st([Y,Xpp])                     #27
st([Y,Xpp])                     #28
st([Y,Xpp])                     #29
st([Y,Xpp])                     #30
st([Y,Xpp])                     #31
ld(8)                           #32
adda([sysArgs+2])               #33
st([sysArgs+2])                 #34
ld([sysArgs+0])                 #35
bne(pc()+3)                     #36
bra(pc()+3)                     #37
ld(-2)                          #38
ld(0)                           #38(!)
adda([vPC])                     #39
st([vPC])                       #40
ld(hi('REENTER'),Y)             #41
jmp(Y,'REENTER')                #42
ld(-46/2)                       #43
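
To make the intended behaviour easier to follow, here is a minimal Python model of what the header comment describes, not of the draft's exact instruction sequence. `sys_set_memory_step` and `set_memory` are hypothetical names. The model assumes the count is a plain integer 1..256, that the caller keeps the range inside one 256-byte page, and that a leftover of 1..7 bytes is written by testing bits 4, 2 and 1 of the count, which is how the tail path appears to be meant.

```python
def sys_set_memory_step(mem, count, value, addr):
    """Model one SYS invocation. Returns (remaining_count, next_addr)."""
    if count >= 8:
        # Full path: eight unrolled stores, then advance count and address.
        for i in range(8):
            mem[addr + i] = value
        return count - 8, addr + 8
    # Tail path: 1..7 bytes, decomposed as 4 + 2 + 1 by testing count bits.
    for chunk in (4, 2, 1):
        if count & chunk:
            for i in range(chunk):
                mem[addr + i] = value
            addr += chunk
    return 0, addr


def set_memory(mem, addr, count, value):
    """Re-run the step until done; returns the number of dispatches used."""
    assert 1 <= count <= 256
    assert (addr & 0xFF) + count <= 256, "range must stay within one page"
    dispatches = 0
    while count > 0:
        count, addr = sys_set_memory_step(mem, count, value, addr)
        # The draft subtracts 2 from vPC here to re-execute the SYS instruction.
        dispatches += 1
    return dispatches


mem = bytearray(0x10000)                  # 64K address space for the model
used = set_memory(mem, 0x0800, 20, 0xAA)  # 8 + 8 + 4 bytes
print(used)                               # 3
```

At 3 dispatches per time slice, a 20-byte fill like this one would fit in a single slice, and a full 256-byte page would need 32 dispatches.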

kervinck commented 4 years ago

The details in the draft above are off, but the concept works and is now integrated: https://github.com/kervinck/gigatron-rom/commit/3fd015886ecaa6adc91c9195ccf639dc48a7fb67

kervinck / gigatron-rom

Speed up SYS_SetMemory_v2_54 by 300% #126