wiremod / wire-cpu

Legacy CPU/GPU/SPU as a separate addon.
Apache License 2.0

Memory reads at high freq causing server lag #29

Closed DerelictDrone closed 8 months ago

DerelictDrone commented 9 months ago

Should look into a way to lessen this, or limit the default max freq to a safer number than 2.1MHz.

So far I've tested a few different devices to see how much they can copy per loop before I start to lag, using this code:

//MOV R0,1024 // for internal ram test
CPUGET R0,43 //for external ram test
x:
MOV ESI,R0
MOV EDI,R0
INC R1
MCOPY 8192 // MCOPY is limited to 8192 bytes, so use several of them to copy more than 8192
// The "bytes" figures below are the sum of the operands of all MCOPY instructions in the CPU code
// at the point where the test reduced my FPS to between 60 and 90 with the CPU freq set to 2100000
JMP x

Address bus: 5120 bytes
128kb RAM gate: 12288 bytes
CPU's own internal RAM: 122880 bytes

Lag here means dropping my FPS by more than 30 in singleplayer. These numbers come from trying to get as close as possible to 60 FPS, starting from 120 FPS.

My initial suggestion was to count how many external accesses we've had this tick and raise the "external mem access" cycle penalty by a very small amount for each one, putting a "soft cap" on the number of external accesses that can be made per VM. But since it turns out the internal RAM can also run into this issue, this may need another solution.
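A minimal sketch of that soft-cap idea, written in Python for illustration rather than the addon's Lua. The class name, the `step` value and the helper function are all hypothetical. Only the base external penalty of 15 cycles comes from the costs listed later in this thread.

```python
BASE_EXTERNAL_PENALTY = 15  # cycles; the external access cost quoted in this thread
PENALTY_STEP = 0.01         # hypothetical growth per earlier external access this tick


class ExternalAccessMeter:
    """Prices each external access based on how many happened this tick."""

    def __init__(self, base=BASE_EXTERNAL_PENALTY, step=PENALTY_STEP):
        self.base = base
        self.step = step
        self.accesses_this_tick = 0

    def charge(self):
        """Return the cycle cost of one external access, then count it."""
        cost = self.base + self.step * self.accesses_this_tick
        self.accesses_this_tick += 1
        return cost

    def new_tick(self):
        """Reset the counter at the start of every tick."""
        self.accesses_this_tick = 0


def accesses_within_budget(meter, cycle_budget):
    """Count how many external accesses fit into cycle_budget in one tick."""
    spent = 0.0
    count = 0
    while True:
        cost = meter.charge()
        if spent + cost > cycle_budget:
            return count
        spent += cost
        count += 1
```

With a flat penalty, the number of accesses per tick scales linearly with the cycle budget. With a growing penalty it flattens out, which is the "soft cap" effect. As noted above, this only covers external accesses, not internal RAM.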

See here for how the readcell function applies the "memory access" penalty to TMR, and here for how the CPU handles the TMR variable during execution.

thegrb93 commented 9 months ago

Do instructions have a cycle cost? I.e. different instructions use more or less cpu cycles? Maybe that should be introduced.

DerelictDrone commented 9 months ago

The BIT instructions seem to be the only ones that inherently have one, and it handles those by adding to the timer directly.

Precompiling

Precompile call: +192000 clock cycles (defined as 24*8000)
Dyn_EmitBreak: +self.PrecompileInstruction (the number of instructions that were compiled during this precompile call; the instruction calls this function to end the precompile early, for branches and the like)

Readcell/Writecell

Any memory access: +5
Trapped memory access: +10 (+5) (the paging system has to stop current execution and call INT 28 or INT 29)
External memory access: +15 (+5)

Instructions

SBIT and CBIT: +20
TBIT, BAND, BOR, BXOR, BSHL, BSHR: +30
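For reference, the costs above can be written as a small lookup. This is an illustrative Python sketch, not the addon's code. It assumes the "(+5)" means the trapped and external penalties are added on top of the base +5 for any memory access. The function names are made up.

```python
MEMORY_ACCESS = 5    # any memory access
TRAPPED_EXTRA = 10   # extra cycles when the paging system traps the access
EXTERNAL_EXTRA = 15  # extra cycles when the access leaves the CPU

# Per-instruction costs listed in this thread; every other opcode adds nothing here.
BIT_OP_COST = {
    "SBIT": 20, "CBIT": 20,
    "TBIT": 30, "BAND": 30, "BOR": 30, "BXOR": 30, "BSHL": 30, "BSHR": 30,
}


def access_cost(kind="internal"):
    """Cycle cost of one memory access: 'internal', 'trapped' or 'external'."""
    extra = {"internal": 0, "trapped": TRAPPED_EXTRA, "external": EXTERNAL_EXTRA}
    return MEMORY_ACCESS + extra[kind]


def instruction_extra(opcode):
    """Extra cycles an opcode adds to the timer (0 if it has no inherent cost)."""
    return BIT_OP_COST.get(opcode.upper(), 0)
```

Under that reading, an external access costs 20 cycles in total and a trapped one 15, while instructions like MOV add nothing beyond their memory accesses.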

DerelictDrone commented 9 months ago

Actually, this may partially be because those big memory instructions (like MCOPY) internally use a for loop, and I don't think we can stop execution mid-instruction. Since these are LONG instructions, that may have a hand in it. Putting an entire page (128 bytes) of MOV instructions couldn't trigger the FPS drop for me (on a single CPU anyway), so it could be that no matter how many cycles get used up, the CPU can't stop until the MCOPY is complete. Maybe it would be a good idea to also make these instructions interruptible by the timer, with the remaining indexes wrapped up next tick (or until the timer runs out again) until the MCOPY/MSHIFT is complete.
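A rough sketch of what an interruptible copy could look like, in Python for illustration only. `ResumableCopy` and `max_cells` are hypothetical names, and how the real VM would turn its timer into a per-tick cell budget is left open. The copy runs forward, so overlapping ranges would need extra care.

```python
class ResumableCopy:
    """A block copy that can pause when the budget runs out and resume later."""

    def __init__(self, memory, src, dst, count):
        self.memory = memory
        self.src = src
        self.dst = dst
        self.remaining = count

    def step(self, max_cells):
        """Copy at most max_cells cells; return True once the copy is complete."""
        n = min(self.remaining, max_cells)
        for i in range(n):
            self.memory[self.dst + i] = self.memory[self.src + i]
        # Remember where to pick up on the next call.
        self.src += n
        self.dst += n
        self.remaining -= n
        return self.remaining == 0


# Usage: copy 10 cells with a budget of 4 cells per tick.
memory = list(range(16)) + [0] * 16
copy = ResumableCopy(memory, src=0, dst=16, count=10)
ticks = 1
while not copy.step(max_cells=4):
    ticks += 1  # the VM would yield here and continue next tick
```

The instruction would only advance to the next one once `step` returns True, so a large MCOPY is spread over several ticks instead of blocking one.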

DerelictDrone commented 9 months ago

Seems like E2 actually faces a similar issue, I think it might be more akin to just... the device's own readcell/writecell being too slow.

4 unlimited-ops E2s at 346830 BPS on an address bus drop my FPS as much as 1 CPU at max freq copying 5120 bytes per run at 337920 BPS on an address bus.

Screenshots: gmod_dzRPAdeDsK (E2s), gmod_H4FjfoWjU7 (CPU)
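The CPU figure can be sanity-checked with a little arithmetic, sketched here in Python. The value of 66 runs per second is an assumption: it is not stated in the thread, but it is the value that reproduces 337920 from 5120 bytes per run. The function names are made up.

```python
def bytes_per_second(bytes_per_run, runs_per_second):
    """Throughput when every run copies bytes_per_run bytes."""
    return bytes_per_run * runs_per_second


def per_device(total_bps, device_count):
    """Average throughput of each device when several share the total."""
    return total_bps / device_count


# Assumption: one copy run per tick at 66 ticks per second.
cpu_bps = bytes_per_second(5120, 66)  # 337920
e2_each = per_device(346830, 4)       # 86707.5
```

So under that assumption, each of the four E2s moves roughly a quarter of what the single CPU does for about the same FPS drop.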