Closed DerelictDrone closed 8 months ago
Do instructions have a cycle cost, i.e. do different instructions use more or fewer CPU cycles? Maybe that should be introduced.
The BIT instructions seem to be the only ones that inherently have one, and it handles those by adding to the timer directly.
Precompile call: +192000 clock cycles (defined as 24*8000)
Dyn_EmitBreak: +self.PrecompileInstruction (the number of instructions that were compiled during this precompile call; the instruction calls this function to end precompile early, for branches and the like)
Any memory access: +5
Trapped memory access: +10 (+5) (the paging system has to stop current execution and call INT 28 or INT 29)
External memory access: +15 (+5)
SBIT and CBIT: +20
TBIT, BAND, BOR, BXOR, BSHL, BSHR: +30
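To make the penalties listed above easier to compare, here is a minimal Python sketch of them as a lookup. The names (`BIT_COSTS`, `memory_access_cost`) are hypothetical and not taken from the ZVM source, and it assumes the "(+5)" means the base memory access cost is added on top of the trapped/external penalty.

```python
# Cycle penalties copied from the list above. All names here are hypothetical.
MEMORY_ACCESS = 5           # any memory access
TRAPPED_ACCESS_EXTRA = 10   # paging system has to call INT 28 or INT 29
EXTERNAL_ACCESS_EXTRA = 15  # access that leaves the CPU's own RAM

BIT_COSTS = {
    "SBIT": 20, "CBIT": 20,
    "TBIT": 30, "BAND": 30, "BOR": 30,
    "BXOR": 30, "BSHL": 30, "BSHR": 30,
}

def memory_access_cost(trapped=False, external=False):
    """Cycles charged for one memory access.

    Assumption: the extra penalties stack on top of the base +5.
    """
    cost = MEMORY_ACCESS
    if trapped:
        cost += TRAPPED_ACCESS_EXTRA
    if external:
        cost += EXTERNAL_ACCESS_EXTRA
    return cost

if __name__ == "__main__":
    print(memory_access_cost())               # plain internal access
    print(memory_access_cost(external=True))  # address bus or similar
    print(BIT_COSTS["TBIT"])
```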
Actually, this may partly be because the big memory instructions (like MCOPY) internally use a for loop, and I don't think we can stop execution mid-instruction. Since these are LONG instructions, that may have a hand in it. Putting an entire page (128 bytes) of MOV instructions couldn't trigger the FPS drop for me (on a single CPU, anyway), but it could be that no matter how many cycles get used up, the CPU can't stop until the MCOPY is complete. Maybe it would be a good idea to make these instructions interruptible by the timer as well, with the remaining indexes wrapped up on the next tick (or until the timer runs out again) until the MCOPY/MSHIFT is complete.
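A rough Python sketch of what an interruptible MCOPY could look like, only to illustrate the idea. Nothing here is the actual ZVM code: `ResumableCopy`, `ticks_to_finish` and the per-cell cost of 10 (one read plus one write at +5 each) are assumptions.

```python
COST_PER_CELL = 10  # assumed: one read (+5) plus one write (+5) per copied cell

class ResumableCopy:
    """Hypothetical MCOPY state that survives between timer interrupts."""

    def __init__(self, memory, src, dst, count):
        self.memory = memory
        self.src = src
        self.dst = dst
        self.remaining = count

    @property
    def done(self):
        return self.remaining == 0

    def run(self, budget):
        """Copy cells until finished or the cycle budget runs out.

        Returns the cycles spent; src, dst and remaining keep their
        position so the next call resumes where this one stopped.
        """
        spent = 0
        while self.remaining > 0 and spent + COST_PER_CELL <= budget:
            self.memory[self.dst] = self.memory[self.src]
            self.src += 1
            self.dst += 1
            self.remaining -= 1
            spent += COST_PER_CELL
        return spent

def ticks_to_finish(copy, budget_per_tick):
    """Run the copy one tick at a time and count the ticks it needs."""
    if budget_per_tick < COST_PER_CELL:
        raise ValueError("budget too small to copy even one cell")
    ticks = 0
    while not copy.done:
        copy.run(budget_per_tick)
        ticks += 1
    return ticks
```

With a budget of 30 cycles per tick, an 8-cell copy would spread over three ticks (3, 3 and 2 cells) instead of blocking until all 8 cells are done.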
Seems like E2 actually faces a similar issue. I think it might be more akin to just... the device's own readcell/writecell being too slow.
4 unlimited-ops E2s at 346830 BPS on an address bus drop my FPS as much as 1 CPU at max freq copying 5120 bytes per run at 337920 BPS on an address bus.
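As a quick sanity check on the CPU figure, the bytes per second divided by the bytes per run gives the number of runs per second. The variable names are just for illustration.

```python
BYTES_PER_SECOND = 337920  # measured CPU throughput on the address bus
BYTES_PER_RUN = 5120       # bytes copied per run

runs_per_second = BYTES_PER_SECOND // BYTES_PER_RUN
leftover_bytes = BYTES_PER_SECOND % BYTES_PER_RUN

print(runs_per_second, leftover_bytes)  # 66 0
```

So the CPU figure works out to exactly 66 runs of 5120 bytes each second.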
Should look into a way to lessen this, or limit the default max freq to a safer number than 2.1MHz.
Right now, I've tested a few different devices for the speed that they can reach before I start to lag, using the code
Address bus: 5120 bytes
128kb RAM gate: 12288 bytes
CPU's own internal RAM: 122880 bytes
Lag is defined here as dropping my FPS by more than 30 in singleplayer; these numbers are me trying to get as close as possible to 60 FPS from 120 FPS.
The initial suggestion was to count how many external accesses we've had this tick and raise the "external mem access" cycle penalty by a very small amount each time, to put a "soft cap" on the number of external accesses that can be made per VM. But since it turns out the internal RAM can also run into this issue, this may need another solution.
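For reference, the soft cap idea could be sketched like this in Python. It is only an illustration under assumptions: `ExternalAccessMeter`, the base cost of 20 (+15 plus the base +5) and the rate of one extra cycle per 100 accesses are all made up for the example, not values from the ZVM.

```python
BASE_EXTERNAL_COST = 20         # assumed: +15 external penalty plus the base +5
ACCESSES_PER_EXTRA_CYCLE = 100  # assumed: penalty grows by 1 cycle every 100 accesses

class ExternalAccessMeter:
    """Hypothetical per-VM counter of external accesses within one tick."""

    def __init__(self):
        self.accesses_this_tick = 0

    def charge(self):
        """Return the cycle cost of the next external access and count it."""
        extra = self.accesses_this_tick // ACCESSES_PER_EXTRA_CYCLE
        self.accesses_this_tick += 1
        return BASE_EXTERNAL_COST + extra

    def new_tick(self):
        """Reset the counter so the penalty starts from the base cost again."""
        self.accesses_this_tick = 0

def accesses_within_budget(budget):
    """Count how many external accesses fit into one tick's cycle budget."""
    meter = ExternalAccessMeter()
    spent = 0
    count = 0
    while True:
        cost = meter.charge()
        if spent + cost > budget:
            break
        spent += cost
        count += 1
    return count
```

With a flat cost of 20, a budget of 4100 cycles would allow 205 accesses; with the escalating penalty it allows 200, and the gap widens as the budget grows. This only limits external accesses, so it would not help with the internal RAM case.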
See here for how the readcell function applies the "memory access" penalty to TMR, and here for how the CPU handles the TMR variable during execution.