jart / blink

tiniest x86-64-linux emulator
ISC License

Let fast ALU micro-ops for JIT clobber entire low byte of eflags #135

Closed tkchia closed 1 year ago

tkchia commented 1 year ago

If an ALU operation is supposed to set the SF, ZF, AF, & CF flags, but we know we only need (at most) ZF & CF, then allow the JIT code to clobber the whole lower flags byte. On the x86-64 host platform, gcc knows how to transform

m->flags = (m->flags & ~SZAC) | (u32)!z << FLAGS_ZF | (u32)c << FLAGS_CF;

to a byte store operation in the least significant byte of m->flags.
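To make the idea concrete, here is a minimal sketch that compares a mask-preserving update with one that overwrites the whole low byte. The `struct Machine` stand-in, the function names, and the `CheckAgreement` helper are hypothetical and not taken from Blink; the bit positions are assumed to follow the x86 EFLAGS layout. It only shows that the two forms agree on ZF, CF, and the upper 24 bits; it makes no claim about the code any particular compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, following the x86 EFLAGS layout. */
enum { FLAGS_CF = 0, FLAGS_PF = 2, FLAGS_AF = 4, FLAGS_ZF = 6, FLAGS_SF = 7 };
#define SZAC (1u << FLAGS_SF | 1u << FLAGS_ZF | 1u << FLAGS_AF | 1u << FLAGS_CF)

/* Hypothetical stand-in for the emulator's machine state. */
struct Machine {
  uint32_t flags;
};

/* Mask-preserving form: rewrites SF/ZF/AF/CF, keeps every other bit.
   z is the ALU result (ZF is set when it is zero), c is the carry. */
static void SetFlagsPrecise(struct Machine *m, uint32_t z, int c) {
  m->flags = (m->flags & ~SZAC) | (uint32_t)!z << FLAGS_ZF |
             (uint32_t)c << FLAGS_CF;
}

/* Clobbering form: overwrites the entire low byte, so PF and the
   reserved low bits are lost, but nothing above bit 7 is touched. */
static void SetFlagsClobber(struct Machine *m, uint32_t z, int c) {
  m->flags = (m->flags & ~0xFFu) | (uint32_t)!z << FLAGS_ZF |
             (uint32_t)c << FLAGS_CF;
}

/* Both forms must agree on ZF, CF, and the upper 24 bits. */
static void CheckAgreement(void) {
  const uint32_t keep = 1u << FLAGS_ZF | 1u << FLAGS_CF | ~0xFFu;
  for (uint32_t z = 0; z < 2; ++z) {
    for (int c = 0; c < 2; ++c) {
      struct Machine a = {0xABCD0095u}, b = {0xABCD0095u};
      SetFlagsPrecise(&a, z, c);
      SetFlagsClobber(&b, z, c);
      assert(((a.flags ^ b.flags) & keep) == 0);
    }
  }
}
```

The clobbering form is only safe when nothing later reads the other low-byte flags (SF, AF, PF) before they are recomputed, which is the "we only need (at most) ZF & CF" condition above.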

tkchia commented 1 year ago

Apparently this optimization does not quite work. I probably need to look further into it. Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

I see the CI test run died in popcnt. Unless you can reproduce this consistently, this may not be an issue, as I have had several PRs that failed in that same test, that had nothing to do with pop count (they were TUI modifications). FYI, please recheck by running the test manually a few times rather than trusting the CI result!

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

> I have had several PRs that failed in that same test, that had nothing to do with pop count (they were TUI modifications)

The failure for this PR does look pretty consistent: it failed 3 times (once on my local PC). In any case, the potential improvement is not very large. There should be a better way to deal with the flags register (without turning the Blink code into a maintenance nightmare).