jart / blink

tiniest x86-64-linux emulator
ISC License

AArch64 host fixes and optimizations #145

Closed tkchia closed 1 year ago

tkchia commented 1 year ago

The JITter patches are still work in progress, but are already starting to show some real (even if small) improvements in running time. :slightly_smiling_face:

ghaerr commented 1 year ago

Hello @tkchia,

The JITter changes for AArch64 are interesting. I am trying to learn more about the ARM v8+ ISA myself. May I ask what reference materials you're using for the processor instruction decoding, and whether you've found any particularly nice descriptions of ARM v8? I've found a number of books, but many are older and oriented around 32-bit pre-v8 ARM.

Thank you!

tkchia commented 1 year ago

Hello @ghaerr,

I do not know of any particularly "friendly" references on ARMv8, I am afraid. I am also looking for something like ref.x86asm.net for ARM instruction formats, but no luck so far.

The official ARM Architecture Reference Manual (DDI 0487) is available from arm.com, so for now I am using that. The Procedure Call Standard for the ARM 64-bit Architecture (IHI 0055), which describes the official AAPCS64 ABI, also used to be available, though it seems to be gone (paywalled?) now. There is still a Programmer's Guide which summarizes the ABI though. Also I recall that Apple macOS and iOS actually use a slightly different ABI.

Thank you!

tkchia commented 1 year ago

On my AArch64 box, https://github.com/jart/blink/pull/145/commits/b3afd4863c0b048ee6c9246412ba5181a3a0236f improves the running time of o//blink/blink third_party/cosmo/2/test_suite_ecp.com by about 5%:

PASSED (130 / 130 tests (73 skipped))
RL: took 14,428,269µs wall time
RL: ballooned to 5,220kb in size
RL: needed 14,145,480µs cpu (0% kernel)
RL: caused 1,256 page faults (99% memcpy)
RL: 172 context switches (11% consensual)
RL: performed 1,016 reads and 8 write i/o operations

versus, without that commit:

PASSED (130 / 130 tests (73 skipped))
RL: took 15,051,129µs wall time
RL: ballooned to 5,200kb in size
RL: needed 14,786,259µs cpu (0% kernel)
RL: caused 1,257 page faults (100% memcpy)
RL: 251 context switches (2% consensual)
RL: performed 0 reads and 8 write i/o operations
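
As a rough check on the figures above, the relative improvement can be computed from the two `RL:` reports. This is only a sketch: it assumes the faster run is the one with the commit applied, and the helper names `parse_usec` and `improvement_pct` are made up for illustration and are not part of blink.

```python
import re

# Timing lines copied from the two runs quoted above.
# Assumption: the faster run is the one with the commit applied.
WITH_COMMIT = """\
RL: took 14,428,269µs wall time
RL: needed 14,145,480µs cpu (0% kernel)
"""
WITHOUT_COMMIT = """\
RL: took 15,051,129µs wall time
RL: needed 14,786,259µs cpu (0% kernel)
"""


def parse_usec(report: str, label: str) -> int:
    """Return the microsecond count from the 'RL: <label> N µs' line (hypothetical helper)."""
    match = re.search(rf"RL: {label} ([\d,]+)µs", report)
    if match is None:
        raise ValueError(f"no 'RL: {label}' line found")
    return int(match.group(1).replace(",", ""))


def improvement_pct(before: int, after: int) -> float:
    """Percentage by which 'after' is smaller than 'before'."""
    return 100.0 * (before - after) / before


wall_gain = improvement_pct(parse_usec(WITHOUT_COMMIT, "took"),
                            parse_usec(WITH_COMMIT, "took"))
cpu_gain = improvement_pct(parse_usec(WITHOUT_COMMIT, "needed"),
                           parse_usec(WITH_COMMIT, "needed"))

print(f"wall time improvement: {wall_gain:.1f}%")  # 4.1%
print(f"cpu time improvement:  {cpu_gain:.1f}%")   # 4.3%
```

Both figures come out a little over 4%, which is in the same range as the "about 5%" estimate. They come from a single pair of runs, so some run-to-run noise should be expected.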

By the way, o//blink/blink third_party/cosmo/2/test_suite_mpi.com took about 10 minutes to run on AArch64, whereas on x86-64 it took about 18 seconds. I guess my AArch64 box is a bit underpowered. :neutral_face:

Thank you!

tkchia commented 1 year ago

Hello @ghaerr, hello @jart,

Incidentally, I still find it a bit ... silly that test_suite_mpi.com needs about 10 minutes to run on my AArch64 box. I suspect, though, that the JITter will need significant rearchitecting if there are to be any major speed improvements.

Thank you!

ghaerr commented 1 year ago

Hello @tkchia,

Are you thinking that perhaps test_suite_mpi.com should be commented out for the time being? That test also failed during the CI run, unrelated to my last PR here, although I never figured out why.

Thank you!

tkchia commented 1 year ago

@ghaerr: the test is OK, I think. It just takes a long time to run on my box. Thank you!