ocxtal / insn_bench_aarch64

Instruction latency & throughput profiler for AArch64

segmentation fault on Apple M1 Ultra #3

Open jdomke opened 1 year ago

jdomke commented 1 year ago
g++ --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.3.0
# AArch64 latency / throughput benchmark report
# Generated by https://github.com/ocxtal/insn_bench_aarch64 (commit: unknown).
# CPU frequency estimation
# measuring CPU frequency, assuming latency of 64bit addition is 1 cycle(s):
# - 3224.58 MHz
# - 3224.62 MHz
# - 3224.76 MHz
# Scalar load
# instruction   latency throughput
zsh: segmentation fault  src/insn_bench_aarch64
ocxtal commented 1 year ago

Thank you for reporting. I reproduced it on my M2 MacBook. It looks like using x18 as a general-purpose register is the problem, but I'm not sure. I'll take a closer look this Saturday.

ocxtal commented 1 year ago

@jdomke could you try the current master? I modified the register usage in https://github.com/ocxtal/insn_bench_aarch64/commit/7b8c2ee4cdf9cc31928063e293cbf3977810ef71 so that it no longer touches x18, which is reserved for platform-specific usage. It looks like it now runs on my M2 MacBook.

jdomke commented 1 year ago

@ocxtal thanks, I tested the new version. It runs for a bit, but then crashes with an "abort" message. The last line I see is: ld1.b (multi; 3 reg) 5.57 1.00. Based on logs from other systems, the next line should be ld1.b (multi; 4 reg).

jdomke commented 1 year ago

Small update: it works when I remove all the ld1.[b|h|s|d] (multi; 4 reg) entries from src/load.cc, as well as all the st1.b (multi; 4 regs) and st4.[b|h|s|d] (multi) entries from src/store.cc.

ocxtal commented 1 year ago

Thanks, I'll check it.