franke-biosaxs opened this issue 1 year ago
Apropos the bug:
If possible, it would be very helpful to persevere with getting a reproducer. I will take a look at the _gfortran_matmul_r8 code and see if there's anything in the "known issues" that could explain it.
In the absence of repeatable behaviour, one might look for other explanations, e.g. some uninitialised or random effect that happens to manifest on arm64 + Accelerate but not elsewhere.
Unless your gfortran has been rebuilt since the new inlining came into force, I'd not expect that to make any difference, but I don't know when this change occurred.
Apropos upstreaming: it's down to me finding time, or a client that wants to pay for it.
Hi Ian. Thanks for your quick reply. I'm aware that as-is, this is barely a useful report. I was hoping for an "ah, yes, seen that, it is about XYZ" kind of solution. As I've got a free afternoon, I will try to cut down one of the non-trivial cases. The only "good" thing here is that if it happens, it happens every time. Nothing random about that part. Will come back in a while.
Got it. First, it has to be two compilation units; I couldn't make it happen in one. Second, the option -Og is required. Files here.
The test case does not crash if either (1) -fexternal-blas or (2) -Og is omitted from FCFLAGS.
% bash -x build.sh
+ rm -f a.o b.o testcase
+ FCFLAGS='-fbacktrace -fcheck=all -g -Og -fexternal-blas'
+ gfortran -fbacktrace -fcheck=all -g -Og -fexternal-blas -c b.f90 -o b.o
+ gfortran -fbacktrace -fcheck=all -g -Og -fexternal-blas -c a.for -o a.o
+ gfortran a.o b.o -o testcase -framework Accelerate
+ ./testcase
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x104d3a677
#1 0x104d39653
#2 0x19bc22a23
#3 0x104dbd3b7
build.sh: line 10: 29699 Segmentation fault: 11 ./testcase
% lldb testcase
(lldb) target create "testcase"
Current executable set to '/Users/franke/git/atsas-testsuite-branch/build/atsas/bunch/testcase' (arm64).
(lldb) run
Process 29703 launched: '/Users/franke/git/atsas-testsuite-branch/build/atsas/bunch/testcase' (arm64)
Process 29703 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
frame #0: 0x0000000000000008
error: memory read failed for 0x0
Target 0: (testcase) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
* frame #0: 0x0000000000000008
frame #1: 0x00000001003c13b8 libgfortran.5.dylib`_gfortran_matmul_r8 + 9736
frame #2: 0x0000000100003b78 testcase`MAIN__ at a.for:13:72
frame #3: 0x0000000100003d88 testcase`main at a.for:2:9
frame #4: 0x000000019b89bf28 dyld`start + 2236
@fxcoudert - have you seen any other reports and/or do you have any comments on this?
The backtrace is as follows:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
* frame #0: 0x0000000000000008
frame #1: 0x00000001003c6bf4 libgfortran.5.dylib`_gfortran_matmul_r8 + 6564
frame #2: 0x0000000100003b78 testcase`MAIN__ at a.for:13:72
frame #3: 0x0000000100003d88 testcase`main at a.for:2:9
frame #4: 0x0000000191667f28 dyld`start + 2236
I can make a further reduced test case:
$ cat b.f90
program testcase
implicit none
real :: rotmat(3,3), xyz(3, 3)
xyz = 0
rotmat = 0
print *, matmul(rotmat, xyz)
end program
$ gfortran -fexternal-blas -Og b.f90 -framework Accelerate && ./a.out
zsh: segmentation fault ./a.out
and there the backtrace is essentially the same (now in _gfortran_matmul_r4, since the arrays are default REAL):
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16fdff0a0)
* frame #0: 0x000000016fdff0a0
frame #1: 0x000000010029c4c8 libgfortran.5.dylib`_gfortran_matmul_r4 + 6936
frame #2: 0x0000000100003e6c a.out`MAIN__ at b.f90:6:30
frame #3: 0x0000000100003ef4 a.out`main at b.f90:7:11
frame #4: 0x0000000191667f28 dyld`start + 2236
We must be doing something wrong/weird with Accelerate, but I don't know what. I cannot reproduce when linking against OpenBLAS, though…
Have any of the interfaces changed? Or is this a case where our D.3 matters? (Although I think it used to work: Dominique gave me a benchmark code early in development that contrasted external and internal performance.)
Edit: it might also help to enumerate the optimisations enabled by -Og and see if we can pin down which one is causing the issue.
BLAS prototypes pass everything as pointers, so I don't think D.3 matters:
typedef void (*blas_call)(const char *, const char *, const int *, const int *,
const int *, const GFC_REAL_4 *, const GFC_REAL_4 *,
const int *, const GFC_REAL_4 *, const int *,
const GFC_REAL_4 *, GFC_REAL_4 *, const int *,
int, int);
Wait. Those two int arguments at the end are weird. I mean, they are unused (they are placeholders for the hidden lengths of the Fortran strings passed as char *, whose length is known to be 1), but Apple's prototype doesn't have them. It has:
int sgemm_(char *transa, char *transb, int *m, int *n, int *k,
float *alpha, float *a, int *lda, float *b, int *ldb,
float *beta, float *c__, int *ldc)
I've seen that before, and on most architectures this does not actually cause any trouble. Could it be that on the aarch64-darwin ABI it creates an issue?
Edit: nope, I tried changing that, and it does not make the bug go away.
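To make the argument convention concrete, here is a minimal C sketch that routes a MATMUL-style product through a function pointer of the `blas_call` type quoted above: every argument goes by pointer, storage is column-major, and the last two arguments are the hidden string lengths (always 1). It assumes `GFC_REAL_4` is `float`; `ref_sgemm` (a naive no-transpose stand-in, not Accelerate's code) and `matmul_via_gemm` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Assumption: GFC_REAL_4 corresponds to C float (default REAL kind). */
typedef float GFC_REAL_4;

/* The blas_call type quoted above: all arguments by pointer, plus two
   trailing hidden lengths for the one-character strings. */
typedef void (*blas_call)(const char *, const char *, const int *, const int *,
                          const int *, const GFC_REAL_4 *, const GFC_REAL_4 *,
                          const int *, const GFC_REAL_4 *, const int *,
                          const GFC_REAL_4 *, GFC_REAL_4 *, const int *,
                          int, int);

/* Hypothetical naive stand-in for sgemm (not Accelerate's implementation).
   Computes C = alpha*A*B + beta*C, column-major, no-transpose only:
   A is m x k, B is k x n, C is m x n. */
static void ref_sgemm(const char *transa, const char *transb,
                      const int *m, const int *n, const int *k,
                      const GFC_REAL_4 *alpha, const GFC_REAL_4 *a,
                      const int *lda, const GFC_REAL_4 *b, const int *ldb,
                      const GFC_REAL_4 *beta, GFC_REAL_4 *c, const int *ldc,
                      int transa_len, int transb_len)
{
    assert(*transa == 'N' && *transb == 'N');
    assert(transa_len == 1 && transb_len == 1); /* hidden string lengths */
    for (int j = 0; j < *n; j++)
        for (int i = 0; i < *m; i++) {
            GFC_REAL_4 sum = 0.0f;
            for (int p = 0; p < *k; p++)
                sum += a[i + p * *lda] * b[p + j * *ldb];
            /* As in reference BLAS, C is not read when beta == 0. */
            c[i + j * *ldc] = (*beta == 0.0f)
                ? *alpha * sum
                : *alpha * sum + *beta * c[i + j * *ldc];
        }
}

/* Hypothetical helper: C = MATMUL(A, B) for n x n column-major arrays,
   dispatched through a blas_call with alpha = 1, beta = 0. */
static void matmul_via_gemm(blas_call gemm, int n, const GFC_REAL_4 *a,
                            const GFC_REAL_4 *b, GFC_REAL_4 *c)
{
    const GFC_REAL_4 one = 1.0f, zero = 0.0f;
    gemm("N", "N", &n, &n, &n, &one, a, &n, b, &n, &zero, c, &n, 1, 1);
}
```

For example, `matmul_via_gemm(ref_sgemm, 3, rotmat, xyz, out)` with two zero-filled 3x3 arrays mirrors the reduced test case above and leaves `out` all zeros. From the callee's side, the two trailing ints are simply extra arguments beyond Apple's 13-argument prototype.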
Probably the next step is to make the Fortran code directly call sgemm from Accelerate, and see if that fails. If so, report it to Apple, because then it's clearly an Accelerate bug.
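The same isolation experiment can be sketched in C: call an sgemm with Apple's 13-argument prototype directly and compare its result with a plain triple loop. `naive_sgemm` and `gemm_max_error` are hypothetical names; the stand-in only lets the harness run without Accelerate, and the limit n <= 8 is an assumption that keeps the result buffer on the stack.

```c
#include <assert.h>

/* Apple's sgemm_ prototype as quoted above: 13 arguments, no hidden
   string lengths, non-const pointers. */
typedef int (*apple_sgemm_fn)(char *transa, char *transb, int *m, int *n,
                              int *k, float *alpha, float *a, int *lda,
                              float *b, int *ldb, float *beta, float *c__,
                              int *ldc);

/* Hypothetical stand-in with the same prototype (no-transpose only), so the
   harness runs without Accelerate. */
static int naive_sgemm(char *transa, char *transb, int *m, int *n, int *k,
                       float *alpha, float *a, int *lda, float *b, int *ldb,
                       float *beta, float *c__, int *ldc)
{
    assert(*transa == 'N' && *transb == 'N');
    for (int j = 0; j < *n; j++)
        for (int i = 0; i < *m; i++) {
            float sum = 0.0f;
            for (int p = 0; p < *k; p++)
                sum += a[i + p * *lda] * b[p + j * *ldb];
            c__[i + j * *ldc] = (*beta == 0.0f)
                ? *alpha * sum
                : *alpha * sum + *beta * c__[i + j * *ldc];
        }
    return 0;
}

/* Hypothetical harness: compute C = A*B (n x n, column-major, n <= 8) via
   gemm and return the largest absolute difference from a plain triple loop. */
static float gemm_max_error(apple_sgemm_fn gemm, int n, float *a, float *b)
{
    float c[64];
    char no = 'N';
    float one = 1.0f, zero = 0.0f;
    float err = 0.0f;
    assert(n > 0 && n <= 8);
    gemm(&no, &no, &n, &n, &n, &one, a, &n, b, &n, &zero, c, &n);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++) {
            float s = 0.0f;
            for (int p = 0; p < n; p++)
                s += a[i + p * n] * b[p + j * n];
            float d = c[i + j * n] - s;
            if (d < 0.0f) d = -d;
            if (d > err) err = d;
        }
    return err;
}
```

On a Mac, one would instead declare Accelerate's routine with the prototype quoted above, pass `sgemm_` to `gemm_max_error` together with the matrices from the failing MATMUL, and link with `-framework Accelerate`. A crash or a large difference there would point at Accelerate rather than at libgfortran.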
OK, but there is some dependency on the optimisation level, correct? If we lost (or changed) the prototype somehow, that could cause issues, I suppose, since unnamed parameters are passed differently from named ones.
I am on Ventura 13.4 on an M1, with FX's 12.2 gfortran binaries installed. Things largely seem to work fine, except for one issue that is driving me mental: non-trivial uses of MATMUL with -fexternal-blas -framework Accelerate tend to end up in a crash.
Using the libgfortran variants of MATMUL works fine (albeit presumably slower). So do the MATMUL calls in question on Intel Macs and on Linux with BLAS from Intel MKL. That said, the problem does not appear for all MATMUL calls; anecdotally, it shows up when one argument approximates the identity matrix (think rotations with near-zero or exactly zero Euler angles). Needless to say, every trivial test program I wrote to isolate the issue works just fine. I have observed, though, that a print *, A (A being the calculated rotation matrix) just before C=MATMUL(A,B) seems to prevent the crash.
I would keep looking for issues on my side, but I recently found the Ventura 13.3 release notes:
Hence I wonder whether the Accelerate libraries may have changed in ways significant enough to make the gfortran binaries somehow incompatible (gfortran from November 2022, Ventura 13.3 from March 2023)?
I would appreciate any comments or insights on what might be going on. Thank you!
P.S. If there is anything a volunteer fluent in Fortran but with no ARM experience can do to help get the project included in mainline GCC, let me know.