Open vlawhern opened 1 year ago
There is probably a kernel change. You could follow the instructions on GitHub to compile using Intel OneAPI and replace the binary.
Jason
From: vlawhern @.> Sent: Monday, October 17, 2022 2:52:34 PM To: sccn/amica @.> Cc: Subscribed @.***> Subject: [sccn/amica] RedHat 8 - forrtl: severe (174): SIGSEGV, segmentation fault occurred (Issue #38)
Hello,
I'm currently trying to run the amica15ub binary and on RedHat 7 everything works as intended, however on RedHat 8 I get the following error:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine  Line     Source
amica15ub          000000000116EB35  Unknown  Unknown  Unknown
amica15ub          000000000116C8F7  Unknown  Unknown  Unknown
amica15ub          0000000001122954  Unknown  Unknown  Unknown
amica15ub          0000000001122766  Unknown  Unknown  Unknown
amica15ub          00000000010D4D19  Unknown  Unknown  Unknown
amica15ub          00000000010D8F90  Unknown  Unknown  Unknown
amica15ub          00000000005D21F0  Unknown  Unknown  Unknown
libnss_files.so.2  0000152B02387BD0  Unknown  Unknown  Unknown
libnss_files.so.2  0000152B02382834  Unknown  Unknown  Unknown
libnss_files.so.2  0000152B02383A0D  Unknown  Unknown  Unknown
libnss_files.so.2  0000152B02383B42  Unknown  Unknown  Unknown
amica15ub          00000000011F7263  Unknown  Unknown  Unknown
amica15ub          00000000011F7003  Unknown  Unknown  Unknown
amica15ub          00000000004EAE27  Unknown  Unknown  Unknown
amica15ub          00000000004E3B69  Unknown  Unknown  Unknown
amica15ub          00000000004D9C2D  Unknown  Unknown  Unknown
amica15ub          00000000004D1F65  Unknown  Unknown  Unknown
amica15ub          000000000048A6C4  Unknown  Unknown  Unknown
amica15ub          0000000000489DBE  Unknown  Unknown  Unknown
amica15ub          000000000046D05D  Unknown  Unknown  Unknown
amica15ub          000000000040531A  Unknown  Unknown  Unknown
amica15ub          00000000004021DE  Unknown  Unknown  Unknown
amica15ub          000000000118C1A4  Unknown  Unknown  Unknown
amica15ub          00000000004020C1  Unknown  Unknown  Unknown
This seems related to another issue thread (#21, https://github.com/sccn/amica/issues/21), however that didn't fix my issue.
Any assistance would be appreciated.
Reply to this email directly or view it on GitHub: https://github.com/sccn/amica/issues/38
So I know that RedHat 8 has kernel 4.18 and Ubuntu 18.04 (confirmed working) has kernel 4.15, so perhaps there's a change there that makes this not work anymore.
I don't have admin rights to the RH8 system that I'm using so I'll see if I can set up the OneAPI as non-root and re-compile then..
OK after a lot of trial and error I'm getting fairly close to compiling, I now just get the following error:
amica15.f90:220:23:
call random_seed(PUT = c1 * (myrank+1) * (seed+myrank+1))
1
Error: Size of ‘put’ argument of ‘random_seed’ intrinsic at (1) too small (2/33)
any idea on how to resolve this?
Also I tested the amica15ub binary as-is on another system with kernel 5.3 (SUSE Linux Enterprise) and it worked out of the box, so something is odd with RedHat 8 and/or the packages it ships.
for reference here's my compile command
I_MPI_ROOT=~/intel/oneapi/mpi/latest mpif90 -I/home/vernon/intel/oneapi/mkl/2022.2.0/include/ -cpp -fopenmp -O3 -static -DMKL --free-line-length-0 funmod2.f90 amica15.f90 -o amica15test
This line was changed recently by other contributors. The "too small" suggests to me that multiplying the seed argument by 999999 or something might solve it.
Have a suggested change for that line? I've tried
call random_seed(PUT = 999999 * (c1 * (myrank+1) * (seed+myrank+1)))
and that didn't fix it..
Same error? Try larger multiplier.
same error with this:
amica15.f90:220:23:
call random_seed(PUT = 999999999 * c1 * (myrank+1) * (seed+myrank+1))
1
Error: Size of ‘put’ argument of ‘random_seed’ intrinsic at (1) too small (2/33)
Don't remember what c1 and seed are, but maybe try hard coding those to some large int. Then try commenting out the random_seed line.
so I know nothing about Fortran, but just skimming through the file it seems it's defined here right before the random_seed line
call system_clock(c1)
call random_seed(PUT = c1 * (myrank+1) * (seed+myrank+1))
and it seems seed isn't defined?? which is weird..
It is set in the function call. Did you try setting it to something arbitrary? Did you try commenting the line out?
I’m not sure if it was changed from seed to c1. Make sure seed is defined, or again hardcode it or remove it.
Someone had an issue with reproducibility and traced it to the random seed, and then the code was changed. I’m not sure what the problem is, but I would suggest hardcoding the PUT = XXX to something large enough to work if possible. Otherwise there is a problem with the random_seed function itself.
OK, my apologies... this issue occurred because I wasn't using the Intel Fortran compiler. It appears I have to manually pass the path to the Intel Fortran compiler to mpif90, even though I set the environment variable FC=/path/to/ifort
So here's the updated compile command that I had to run as a reference:
I_MPI_ROOT=~/intel/oneapi/mpi/latest mpif90 -fc=$FC -I/home/vernon/intel/oneapi/mkl/2022.2.0/include/ -fpp -qopenmp -O3 -static-intel -mkl -DMKL funmod2.f90 amica15.f90 -o amica15test
I tried all of this on my Ubuntu machine where I have admin rights, so I'll update this when I try it on RedHat 8.
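For anyone hitting the same "Size of 'put' argument" error, it may help to check which compiler the mpif90 wrapper actually drives before building. The sketch below assumes an MPICH-derived or Intel MPI wrapper, both of which accept -show and print the underlying command line without compiling anything (Open MPI wrappers use --showme instead). The helper name wrapper_compiler is made up for illustration and is not part of AMICA.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMICA): print the name of the compiler
# that an MPI compiler wrapper would invoke.
# Assumption: the wrapper supports `-show` (MPICH, Intel MPI).
wrapper_compiler() {
    wrapper=${1:-mpif90}
    # `-show` prints the full underlying command; fail if the wrapper
    # is missing or does not understand the option.
    show_output=$("$wrapper" -show 2>/dev/null) || return 1
    # Split the command line into words; the first word is the compiler.
    set -- $show_output
    [ $# -gt 0 ] || return 1
    basename "$1"
}

# Usage (output depends on the local MPI installation):
#   wrapper_compiler mpif90
```

If this prints gfortran rather than ifort, the wrapper is not using the Intel compiler, which would be consistent with the fix described above of passing the compiler explicitly via -fc.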
The compiled binary appears to be dynamically linked to files in my home directory:
linux-vdso.so.1 (0x00007ffd453bd000)
libiomp5.so => /home/vernon/intel/oneapi/compiler/2022.2.0/linux/compiler/lib/intel64_lin/libiomp5.so (0x00007f4f90aa2000)
libmpichfort.so.0 => /usr/lib/x86_64-linux-gnu/libmpichfort.so.0 (0x00007f4f9086a000)
libmpich.so.0 => /usr/lib/x86_64-linux-gnu/libmpich.so.0 (0x00007f4f903b4000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f4f90016000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4f8fdf7000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4f8fa06000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4f90ee1000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f4f8f7ee000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4f8f5ea000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f4f8f3e2000)
libcr.so.0 => /usr/lib/libcr.so.0 (0x00007f4f8f1d7000)
However the amica15ub binary that's provided in the repo isn't a dynamic executable? I'm trying to compile this for other users so I'm wondering how to do this
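A rough way to check both points, whether a binary is static and which of its libraries live under a user-specific path, is sketched below. It assumes glibc's ldd, which reports "not a dynamic executable" (or "statically linked" for static PIE binaries) when there is nothing to resolve; other ldd implementations word this differently. The helper names is_static and libs_under are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AMICA) for inspecting how a binary is linked.
# Assumption: glibc's ldd and its usual output format.

# Succeeds if ldd reports that the file has no dynamic dependencies.
is_static() {
    # ldd exits non-zero for static binaries, so ignore its status
    # and classify by the message it prints.
    out=$(ldd "$1" 2>&1 || true)
    case $out in
        *"not a dynamic executable"*|*"statically linked"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints "name path" for each shared library that resolves under a prefix
# (default: $HOME), i.e. libraries other machines are unlikely to have.
libs_under() {
    ldd "$1" 2>/dev/null |
        awk -v p="${2:-$HOME}/" '$2 == "=>" && index($3, p) == 1 { print $1, $3 }'
}
```

For the listing above, libs_under amica15test /home/vernon would be expected to report libiomp5.so, which is the dependency that ties the binary to one user's oneAPI install.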
So I managed to get this compiled on RedHat 8 and it runs successfully, although I had to use your older install instructions rather than Intel OneAPI (that is, installing the Intel Fortran Compiler directly and using the conda package manager to install the MKL libraries and MPICH2).
Thanks for your help, and sorry for all the spam.
could you share your amica15ub ubuntu static binary? I get the same segfault on Ubuntu22
I'm not an expert on compiling software, but it looked like I was unable to compile a static binary (it linked to files/directories unique to my system) so I'm pretty sure my binary would not work with Ubuntu 22.
In any case I think this is better addressed by @japalmer29 recompiling a new binary for newer OS's since now this problem exists across multiple machines (likely due to old libs being phased out by these distros)... I think this problem will persist the longer it goes as older distros become end-of-life.
I recompiled it for Ubuntu 22, and it now works on my setup. ldd amica15ub showed me that it is not a dynamic executable, so maybe this is useful for others.
PS: I tried the Intel way and completely failed. The old MPICH-3-2 instructions worked.
Your binary works on my machine as well. Thanks for sharing!
@japalmer29 this should be included in the main repo for users with newer OS's as I think old libs are being removed from newer distros causing this issue.
@behinger Thank you very much for sharing, works on my Ubuntu 22.04 as well, saved me lots of time!