sccn / amica

Code for AMICA: Adaptive Mixture ICA with shared components
BSD 2-Clause "Simplified" License

RedHat 8 - forrtl: severe (174): SIGSEGV, segmentation fault occurred #38

Open vlawhern opened 1 year ago

vlawhern commented 1 year ago

Hello,

I'm trying to run the amica15ub binary. On RedHat 7 everything works as intended, but on RedHat 8 I get the following error:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source   
amica15ub          000000000116EB35  Unknown               Unknown  Unknown
amica15ub          000000000116C8F7  Unknown               Unknown  Unknown
amica15ub          0000000001122954  Unknown               Unknown  Unknown
amica15ub          0000000001122766  Unknown               Unknown  Unknown
amica15ub          00000000010D4D19  Unknown               Unknown  Unknown
amica15ub          00000000010D8F90  Unknown               Unknown  Unknown
amica15ub          00000000005D21F0  Unknown               Unknown  Unknown
libnss_files.so.2  0000152B02387BD0  Unknown               Unknown  Unknown
libnss_files.so.2  0000152B02382834  Unknown               Unknown  Unknown
libnss_files.so.2  0000152B02383A0D  Unknown               Unknown  Unknown
libnss_files.so.2  0000152B02383B42  Unknown               Unknown  Unknown
amica15ub          00000000011F7263  Unknown               Unknown  Unknown
amica15ub          00000000011F7003  Unknown               Unknown  Unknown
amica15ub          00000000004EAE27  Unknown               Unknown  Unknown
amica15ub          00000000004E3B69  Unknown               Unknown  Unknown
amica15ub          00000000004D9C2D  Unknown               Unknown  Unknown
amica15ub          00000000004D1F65  Unknown               Unknown  Unknown
amica15ub          000000000048A6C4  Unknown               Unknown  Unknown
amica15ub          0000000000489DBE  Unknown               Unknown  Unknown
amica15ub          000000000046D05D  Unknown               Unknown  Unknown
amica15ub          000000000040531A  Unknown               Unknown  Unknown
amica15ub          00000000004021DE  Unknown               Unknown  Unknown
amica15ub          000000000118C1A4  Unknown               Unknown  Unknown
amica15ub          00000000004020C1  Unknown               Unknown  Unknown

This seems related to another issue thread (https://github.com/sccn/amica/issues/21), but the suggestions there didn't fix my issue.

Any assistance would be appreciated.

japalmer29 commented 1 year ago

There is probably a kernel change. You could follow the instructions on GitHub to compile using Intel oneAPI and replace the binary.

Jason


vlawhern commented 1 year ago

So I know that RedHat 8 has kernel 4.18 and Ubuntu 18.04 (confirmed working) has kernel 4.15, so perhaps there's a change there that makes this not work anymore.

I don't have admin rights on the RH8 system that I'm using, so I'll see if I can set up oneAPI as non-root and recompile.
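A quick way to compare the working and failing machines is to record the kernel and C library versions side by side. The backtrace in the first post passes through libnss_files.so.2, which belongs to glibc, so the C library may matter as much as the kernel; that is a reading of the trace, not a confirmed diagnosis. `env_report` is a hypothetical helper name used only for this sketch.

```shell
# Print kernel, C library, and distribution versions for comparing hosts.
env_report() {
  echo "kernel: $(uname -r)"
  # GNU_LIBC_VERSION is glibc-specific; report "unknown" on other C libraries
  echo "libc: $(getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown)"
  # /etc/os-release exists on most current distributions; read it in a
  # subshell so its variables do not leak into the calling shell
  if [ -r /etc/os-release ]; then
    ( . /etc/os-release; echo "distro: ${PRETTY_NAME:-unknown}" )
  fi
}

env_report
```

Running this on the RedHat 7, RedHat 8, and Ubuntu 18.04 machines gives three short reports that can be diffed directly.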

vlawhern commented 1 year ago

OK after a lot of trial and error I'm getting fairly close to compiling, I now just get the following error:

amica15.f90:220:23:

 call random_seed(PUT = c1 * (myrank+1) * (seed+myrank+1))
                       1
Error: Size of ‘put’ argument of ‘random_seed’ intrinsic at (1) too small (2/33)

any idea on how to resolve this?

Also I tested the amica15ub binary as-is on another system with kernel 5.3 (SUSE Linux Enterprise) and it worked out of the box, so something is odd with RedHat 8 and/or the packages it ships.

vlawhern commented 1 year ago

for reference here's my compile command

 I_MPI_ROOT=~/intel/oneapi/mpi/latest mpif90 -I/home/vernon/intel/oneapi/mkl/2022.2.0/include/ -cpp -fopenmp -O3 -static -DMKL --free-line-length-0 funmod2.f90 amica15.f90 -o amica15test

japalmer29 commented 1 year ago

This line was changed recently by other contributors. The "too small" suggests to me that multiplying the seed argument by 999999 or something might solve it.


vlawhern commented 1 year ago

Have a suggested change for that line? I've tried

 call random_seed(PUT = 999999 * (c1 * (myrank+1) * (seed+myrank+1)))

and that didn't fix it..
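The pair in the error message, (2/33), reads as the length of the array passed to PUT (2) against the minimum length the compiler expects (33). Scaling the values therefore cannot change the outcome: the array is too short, not too small in magnitude. The required length is compiler-specific and can be queried with random_seed(size=n). Below is a minimal sketch that compiles a tiny probe program with whichever compilers are on PATH. `probe_seed_size` is a hypothetical helper name, and the compiler names gfortran and ifort are assumptions about the local setup.

```shell
# Print the seed-array length that a Fortran compiler's random_seed expects.
# Usage: probe_seed_size <compiler-command>
probe_seed_size() {
  fc=$1
  if ! command -v "$fc" >/dev/null 2>&1; then
    echo "$fc: not found, skipping"
    return 0
  fi
  tmp=$(mktemp -d) || return 1
  # Write a minimal free-form Fortran program that only queries the size
  cat > "$tmp/seedsize.f90" <<'EOF'
program seedsize
  implicit none
  integer :: n
  call random_seed(size=n)   ! n = minimum length of the PUT array
  print '(i0)', n
end program seedsize
EOF
  if "$fc" "$tmp/seedsize.f90" -o "$tmp/seedsize" >/dev/null 2>&1; then
    echo "$fc: random_seed size = $("$tmp/seedsize")"
  else
    echo "$fc: failed to compile the probe"
  fi
  rm -rf "$tmp"
}

probe_seed_size gfortran
probe_seed_size ifort
```

If the two compilers report different sizes, code that passes a fixed-length seed array will build with one and fail with the other, which matches the error shown here.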

japalmer29 commented 1 year ago

Same error? Try larger multiplier.


vlawhern commented 1 year ago

same error with this:

amica15.f90:220:23:

 call random_seed(PUT = 999999999 * c1 * (myrank+1) * (seed+myrank+1))
                       1
Error: Size of ‘put’ argument of ‘random_seed’ intrinsic at (1) too small (2/33)

japalmer29 commented 1 year ago

Don't remember what c1 and seed are, but maybe try hard-coding those to some large int. Then try commenting out the random_seed line.


vlawhern commented 1 year ago

so I know nothing about Fortran, but just skimming through the file it seems it's defined here right before the random_seed line

call system_clock(c1)
call random_seed(PUT = c1 * (myrank+1) * (seed+myrank+1))

and it seems seed isn't defined?? which is weird..

japalmer29 commented 1 year ago

It is set in the function call. Did you try setting it to something arbitrary? Did you try commenting the line out?

japalmer29 commented 1 year ago

I’m not sure if it was changed from seed to c1. Make sure seed is defined, or again hardcode it or remove it.

japalmer29 commented 1 year ago

Someone had an issue with reproducibility and traced it to the random seed, and then the code was changed. I’m not sure what the problem is, but I would suggest hardcoding the PUT = XXX to something large enough to work if possible. Otherwise there is a problem with the random_seed function itself.

vlawhern commented 1 year ago

OK, my apologies. This error occurred because I wasn't using the Intel Fortran compiler. It appears I have to pass the path to the Intel Fortran compiler to mpif90 explicitly (with -fc), even though I had set the environment variable FC=/path/to/ifort.

So here's the updated compile command that I had to run as a reference:

I_MPI_ROOT=~/intel/oneapi/mpi/latest mpif90 -fc=$FC -I/home/vernon/intel/oneapi/mkl/2022.2.0/include/ -fpp -qopenmp -O3 -static-intel -mkl -DMKL funmod2.f90 amica15.f90 -o amica15test

I tried all of this on my Ubuntu machine where I have admin rights, so I'll update this when I try it on RedHat 8.
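Since the wrapper did not pick up FC here, it can help to ask the wrapper which compiler it will actually invoke before building. This is a minimal sketch, assuming an MPICH-style mpif90 wrapper (Intel MPI included) that supports the `-show` option; `show_wrapped_compiler` is a hypothetical helper name.

```shell
# Report the command line an MPI compiler wrapper would run, without compiling.
# Usage: show_wrapped_compiler <wrapper> [extra wrapper options...]
show_wrapped_compiler() {
  wrapper=$1
  shift
  if ! command -v "$wrapper" >/dev/null 2>&1; then
    echo "$wrapper: not found"
    return 0
  fi
  # -show prints the underlying compiler command instead of executing it
  "$wrapper" "$@" -show
}

show_wrapped_compiler mpif90
# With an explicit override, as in the compile command above:
#   show_wrapped_compiler mpif90 -fc="$FC"
```

The first word of the printed command is the compiler the wrapper will call, so a line starting with gfortran rather than ifort would explain a gfortran-style error message.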

vlawhern commented 1 year ago

The compiled binary appears to be dynamically linked against libraries in my home directory:

        linux-vdso.so.1 (0x00007ffd453bd000)
        libiomp5.so => /home/vernon/intel/oneapi/compiler/2022.2.0/linux/compiler/lib/intel64_lin/libiomp5.so (0x00007f4f90aa2000)
        libmpichfort.so.0 => /usr/lib/x86_64-linux-gnu/libmpichfort.so.0 (0x00007f4f9086a000)
        libmpich.so.0 => /usr/lib/x86_64-linux-gnu/libmpich.so.0 (0x00007f4f903b4000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f4f90016000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4f8fdf7000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4f8fa06000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f4f90ee1000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f4f8f7ee000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f4f8f5ea000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f4f8f3e2000)
        libcr.so.0 => /usr/lib/libcr.so.0 (0x00007f4f8f1d7000)

However, the amica15ub binary provided in the repo isn't a dynamic executable. I'm trying to compile this for other users, so I'm wondering how to produce a static binary like that.
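One way to check what a build actually links against is to classify the ldd output and list the libraries that resolve under a given directory. This is a minimal sketch: `classify_ldd` and `libs_under` are hypothetical helper names, and the default binary path is taken from the compile command earlier in the thread. For a fully static binary, ldd typically prints "not a dynamic executable".

```shell
# Classify the text that ldd printed for one binary.
classify_ldd() {
  case "$1" in
    *"not a dynamic executable"*|*"statically linked"*) echo static ;;
    *) echo dynamic ;;
  esac
}

# Read ldd output on stdin and print the names of libraries whose
# resolved path starts with the given directory prefix.
libs_under() {
  awk -v prefix="$1" '$2 == "=>" && index($3, prefix) == 1 { print $1 }'
}

# Example run, only if the binary is present in the current directory
bin=./amica15test
if [ -f "$bin" ]; then
  out=$(ldd "$bin" 2>&1)
  echo "$bin is $(classify_ldd "$out")"
  printf '%s\n' "$out" | libs_under "$HOME"
fi
```

Any library listed by `libs_under "$HOME"` would have to be shipped alongside the binary or linked statically before the build is usable on another account or machine.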

vlawhern commented 1 year ago

So I managed to get this compiled on RedHat 8 and it runs successfully, although I had to use your older install instructions rather than Intel oneAPI (so installing the Intel Fortran Compiler directly and using the conda package manager to install the MKL libraries and MPICH2).

Thanks for your help, and sorry for all the spam.

behinger commented 1 year ago

Could you share your static amica15ub binary for Ubuntu? I get the same segfault on Ubuntu 22.

vlawhern commented 1 year ago

I'm not an expert on compiling software, but it looked like I was unable to compile a static binary (it linked to files/directories unique to my system) so I'm pretty sure my binary would not work with Ubuntu 22.

In any case I think this is better addressed by @japalmer29 recompiling a new binary for newer OS's since now this problem exists across multiple machines (likely due to old libs being phased out by these distros)... I think this problem will persist the longer it goes as older distros become end-of-life.

behinger commented 1 year ago

I recompiled it for Ubuntu 22 and it now works on my setup. ldd amica15ub reported "not a dynamic executable", so maybe this is useful for others.

amica15ub.zip

PS: I tried the Intel way and completely failed. The old MPICH-3-2 instructions worked.

vlawhern commented 1 year ago

Your binary works on my machine as well. Thanks for sharing!

@japalmer29 this should be included in the main repo for users with newer OS's as I think old libs are being removed from newer distros causing this issue.

NinaOmejc commented 10 months ago

@behinger Thank you very much for sharing, works on my Ubuntu 22.04 as well, saved me lots of time!