zoglauer / megalib

MEGAlib - the Medium-Energy Gamma-ray Astronomy library
http://megalibtoolkit.com

Cosima shows strange freezing behaviour on execution #88

Open advaitmehla opened 6 months ago

advaitmehla commented 6 months ago

Hi Dr. Zoglauer,

I have installed the latest release of MEGAlib using the instructions provided here. This was done on a machine with Ubuntu 20.04, and the compilation proceeded smoothly with no errors.

However, when I try the example from the quick-start guide provided here, I run into a very odd issue: the geomega step works well and I can view the geometry, but this is the output I get on running cosima resource/examples/cosima/source/CrabOnly.source:

**************************************************************************
*                                                                        *
*                Cosima - the cosmic simulator of MEGAlib                *
*                                                                        *
*             This program is part of MEGAlib version 3.06.01            *
*                (C) by Andreas Zoglauer and contributors                *
*                                                                        *
*                      Master reference for MEGAlib:                     *
*            A. Zoglauer et al., NewAR 50 (7-8), 629-632, 2006           *
*                                                                        *
*            For more information about MEGAlib please visit:            *
*                        http://megalibtoolkit.com                       *
*                                                                        *
**************************************************************************

Using parameter file resource/examples/cosima/source/CrabOnly.source

*************************************************************
 Geant4 version Name: geant4-10-02-patch-03    (27-January-2017)
                      Copyright : Geant4 Collaboration
                      Reference : NIM A 506 (2003), 250-303
                            WWW : http://cern.ch/geant4
*************************************************************

Chosen physics:
Particles
G4EmLivermorePolarizedPhysics
G4RadioactiveDecay

The execution just freezes at this step (the process does not die, and there is no error). An output file is generated as described, but it contains a single line that says "# You can delete me.". According to htop, the cosima process continues at 100% CPU utilization, apparently indefinitely: I once forgot to kill the process manually, and it was still running 7 days later with no change or output.

Not sure if this is helpful, but this is dumped after I interrupt the process manually:

^CCatched signal Ctrl-C (ID=2):
Trying to cancel the run at the end of the next event...

 *** Break *** segmentation violation

===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f090205fc3a in __GI___wait4 (pid=615774, stat_loc=stat_loc@entry=0x7fff94c6cae8, options=options@entry=0, usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
#1  0x00007f090205fbfb in __GI___waitpid (pid=<optimized out>, stat_loc=stat_loc@entry=0x7fff94c6cae8, options=options@entry=0) at waitpid.c:38
#2  0x00007f0901fcef67 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:172
#3  0x00007f0902701dce in TUnixSystem::StackTrace() () from /home/mega/MEGAlib/external/root_v6.24.08/lib/libCore.so.6.24
#4  0x00007f09026fec55 in TUnixSystem::DispatchSignals(ESignals) () from /home/mega/MEGAlib/external/root_v6.24.08/lib/libCore.so.6.24
#5  <signal handler called>
#6  0x00007f0902f55d8b in MCMain::Interrupt() () from /home/mega/MEGAlib/lib/libCosima.so
#7  0x00005590e8f0b546 in CatchSignal(int) ()
#8  <signal handler called>
#9  __GI___xstat (vers=1, name=0x7f0902132929 "/etc/localtime", buf=0x7fff94c6fef0) at ../sysdeps/unix/sysv/linux/wordsize-64/xstat.c:35
#10 0x00007f090204fca1 in __tzfile_read (file=file@entry=0x7f0902132929 "/etc/localtime", extra=extra@entry=0, extrap=extrap@entry=0x0) at tzfile.c:154
#11 0x00007f090204f055 in tzset_internal (always=<optimized out>) at tzset.c:405
#12 0x00007f090204f9cc in __tz_convert (timer=1710421487, use_localtime=1, tp=0x7f090216e4e0 <_tmbuf>) at tzset.c:577
#13 0x00007f0902c70c58 in MSystem::GetTime(long&, long&) () from /home/mega/MEGAlib/lib/libCommonMisc.so
#14 0x00007f0902cae757 in MTime::Now() () from /home/mega/MEGAlib/lib/libCommonMisc.so
#15 0x00007f0902cae77d in MTime::MTime() () from /home/mega/MEGAlib/lib/libCommonMisc.so
#16 0x00007f0902ed4bd3 in MSimEvent::MSimEvent() () from /home/mega/MEGAlib/lib/libSivan.so
#17 0x00007f0902f5b442 in MCEventAction::MCEventAction(MCParameterFile&, bool, long) () from /home/mega/MEGAlib/lib/libCosima.so
#18 0x00007f0902f565e1 in MCMain::Initialize() () from /home/mega/MEGAlib/lib/libCosima.so
#19 0x00005590e8f0b9fb in main ()
===========================================================

The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum https://root.cern.ch/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern.ch/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#9  __GI___xstat (vers=1, name=0x7f0902132929 "/etc/localtime", buf=0x7fff94c6fef0) at ../sysdeps/unix/sysv/linux/wordsize-64/xstat.c:35
#10 0x00007f090204fca1 in __tzfile_read (file=file@entry=0x7f0902132929 "/etc/localtime", extra=extra@entry=0, extrap=extrap@entry=0x0) at tzfile.c:154
#11 0x00007f090204f055 in tzset_internal (always=<optimized out>) at tzset.c:405
#12 0x00007f090204f9cc in __tz_convert (timer=1710421487, use_localtime=1, tp=0x7f090216e4e0 <_tmbuf>) at tzset.c:577
#13 0x00007f0902c70c58 in MSystem::GetTime(long&, long&) () from /home/mega/MEGAlib/lib/libCommonMisc.so
#14 0x00007f0902cae757 in MTime::Now() () from /home/mega/MEGAlib/lib/libCommonMisc.so
#15 0x00007f0902cae77d in MTime::MTime() () from /home/mega/MEGAlib/lib/libCommonMisc.so
#16 0x00007f0902ed4bd3 in MSimEvent::MSimEvent() () from /home/mega/MEGAlib/lib/libSivan.so
#17 0x00007f0902f5b442 in MCEventAction::MCEventAction(MCParameterFile&, bool, long) () from /home/mega/MEGAlib/lib/libCosima.so
#18 0x00007f0902f565e1 in MCMain::Initialize() () from /home/mega/MEGAlib/lib/libCosima.so
#19 0x00005590e8f0b9fb in main ()
===========================================================

Please let me know if you have any idea what the problem could be here.
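
For a run that can hang like this, a time limit avoids leaving the process spinning for days. The following is only a minimal sketch, assuming GNU coreutils `timeout` is available (it is on Ubuntu); `run_with_deadline` is a hypothetical helper name and the 600-second limit in the usage comment is arbitrary. The commented `gdb` line is an alternative to Ctrl-C for seeing where a hung process is stuck while it keeps running.

```shell
#!/usr/bin/env bash
# run_with_deadline SECONDS CMD [ARGS...]   (hypothetical helper, not part of MEGAlib)
# Runs CMD; if it is still running after SECONDS, sends it SIGINT (like Ctrl-C).
# Returns CMD's exit status, or 124 (the status `timeout` reports) on a timeout.
run_with_deadline() {
  local seconds=$1
  shift
  local status=0
  timeout --signal=INT "$seconds" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "command still running after ${seconds}s; interrupted" >&2
  fi
  return "$status"
}

# Usage for the run described above (the 10-minute limit is an arbitrary choice):
# run_with_deadline 600 cosima resource/examples/cosima/source/CrabOnly.source
#
# To inspect a hung process without killing it, gdb can attach to it and print
# a backtrace of every thread (needs gdb installed and ptrace permission):
# gdb -p "$(pgrep -n cosima)" -batch -ex "thread apply all bt"
```

A backtrace taken this way shows the frames the process is actually executing, without the signal-handler frames that appear when it is interrupted.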

zoglauer commented 6 months ago

Hi,

Interesting. I cannot reproduce the problem here, but I remember seeing it before, a long time ago. Can you switch to the main branch and check if it happens there too?

bash setup.sh --br=main

What CPU are you using?

Thanks, Andreas

advaitmehla commented 6 months ago

Hi, thanks for the quick response! The CPU is an Intel i7-8700. I tried two separate machines with the same hardware and got the same results. I did try the main branch too, and that led to compilation errors:

Compiling and linking ShowHistograms ...
Compiling and linking CompareHistograms ...
Compiling and linking DistanceOptimizerForEventClusterizer ...
Compiling and linking Revoxelizer ...
Compiling and linking IsotopeFileSplitter ...
Compiling and linking VariableSourceDetector ...
Compiling and linking TraFitsConverter ...
Compiling and linking ConvertMGGPOD ...
Compiling and linking ResponseToXSPEC ...
/usr/bin/ld: cannot find -lbz2: No such file or directory
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:82: /home/advait/megalib/bin/ConvertMGGPOD] Error 1
make[2]: *** Waiting for unfinished jobs....
/usr/bin/ld: cannot find -lbz2: No such file or directory
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:82: /home/advait/megalib/bin/TraFitsConverter] Error 1
/usr/bin/ld: cannot find -lbz2: No such file or directory
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:82: /home/advait/megalib/bin/ResponseToXSPEC] Error 1
make[1]: *** [Makefile:227: add] Error 2
make: *** [Makefile:121: add] Error 2
ERROR: Something went wrong while compiling MEGAlib!

Best, Advait
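
The `cannot find -lbz2` lines in the build log above mean the linker could not locate the bzip2 library at link time. As a small sketch for pulling the missing library names out of such a log, assuming the GNU ld message format shown above (`missing_libs` is a hypothetical helper name, not part of MEGAlib):

```shell
#!/usr/bin/env bash
# missing_libs: read a build log on stdin and print each library that ld
# reported as missing, one name per line, without duplicates.
missing_libs() {
  # ld prints: "/usr/bin/ld: cannot find -lNAME: No such file or directory"
  sed -n 's/.*cannot find -l\([A-Za-z0-9_+.-]*\).*/\1/p' | sort -u
}

# Example with one line from the log above:
printf '%s\n' '/usr/bin/ld: cannot find -lbz2: No such file or directory' | missing_libs
# prints: bz2

# On Debian/Ubuntu the linker file for -lbz2 comes from the libbz2-dev package:
# sudo apt-get install libbz2-dev
```

After installing the development package, rerunning the setup script should get past these three link steps; whether the main branch then shows the same freeze is a separate question.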