lime-rt / lime

Line Modeling Engine
http://www.nbi.dk/~brinch/index.php?page=lime
Other
25 stars 25 forks source link

Random crashes of LIME with the NO molecule #85

Closed dquenard closed 8 years ago

dquenard commented 9 years ago

Dear all,

First of all, I would like to thank you again for your beautiful work on maintaining and developing LIME!

Please find in this link a folder containing all the files needed to reproduce our models:

Click on me!

We have tested several models to reproduce the emission of the NO molecule lines with LIME. To do that, we have defined a structure composed by a spherical source. We then generate a grid of points with this structure as an input for LIME. The structure parameters are the following:

Inner spherical source parameters

We have noted that LIME crashes randomly several times when running this model. We have runned this very same model 10 times, and it crashes 9 out of 10 times. We get this error message when it fails:

gsl: svd.c:149: ERROR: SVD decomposition failed to converge Default GSL error handler invoked. /home/LimePackage/lime: line 92: 21362 Aborted ./lime_$$.x

We are attaching the model.c file as well with the parameters we used for this particular test.

In addition, we have noted that the crashing time can vary significantly from one test to another when running the very same model with LIME several times (from 30 minutes to 8 hours).

Another concern we have is, can we trust the results when one of the models is working, even when other tests with the same parameters have failed?

Thank you very much for your support!

Cheers,

David Quénard PhD student - Astrochemistry Institut de Recherche en Astrophysique et Planétologie (Toulouse - France)

ParfenovS commented 9 years ago

Dear David,

I'm not from the LIME developing team but maybe I can help you. It looks like your problem is related with GSL linear equation solver used by LIME. I had similar problems with this solver and it also was in the case of relatively high number of molecular levels. I've changed the default GSL solver on the LAPACK or Intel MKL solvers (one can choose whether to use LAPACK or MKL solver). Most of changes were made in stateq.c source file. One can find my version of the LIME code with LAPACK/MKL solvers (and some notes on LAPACK and MKL installation in "Installation notes" section) here: https://github.com/ParfenovS/lime/tree/LAPACK/MKL-support I also made pull request with these code changes (one can find it here: https://github.com/lime-rt/lime/pull/65 ). I don't know whether these changes will be accepted. To use LAPACK solver you need to run my version of the LIME code with -L option. For the MKL solver you need to use -M option

I've made test runs with your input model files. The LIME code with default GSL solver stops with the error (gsl: svd.c:149:...). The code version with LAPACK solver works fine and stops after 16 iterations (the default number of global iterations) witn Min(SNR)=2.5, Median(SNR)=2.5e7. You can find the results obtained with your input files and my version of the LIME code (with LAPACK solver) here: https://dl.dropboxusercontent.com/u/83439322/model_results.zip (please note that I will delete these files with results from Dropbox at the beginning of December)

Regards

Sergey.

christianbrinch commented 9 years ago

Just a quick comment:

We have modified this collision file obtained from the LAMDA database. This is most likely the reason. The fact that it works in radex does not ensure that LIME can solve for the (modified) molecule. Do you get the same crash with the unmodified file? In any case, it is certainly related to the molecular datafile. We had the same type of crash with methanol some time ago, where we had to use an older version of the collision data in order to get it to work.

dquenard commented 9 years ago

Dear Sergey and Christian,

Thank you for your comments!

I will try to make some test with the LAPACK/MKL solver. I keep you inform in this post If I have any trouble installing/using Sergey's version.

@Christian : At first we had the same crash with the original file (and we have the same issue with CH3CN as well) but we thought it was due to the big size of the collision file. So we decided to reduce it a little bit. With the modified collision file, it works in LIME sometimes but in most of the cases it crashes.

dquenard commented 9 years ago

Dear Sergey,

I have an issue with the lapacke library, can you help me?

I have this error:

[root@titan:~/LimePackage]# lime -L example/model.c /home/soft/lapack/liblapacke.a(lapacke_dgelss_work.o): In function LAPACKE_dgelss_work': lapacke_dgelss_work.c:(.text+0x211): undefined reference todgelss_' lapacke_dgelsswork.c:(.text+0x2f9): undefined reference to `dgelss' /home/soft/lapack/liblapacke.a(lapacke_dgesv_work.o): In functionLAPACKE_dgesv_work': lapacke_dgesv_work.c:(.text+0x175): undefined reference todgesv_' lapacke_dgesvwork.c:(.text+0x23c): undefined reference to`dgesv' collect2: ld a retourné 1 code d'état d'exécution make: *\ [/root/LimePackage/lime_7037.x] Erreur 1

I do not understand where this is coming from.

Regards,

David

ParfenovS commented 9 years ago

Dear David,

Is there /home/soft/lapack/ in your LD_LIBRARY_PATH environment variable ?

dquenard commented 9 years ago

How can I know it ? Is it supposed to be written in my .bashrc or something like that ?

ParfenovS commented 9 years ago

You can check this by typing in terminal: echo $LD_LIBRARY_PATH the output will contain default paths and user defined paths with libraries that can be linked with an application. You should look for /home/soft/lapack in this output. If it is not present there then you can add following line in your .bashrc file: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/soft/lapack and restart your terminal session (you can just close the terminal and open it again).

dquenard commented 9 years ago

It seems to work now !

Thank you for explaining me !

I keep you inform here.

Cheers,

David

dquenard commented 9 years ago

Ok, I thought it worked but it is not the case, I still have the same error.

Maybe it will help you:

My .bashrc :

export PATHTOLIME=/home/dquenard/LimePackage export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${PATHTOLIME}/lib export PATH=${PATH}:$PATHTOLIME

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/soft/lapack

The path where lapack and lapacke are in the server:

[dquenard@titan:~/LimePackage/example]# ls /home/soft/lapack BLAS CMakeLists.txt DOCS LAPACKE liblapack.a libtmglib.a make.inc SRC CBLAS CTestConfig.cmake INSTALL lapack.pc.in liblapacke.a LICENSE make.inc.example TESTING CMAKE CTestCustom.cmake.in lapack_build.cmake lapack_testing.py librefblas.a Makefile README [dquenard@titan:~/LimePackage/example]# [dquenard@titan:~/LimePackage/example]# [dquenard@titan:~/LimePackage/example]# ls /home/soft/ bin etc exelis gcc-4.9.0 jre jre1.7.0 lapack lapack-3.5.0 lapack-3.6.0 python-2.7.8 python-3.4.1 [dquenard@titan:~/LimePackage/example]# ls /home/soft/lapack BLAS CMakeLists.txt DOCS LAPACKE liblapack.a libtmglib.a make.inc SRC CBLAS CTestConfig.cmake INSTALL lapack.pc.in liblapacke.a LICENSE make.inc.example TESTING CMAKE CTestCustom.cmake.in lapack_build.cmake lapack_testing.py librefblas.a Makefile README [dquenard@titan:~/LimePackage/example]# ls /home/soft/lapack/LAPACKE cmake CMakeLists.txt example include lapacke.pc.in LICENSE Makefile mangling README src utils [dquenard@titan:~/LimePackage/example]# ls /home/soft/lapack/LAPACKE/include CMakeLists.txt lapacke_config.h lapacke.h lapacke_mangling.h lapacke_mangling_with_flags.h lapacke_utils.h [dquenard@titan:~/LimePackage/example]#

ParfenovS commented 9 years ago

Backup your current file named lime Try to run the LIME code with this file lime.txt Rename it from lime.txt to lime make it executable by typing in terminal chmod "u+x" lime and run the LIME code with this file.

dquenard commented 9 years ago

Thank you a lot for your help! I have a new error now:

/home/soft/lapack/liblapack.a(dormbr.o): In function dormbr_': dormbr.f:(.text+0x3a0): undefined reference to_gfortran_concat_string' dormbr.f:(.text+0x5e2): undefined reference to _gfortran_concat_string' dormbr.f:(.text+0x671): undefined reference to_gfortran_concat_string' dormbr.f:(.text+0x6e9): undefined reference to _gfortran_concat_string' /home/soft/lapack/liblapack.a(dormlq.o): In functiondormlq_': dormlq.f:(.text+0x34f): undefined reference to _gfortran_concat_string' /home/soft/lapack/liblapack.a(dormlq.o):dormlq.f:(.text+0x79e): more undefined references to_gfortran_concatstring' follow /home/soft/lapack/liblapack.a(ilaenv.o): In function `ilaenv': ilaenv.f:(.text+0x8c): undefined reference to _gfortran_compare_string' ilaenv.f:(.text+0xab): undefined reference to_gfortran_compare_string' ilaenv.f:(.text+0xca): undefined reference to _gfortran_compare_string' ilaenv.f:(.text+0xe9): undefined reference to_gfortran_compare_string' ilaenv.f:(.text+0x108): undefined reference to _gfortran_compare_string' /home/soft/lapack/liblapack.a(ilaenv.o):ilaenv.f:(.text+0x127): more undefined references to_gfortran_comparestring' follow /home/soft/lapack/liblapack.a(xerbla.o): In function `xerbla': xerbla.f:(.text+0x59): undefined reference to _gfortran_st_write' xerbla.f:(.text+0x64): undefined reference to_gfortran_string_len_trim' xerbla.f:(.text+0x76): undefined reference to _gfortran_transfer_character' xerbla.f:(.text+0x86): undefined reference to_gfortran_transfer_integer' xerbla.f:(.text+0x8e): undefined reference to _gfortran_st_write_done' xerbla.f:(.text+0x98): undefined reference to_gfortran_stopnumeric' /home/soft/lapack/liblapack.a(iparmq.o): In function `iparmq': iparmq.f:(.text+0x1cb): undefined reference to _gfortran_compare_string' iparmq.f:(.text+0x243): undefined reference to_gfortran_compare_string' iparmq.f:(.text+0x260): undefined reference to _gfortran_compare_string' iparmq.f:(.text+0x28b): undefined reference to_gfortran_compare_string' iparmq.f:(.text+0x2b5): undefined reference to `_gfortran_compare_string' collect2: ld a retourné 1 code d'état d'exécution make: *\ [/home/dquenard/LimePackage/example/lime_4447.x] Erreur 1 srun: error: n1: task 0: Exited with exit code 2

ParfenovS commented 9 years ago

Oh no. I hope that gfortran is installed in your system. Try to replace this line in the file named lime (it can be edited with any text editor): ld_flags+="-llapacke -llapack -lrefblas" on this line: ld_flags+="-llapacke -llapack -lrefblas -lgfortran"

dquenard commented 9 years ago

Yeah !

It works with the example model ! I will now try with NO.

Thank you very very much Sergey for your support :)

ParfenovS commented 9 years ago

Great! You are welcome. I hope that it will work on your machine in the case of NO file.

dquenard commented 9 years ago

Yes, it works for NO ;) Great ! Thank you a lot again !

smaret commented 8 years ago

Ian is planning to have a look at this issue, so I'm assigning this issue to him,

dquenard commented 8 years ago

OK thank you!

Since I am using the Lapack library as suggested by Sergey Parfenov instead of GSL, everything works fine.

Cheers,

David

Le 26 févr. 2016 à 21:57, Sebastien Maret notifications@github.com a écrit :

Ian is planning to have a look at this issue, so I'm assigning this issue to him,

— Reply to this email directly or view it on GitHub.

stefpols commented 8 years ago

Dear all,

we are also using LIME for doing non-LTE calculations for the molecule CH3CN. We have tested different setups and in all cases the calculation stops at some point with the mentioned error message:

gsl: svd.c:149: ERROR: SVD decomposition failed to converge Default GSL error handler invoked.

I just wanted to inform you that the problem also occurs for this molecule.

Thank you for your work on LIME and best regards,

Stefan Pols

tlunttil commented 8 years ago

There seems a couple of problems in function stateq.

First, gsl_linalg_LU_decomp overwrites the input matrix, so calling gsl_linalg_SV_decomp after that doesn't make sense.

Second, gsl_linalg_LU_det is not a good method for checking checking whether LU solution is going to fail or not.

I would remove the SVD code and only use LU. If the solution fails (=the system is near-singular) that is a sign of problems with the equations and just using another solver isn't necessarily wise.

It appears that the reason for crashes, at least in the test case I got from Ian, is that the model has an extremely low density (<<1 m^-3) in some parts of the computational domain. This means that rates for collisional transitions are very low. However, in some molecule data files, at least CH3CN, the matrix of purely radiative transitions is singular because levels with different values of K do not have radiative transitions between each other. Therefore, if the collisional rates are zero (or very low), the matrix will be (near-)singular.

As a (partial) solution, I would recommend that model files use some realistic lower limit for gas density to make sure that collisional rates are not negligible. Later the code should of course be improved to catch the dangerous cases and work around them.

allegroLeiden commented 8 years ago

Tuomas, what would you recommend as a replacement for the det() test? Given all the computing time in the world, I would calculate the condition number... which would probably best be done via SVD... but it is the GSL SVD routine which is failing anyway... plus I'm assuming SVD is pretty slow. So what is your plan B? det() is pretty fast for LU-resolved matrices, it is just the product of the diagonal. Are you thinking gsl_linalg_LU_lndet() to give some protection against underflows?

allegroLeiden commented 8 years ago

You would think there would be some funky way to estimate ||(LU)^-1|| but I cannot find one.

tlunttil commented 8 years ago

Determinant is not a good method for checking stability so gsl_linalg_LU_lndet() isn't that helpful, see for instance this discussion.

For calculating the condition number SVD would probably be the easiest method. I haven't checked why the GDL SVD fails (or would it even fail if it were used correctly). However, I'm not sure if we really need the condition number. I would just use LU solver and if that fails (because the system is near-singular), just use a LTE solution or something. If LU fails, I'm not sure if using another solver is necessary. If my theory of the cause of these crashes is correct, they only happen in more or less unphysical corner cases.

tlunttil commented 8 years ago

Also, the program doesn't need to crash if LU fails. We just need to turn off the default error handler with gsl_set_error_handler_off () and check the return value of gsl_linalg_LU_solve to see if the system was solved successfully.

imckstewart commented 8 years ago

Seems reasonable...

I have been playing with a simple 2-level system this afternoon. The results tend to confirm your theory that it is related to low transition rates between certain levels (being a combination of very low natural transition rate and low density causing low collisional transition rates). The matrix equation for this system will be

[  a | -b ] [n_0]   [0]
[---------]*[   ] = [ ]
[  1 |  1 ] [n_1]   [1]

where

a = B_01 J + C_01

and

b = A_10 + B_10 J + C_10.

The matrix inverse is

   1    [  1 |  b ]
-------*[---------]
 a + b  [ -1 |  a ]

If

a,b << 1

then the condition number is going to be approximately

k ~ 1/(a + b)

which, by pure coincidence, is also the determinant of the matrix in the present instance. :wink: For CH2, the LAMBDA database file omits the Einstein A coefficient for several transitions, so these (plus the associated B coeffs) will default to zero. In the 2-level toy system, A_10==0 would leave

n_1/n_0 = C_01/C_10,

i.e. in thermal equilibrium. Your suggestion to set such level populations to LTE therefore looks reasonable. It might also be nice and user-friendly if Lime puts out a relevant warning if the user has specified any densities below some nominal minimum for the ISM.