Closed dquenard closed 8 years ago
Dear David,
I'm not from the LIME developing team but maybe I can help you. It looks like your problem is related with GSL linear equation solver used by LIME. I had similar problems with this solver and it also was in the case of relatively high number of molecular levels. I've changed the default GSL solver on the LAPACK or Intel MKL solvers (one can choose whether to use LAPACK or MKL solver). Most of changes were made in stateq.c source file. One can find my version of the LIME code with LAPACK/MKL solvers (and some notes on LAPACK and MKL installation in "Installation notes" section) here: https://github.com/ParfenovS/lime/tree/LAPACK/MKL-support I also made pull request with these code changes (one can find it here: https://github.com/lime-rt/lime/pull/65 ). I don't know whether these changes will be accepted. To use LAPACK solver you need to run my version of the LIME code with -L option. For the MKL solver you need to use -M option
I've made test runs with your input model files. The LIME code with default GSL solver stops with the error (gsl: svd.c:149:...). The code version with LAPACK solver works fine and stops after 16 iterations (the default number of global iterations) witn Min(SNR)=2.5, Median(SNR)=2.5e7. You can find the results obtained with your input files and my version of the LIME code (with LAPACK solver) here: https://dl.dropboxusercontent.com/u/83439322/model_results.zip (please note that I will delete these files with results from Dropbox at the beginning of December)
Regards
Sergey.
Just a quick comment:
We have modified this collision file obtained from the LAMDA database. This is most likely the reason. The fact that it works in radex does not ensure that LIME can solve for the (modified) molecule. Do you get the same crash with the unmodified file? In any case, it is certainly related to the molecular datafile. We had the same type of crash with methanol some time ago, where we had to use an older version of the collision data in order to get it to work.
Dear Sergey and Christian,
Thank you for your comments!
I will try to make some test with the LAPACK/MKL solver. I keep you inform in this post If I have any trouble installing/using Sergey's version.
@Christian : At first we had the same crash with the original file (and we have the same issue with CH3CN as well) but we thought it was due to the big size of the collision file. So we decided to reduce it a little bit. With the modified collision file, it works in LIME sometimes but in most of the cases it crashes.
Dear Sergey,
I have an issue with the lapacke library, can you help me?
I have this error:
[root@titan:~/LimePackage]# lime -L example/model.c
/home/soft/lapack/liblapacke.a(lapacke_dgelss_work.o): In function LAPACKE_dgelss_work': lapacke_dgelss_work.c:(.text+0x211): undefined reference to
dgelss_'
lapacke_dgelsswork.c:(.text+0x2f9): undefined reference to `dgelss'
/home/soft/lapack/liblapacke.a(lapacke_dgesv_work.o): In functionLAPACKE_dgesv_work': lapacke_dgesv_work.c:(.text+0x175): undefined reference to
dgesv_'
lapacke_dgesvwork.c:(.text+0x23c): undefined reference to`dgesv'
collect2: ld a retourné 1 code d'état d'exécution
make: *\ [/root/LimePackage/lime_7037.x] Erreur 1
I do not understand where this is coming from.
Regards,
David
Dear David,
Is there /home/soft/lapack/ in your LD_LIBRARY_PATH environment variable ?
How can I know it ? Is it supposed to be written in my .bashrc or something like that ?
You can check this by typing in terminal: echo $LD_LIBRARY_PATH the output will contain default paths and user defined paths with libraries that can be linked with an application. You should look for /home/soft/lapack in this output. If it is not present there then you can add following line in your .bashrc file: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/soft/lapack and restart your terminal session (you can just close the terminal and open it again).
It seems to work now !
Thank you for explaining me !
I keep you inform here.
Cheers,
David
Ok, I thought it worked but it is not the case, I still have the same error.
Maybe it will help you:
My .bashrc :
export PATHTOLIME=/home/dquenard/LimePackage export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${PATHTOLIME}/lib export PATH=${PATH}:$PATHTOLIME
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/soft/lapack
The path where lapack and lapacke are in the server:
[dquenard@titan:~/LimePackage/example]# ls /home/soft/lapack BLAS CMakeLists.txt DOCS LAPACKE liblapack.a libtmglib.a make.inc SRC CBLAS CTestConfig.cmake INSTALL lapack.pc.in liblapacke.a LICENSE make.inc.example TESTING CMAKE CTestCustom.cmake.in lapack_build.cmake lapack_testing.py librefblas.a Makefile README [dquenard@titan:~/LimePackage/example]# [dquenard@titan:~/LimePackage/example]# [dquenard@titan:~/LimePackage/example]# ls /home/soft/ bin etc exelis gcc-4.9.0 jre jre1.7.0 lapack lapack-3.5.0 lapack-3.6.0 python-2.7.8 python-3.4.1 [dquenard@titan:~/LimePackage/example]# ls /home/soft/lapack BLAS CMakeLists.txt DOCS LAPACKE liblapack.a libtmglib.a make.inc SRC CBLAS CTestConfig.cmake INSTALL lapack.pc.in liblapacke.a LICENSE make.inc.example TESTING CMAKE CTestCustom.cmake.in lapack_build.cmake lapack_testing.py librefblas.a Makefile README [dquenard@titan:~/LimePackage/example]# ls /home/soft/lapack/LAPACKE cmake CMakeLists.txt example include lapacke.pc.in LICENSE Makefile mangling README src utils [dquenard@titan:~/LimePackage/example]# ls /home/soft/lapack/LAPACKE/include CMakeLists.txt lapacke_config.h lapacke.h lapacke_mangling.h lapacke_mangling_with_flags.h lapacke_utils.h [dquenard@titan:~/LimePackage/example]#
Backup your current file named lime Try to run the LIME code with this file lime.txt Rename it from lime.txt to lime make it executable by typing in terminal chmod "u+x" lime and run the LIME code with this file.
Thank you a lot for your help! I have a new error now:
/home/soft/lapack/liblapack.a(dormbr.o): In function dormbr_': dormbr.f:(.text+0x3a0): undefined reference to
_gfortran_concat_string'
dormbr.f:(.text+0x5e2): undefined reference to _gfortran_concat_string' dormbr.f:(.text+0x671): undefined reference to
_gfortran_concat_string'
dormbr.f:(.text+0x6e9): undefined reference to _gfortran_concat_string' /home/soft/lapack/liblapack.a(dormlq.o): In function
dormlq_':
dormlq.f:(.text+0x34f): undefined reference to _gfortran_concat_string' /home/soft/lapack/liblapack.a(dormlq.o):dormlq.f:(.text+0x79e): more undefined references to
_gfortran_concatstring' follow
/home/soft/lapack/liblapack.a(ilaenv.o): In function `ilaenv':
ilaenv.f:(.text+0x8c): undefined reference to _gfortran_compare_string' ilaenv.f:(.text+0xab): undefined reference to
_gfortran_compare_string'
ilaenv.f:(.text+0xca): undefined reference to _gfortran_compare_string' ilaenv.f:(.text+0xe9): undefined reference to
_gfortran_compare_string'
ilaenv.f:(.text+0x108): undefined reference to _gfortran_compare_string' /home/soft/lapack/liblapack.a(ilaenv.o):ilaenv.f:(.text+0x127): more undefined references to
_gfortran_comparestring' follow
/home/soft/lapack/liblapack.a(xerbla.o): In function `xerbla':
xerbla.f:(.text+0x59): undefined reference to _gfortran_st_write' xerbla.f:(.text+0x64): undefined reference to
_gfortran_string_len_trim'
xerbla.f:(.text+0x76): undefined reference to _gfortran_transfer_character' xerbla.f:(.text+0x86): undefined reference to
_gfortran_transfer_integer'
xerbla.f:(.text+0x8e): undefined reference to _gfortran_st_write_done' xerbla.f:(.text+0x98): undefined reference to
_gfortran_stopnumeric'
/home/soft/lapack/liblapack.a(iparmq.o): In function `iparmq':
iparmq.f:(.text+0x1cb): undefined reference to _gfortran_compare_string' iparmq.f:(.text+0x243): undefined reference to
_gfortran_compare_string'
iparmq.f:(.text+0x260): undefined reference to _gfortran_compare_string' iparmq.f:(.text+0x28b): undefined reference to
_gfortran_compare_string'
iparmq.f:(.text+0x2b5): undefined reference to `_gfortran_compare_string'
collect2: ld a retourné 1 code d'état d'exécution
make: *\ [/home/dquenard/LimePackage/example/lime_4447.x] Erreur 1
srun: error: n1: task 0: Exited with exit code 2
Oh no. I hope that gfortran is installed in your system. Try to replace this line in the file named lime (it can be edited with any text editor): ld_flags+="-llapacke -llapack -lrefblas" on this line: ld_flags+="-llapacke -llapack -lrefblas -lgfortran"
Yeah !
It works with the example model ! I will now try with NO.
Thank you very very much Sergey for your support :)
Great! You are welcome. I hope that it will work on your machine in the case of NO file.
Yes, it works for NO ;) Great ! Thank you a lot again !
Ian is planning to have a look at this issue, so I'm assigning this issue to him,
OK thank you!
Since I am using the Lapack library as suggested by Sergey Parfenov instead of GSL, everything works fine.
Cheers,
David
Le 26 févr. 2016 à 21:57, Sebastien Maret notifications@github.com a écrit :
Ian is planning to have a look at this issue, so I'm assigning this issue to him,
— Reply to this email directly or view it on GitHub.
Dear all,
we are also using LIME for doing non-LTE calculations for the molecule CH3CN. We have tested different setups and in all cases the calculation stops at some point with the mentioned error message:
gsl: svd.c:149: ERROR: SVD decomposition failed to converge Default GSL error handler invoked.
I just wanted to inform you that the problem also occurs for this molecule.
Thank you for your work on LIME and best regards,
Stefan Pols
There seems a couple of problems in function stateq.
First, gsl_linalg_LU_decomp overwrites the input matrix, so calling gsl_linalg_SV_decomp after that doesn't make sense.
Second, gsl_linalg_LU_det is not a good method for checking checking whether LU solution is going to fail or not.
I would remove the SVD code and only use LU. If the solution fails (=the system is near-singular) that is a sign of problems with the equations and just using another solver isn't necessarily wise.
It appears that the reason for crashes, at least in the test case I got from Ian, is that the model has an extremely low density (<<1 m^-3) in some parts of the computational domain. This means that rates for collisional transitions are very low. However, in some molecule data files, at least CH3CN, the matrix of purely radiative transitions is singular because levels with different values of K do not have radiative transitions between each other. Therefore, if the collisional rates are zero (or very low), the matrix will be (near-)singular.
As a (partial) solution, I would recommend that model files use some realistic lower limit for gas density to make sure that collisional rates are not negligible. Later the code should of course be improved to catch the dangerous cases and work around them.
Tuomas, what would you recommend as a replacement for the det() test? Given all the computing time in the world, I would calculate the condition number... which would probably best be done via SVD... but it is the GSL SVD routine which is failing anyway... plus I'm assuming SVD is pretty slow. So what is your plan B? det() is pretty fast for LU-resolved matrices, it is just the product of the diagonal. Are you thinking gsl_linalg_LU_lndet()
to give some protection against underflows?
You would think there would be some funky way to estimate ||(LU)^-1|| but I cannot find one.
Determinant is not a good method for checking stability so gsl_linalg_LU_lndet() isn't that helpful, see for instance this discussion.
For calculating the condition number SVD would probably be the easiest method. I haven't checked why the GDL SVD fails (or would it even fail if it were used correctly). However, I'm not sure if we really need the condition number. I would just use LU solver and if that fails (because the system is near-singular), just use a LTE solution or something. If LU fails, I'm not sure if using another solver is necessary. If my theory of the cause of these crashes is correct, they only happen in more or less unphysical corner cases.
Also, the program doesn't need to crash if LU fails. We just need to turn off the default error handler with gsl_set_error_handler_off () and check the return value of gsl_linalg_LU_solve to see if the system was solved successfully.
Seems reasonable...
I have been playing with a simple 2-level system this afternoon. The results tend to confirm your theory that it is related to low transition rates between certain levels (being a combination of very low natural transition rate and low density causing low collisional transition rates). The matrix equation for this system will be
[ a | -b ] [n_0] [0]
[---------]*[ ] = [ ]
[ 1 | 1 ] [n_1] [1]
where
a = B_01 J + C_01
and
b = A_10 + B_10 J + C_10.
The matrix inverse is
1 [ 1 | b ]
-------*[---------]
a + b [ -1 | a ]
If
a,b << 1
then the condition number is going to be approximately
k ~ 1/(a + b)
which, by pure coincidence, is also the determinant of the matrix in the present instance. :wink: For CH2, the LAMBDA database file omits the Einstein A coefficient for several transitions, so these (plus the associated B coeffs) will default to zero. In the 2-level toy system, A_10==0
would leave
n_1/n_0 = C_01/C_10,
i.e. in thermal equilibrium. Your suggestion to set such level populations to LTE therefore looks reasonable. It might also be nice and user-friendly if Lime puts out a relevant warning if the user has specified any densities below some nominal minimum for the ISM.
Dear all,
First of all, I would like to thank you again for your beautiful work on maintaining and developing LIME!
Please find in this link a folder containing all the files needed to reproduce our models:
Click on me!
We have tested several models to reproduce the emission of the NO molecule lines with LIME. To do that, we have defined a structure composed by a spherical source. We then generate a grid of points with this structure as an input for LIME. The structure parameters are the following:
Inner spherical source parameters
We have noted that LIME crashes randomly several times when running this model. We have runned this very same model 10 times, and it crashes 9 out of 10 times. We get this error message when it fails:
gsl: svd.c:149: ERROR: SVD decomposition failed to converge Default GSL error handler invoked. /home/LimePackage/lime: line 92: 21362 Aborted ./lime_$$.x
We are attaching the model.c file as well with the parameters we used for this particular test.
In addition, we have noted that the crashing time can vary significantly from one test to another when running the very same model with LIME several times (from 30 minutes to 8 hours).
Another concern we have is, can we trust the results when one of the models is working, even when other tests with the same parameters have failed?
Thank you very much for your support!
Cheers,
David Quénard PhD student - Astrochemistry Institut de Recherche en Astrophysique et Planétologie (Toulouse - France)