kgullikson88 / Telluric-Fitter

Telluric fitting made easy
http://telfit.readthedocs.org/en/latest/
MIT License

floating-point exception loop of death #5

Closed: mrawls closed this issue 9 years ago

mrawls commented 9 years ago

Hi there,

I was able to install Telluric-Fitter on OS X Yosemite running the Ureka Python 2.7 with no apparent issues. Yay! However, when I run the MakeLibrary.py example, I get this output:

Running exec: lblrtm
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LBLRTM EXIT 
real    0m6.753s
user    0m2.828s
sys 0m3.771s

This block repeats several times when I run the MakeLibrary.py example (with varying real/user/sys values), and after a few repetitions it prints "All done! Output Transmission spectrum is located in the file below: [long filename which does indeed exist and has numbers in it]." But then it falls into some kind of never-ending loop, starting again with "Running exec: lblrtm" and overwriting (or appending to?) an identical output file every few iterations. I finally killed it after 15 minutes or so and tried the Fit.py example, with nearly identical results:

Running exec: lblrtm
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LBLRTM EXIT 
real    0m8.637s
user    0m4.058s
sys 0m4.175s
Adjusting wavelength calibration by at most 0.001264761 nm
Fitting Resolution
Optimal resolution found at R =  66870.7176604
Fraction of points used in X^2 evaluation:  1.0
X^2 =  0.226303802101
Parameter       Value       Fitting?    Bounds
-------------   -----       -----       -----
temperature     2.92500E+02 True        287.5 - 297.5
h2o             5.40000E+01 True        1 - 99
o2              2.12000E+05 True        50000 - 1e+06

As a bonus, running Fit.py created an lnfl_run.log file which said the following:

Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LINFIL COMPLETE 
 skipping over header records on TAPE1

Then I gave up and came over here. If it matters, I'm using the Homebrew gfortran that comes with gcc. Thanks for any pointers.

kgullikson88 commented 9 years ago

Hmm, I have never seen those errors. Could you reply with the contents of your TAPE5 file? It should be located at $TELLURICMODELING/rundir1/TAPE5 (or rundir2, rundir3, etc.).

I will try running lblrtm on my system with your TAPE5 to check whether it works. The fact that an lnfl_run.log appeared is kind of suspicious; lnfl should only run when you install the package. I suspect lnfl might not have installed correctly, but the error didn't crash the installation and got lost in all the other installation output.
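If there are several run directories, a small script can collect every TAPE5 for pasting into a reply. A minimal sketch, assuming the rundirN layout under $TELLURICMODELING described above; the helper name find_tape5_files is hypothetical and not part of TelFit.

```python
import glob
import os


def find_tape5_files(root=None):
    """Return the paths of root/rundir*/TAPE5, sorted by name (hypothetical helper)."""
    if root is None:
        # Default to the TelFit data directory named in the reply above
        root = os.environ["TELLURICMODELING"]
    return sorted(glob.glob(os.path.join(root, "rundir*", "TAPE5")))


if __name__ == "__main__":
    for path in find_tape5_files():
        print("===== " + path)
        with open(path) as f:
            print(f.read())
```

The sort is lexical, so a rundir10 would be listed before rundir2.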

mrawls commented 9 years ago

Hmm, perhaps the logfile appeared during installation and I only noticed it later. That might make sense; now that you say that, I suspect my installation might not have worked despite what the output said. What fun FORTRAN can be... a few different TAPE5s are here for you to look at: https://www.dropbox.com/sh/92h25k2j9m9o229/AAD1wVXe4Q5IgQkLDeaKriDEa?dl=0

kgullikson88 commented 9 years ago

Alright, the TAPE5 file worked fine on my computer. That is at least a good sign. Let's try the lnfl compilation and binary line list creation again:

I suspect it won't all work since that is exactly what the setup script does, but hopefully we will be able to see what is failing.

My TAPE5 file for lnfl (it is not supposed to look the same as the ones you sent me):

$ lnfl Tape 5 file
    3333.0   31250.0
1111111111111111111111111111111111111111
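The same lnfl TAPE5 can be generated from Python. A minimal sketch, assuming the two numbers on the second line sit in right-justified 10-character columns (as the spacing above suggests) and that each 1 on the third line switches one molecule on; write_lnfl_tape5 is a hypothetical helper. The two numbers are wavenumber limits in cm^-1, and λ (nm) = 10^7 / ν (cm^-1) converts them to roughly 3000 nm and 320 nm.

```python
def write_lnfl_tape5(path, v_min=3333.0, v_max=31250.0,
                     mol_flags="1111111111111111111111111111111111111111"):
    """Write a three-line lnfl TAPE5 like the one quoted above (hypothetical helper)."""
    lines = [
        "$ lnfl Tape 5 file",
        # Wavenumber limits in cm^-1, each right-justified in 10 characters
        "%10.1f%10.1f" % (v_min, v_max),
        # Assumed: one flag per molecule, "1" = include it in the line list
        mol_flags,
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")


def wavenumber_to_nm(v_cm):
    """Convert a wavenumber in cm^-1 to a wavelength in nm: 1e7 / v."""
    return 1.0e7 / v_cm


if __name__ == "__main__":
    # The limits above span about 3000 nm (3333 cm^-1) down to 320 nm (31250 cm^-1)
    print(wavenumber_to_nm(3333.0), wavenumber_to_nm(31250.0))
```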
mrawls commented 9 years ago

The "make" command seems to work:

[~/Astronomy/TelFit-1.2/lnfl/build]$ make -f make_lnfl osxGNUsgl

-----------------
  lnfl_v2.6_OS_X_gnu_sgl Makefile
-----------------

This Makefile was designed for the OS_X platform.

It uses the gfortran compiler, with the following options:
      -Wall -frecord-marker=4

The source files used are as follows:

      lnfl.f util_gfortran.f

lnfl_v2.6_OS_X_gnu_sgl make in progress ...

=================
  Makefile done
=================

However, the executable osxGNUsgl does not appear in the working directory. Did I miss something? Thanks so much for your help with this.

kgullikson88 commented 9 years ago

It should appear in the parent directory (~/Astronomy/TelFit-1.2/lnfl)
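A quick way to confirm where the build landed is to look for the versioned binary. A sketch, assuming the lnfl_v* naming seen in the make output in this thread (e.g. lnfl_v2.6_OS_X_gnu_sgl); find_lnfl_executables is a hypothetical helper.

```python
import glob
import os


def find_lnfl_executables(lnfl_dir):
    """Return executable lnfl_v* files in lnfl_dir (the parent of lnfl/build)."""
    pattern = os.path.join(os.path.expanduser(lnfl_dir), "lnfl_v*")
    # Keep regular files that the current user is allowed to execute
    return sorted(p for p in glob.glob(pattern)
                  if os.path.isfile(p) and os.access(p, os.X_OK))


if __name__ == "__main__":
    print(find_lnfl_executables("~/Astronomy/TelFit-1.2/lnfl"))
```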

mrawls commented 9 years ago

Sorry, that's what I get for responding in two minutes before running off to a talk!

I found the lnfl executable and ran it with your TAPE5 file. It gives essentially the same floating point exceptions as before.

[~/Astronomy/TelFit-1.2/lnfl]$ ./lnfl_v2.6_OS_X_gnu_sgl 
 skipping over header records on TAPE1
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LINFIL COMPLETE 

When I run lblrtm from any of the rundir folders, it also crashes quite quickly.

[~/Astronomy/TelFit-1.2/rundir1]$ ./lblrtm 
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LBLRTM EXIT 
kgullikson88 commented 9 years ago

Do you have any other fortran compilers that you could try?

I could also try giving you my output from lnfl, so you could try it and see if it is indeed the problem. I won't be able to do that until tonight, though.

mrawls commented 9 years ago

I have another fortran compiler buried somewhere... I will dig it up and try it later today or tomorrow. If you could please send me your lnfl output that would be great too. Thanks again.

kgullikson88 commented 9 years ago

Here is a TAPE3 file that I made. I am not positive these files are cross-platform, but we can give it a shot.

mrawls commented 9 years ago

Sorry for the delay. I've tried a few things with no luck:

1) my TAPE3 & TAPE5 files: the original floating-point exceptions

[~/Astronomy/TelFit-1.2/rundir2]$ ./lblrtm 
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP  LBLRTM EXIT 

2) your TAPE3 file & my TAPE5 file: same as (1)
3) my TAPE3 file & your TAPE5 file: same as (4)
4) your TAPE3 & TAPE5 files: a new error

[~/Astronomy/TelFit-1.2/rundir1]$ ./lblrtm 
At line 4560 of file ../src/lblrtm.f90 (unit = 55, file = 'TAPE5')
Fortran runtime error: End of file
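Cases (1) to (4) form a 2x2 swap test: relative to case (1), cases (2) and (3) each swap a single input file, which helps isolate whether TAPE3 or TAPE5 is at fault. A sketch for setting the combinations up in separate directories, assuming you copy or link the lblrtm executable into each one yourself; build_swap_dirs and its directory naming are hypothetical.

```python
import itertools
import os
import shutil


def build_swap_dirs(base, tape3_sources, tape5_sources):
    """Make one directory per (TAPE3, TAPE5) pairing and return their paths.

    tape3_sources and tape5_sources map a label such as "mine" or "yours"
    to a file path. The lblrtm executable is not copied.
    """
    made = []
    pairs = itertools.product(sorted(tape3_sources.items()),
                              sorted(tape5_sources.items()))
    for (label3, src3), (label5, src5) in pairs:
        run_dir = os.path.join(base, "tape3_%s__tape5_%s" % (label3, label5))
        # Fails if the directory exists, so stale outputs are never mixed in
        os.makedirs(run_dir)
        shutil.copy(src3, os.path.join(run_dir, "TAPE3"))
        shutil.copy(src5, os.path.join(run_dir, "TAPE5"))
        made.append(run_dir)
    return made
```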

Running lblrtm does create a TAPE6 file in all cases. In the floating-point exception cases it also creates TAPE7, TAPE9, TAPE10, TAPE11, and TAPE12. I'm not sure how to use these files with your Python scripts, though, or how to check whether they are correct. I put them here for case (2): https://www.dropbox.com/sh/85rpjcsbq95klw5/AADX2rFmmrIZsEmCHAocSovLa?dl=0
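When comparing which outputs each case produced (only TAPE6 in one case, several more TAPE files in the others), a plain alphabetical listing puts TAPE10 before TAPE6. A small sketch that sorts by the number instead; list_tape_files is a hypothetical helper.

```python
import os
import re


def list_tape_files(rundir):
    """Return the TAPE<n> files in rundir, ordered by n (TAPE6 before TAPE10)."""
    found = []
    for name in os.listdir(rundir):
        match = re.match(r"TAPE(\d+)$", name)
        if match and os.path.isfile(os.path.join(rundir, name)):
            found.append((int(match.group(1)), name))
    return [name for _, name in sorted(found)]
```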

I'm still working on tracking down a different fortran to try. Just to check, are you sure the TAPE5 you pasted above is formatted correctly?

kgullikson88 commented 9 years ago

The TAPE5 I gave you is for lnfl (which makes TAPE3, the binary line list), not lblrtm (which actually generates the model atmosphere). If you tried to use it for lblrtm, an end of file error sounds about right.

You can take a look at the output with my modules as follows:

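import MakeModel
import pylab

# Full path to the run directory that contains lblrtm's TAPE12 output
directory = 'full/path/to/your/rundir/directory/'
modeler = MakeModel.Modeler()
# v: wavenumber grid (cm^-1); spec: transmission at each point
v, spec = modeler.ReadTAPE12(directory)

pylab.plot(v, spec)
pylab.xlabel(r'Wavenumber (cm$^{-1}$)')
pylab.ylabel('Transmission')
pylab.show()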

The x-axis will be in wavenumber (cm^-1). My code usually does all this under the hood, but you can at least check the outputs to see if you get something that looks right.
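If you would rather look at the TAPE12 output in wavelength, the axis can be converted with λ (nm) = 10^7 / ν (cm^-1). A minimal sketch, assuming v and spec are the arrays returned by ReadTAPE12; to_wavelength_nm is a hypothetical helper, and TelFit may already do an equivalent conversion internally.

```python
import numpy as np


def to_wavelength_nm(wavenumber_cm, spectrum):
    """Re-express a spectrum on a wavenumber grid (cm^-1) as a wavelength grid (nm).

    lambda = 1e7 / nu reverses the ordering, so both arrays are re-sorted
    to make wavelength increase.
    """
    wave = 1.0e7 / np.asarray(wavenumber_cm, dtype=float)
    trans = np.asarray(spectrum, dtype=float)
    order = np.argsort(wave)
    return wave[order], trans[order]


# Usage with the arrays from the MakeModel snippet:
#   wave, trans = to_wavelength_nm(v, spec)
#   pylab.plot(wave, trans); pylab.xlabel('Wavelength (nm)')
```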

It looks like TAPE6 is a debug log, so it has a whole bunch of information in it. It might have something useful... sorry I'm not more help, this is a weird bug!

mrawls commented 9 years ago

Finally fixed it! For some reason, gfortran 4.9 and 5.0 both give floating-point exceptions, but gfortran 4.8 doesn't. Now the example Fit.py iterates as it should and makes a lovely plot at the end! I hope this helps anyone else who may encounter this issue. Thanks again for your help troubleshooting.
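Since the behavior differed by compiler release, it can help to record which gfortran a build used. A sketch that calls gfortran -dumpversion and compares the result with the 4.9 boundary reported here; the helper names are hypothetical, and the (4, 9) cutoff reflects only what was observed in this thread. (As far as I know, gfortran 4.9 is also the release that began printing a summary of signalling IEEE flags when a program executes STOP, which can be tuned with -ffpe-summary.)

```python
import subprocess


def parse_gfortran_version(text):
    """Turn -dumpversion output such as '4.8.5' or '5' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def gfortran_version(executable="gfortran"):
    """Run '<executable> -dumpversion' and return the parsed version tuple."""
    out = subprocess.check_output([executable, "-dumpversion"])
    return parse_gfortran_version(out.decode())


if __name__ == "__main__":
    version = gfortran_version()
    label = ".".join(str(n) for n in version)
    # In this thread, 4.9 and 5.0 printed the IEEE notes while 4.8 did not
    if version >= (4, 9):
        print("gfortran %s: expect IEEE floating-point notes at STOP" % label)
    else:
        print("gfortran %s: older than 4.9" % label)
```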

kgullikson88 commented 9 years ago

Great! I will add this to the FAQ section on my website.

jlamoure commented 7 years ago

I am experiencing this floating-point issue as well. There are gfortran-4.8 and gfortran-4.9 packages in the Debian Jessie repos, but only 4.9 compiles lnfl successfully. Here is the failed make when only 4.8 is installed:

$ ~/.TelFit/lnfl/build > make -f make_lnfl linuxGNUsgl

-----------------
  lnfl_v2.6_linux_gnu_sgl Makefile
-----------------

This Makefile was designed for the linux platform.

It uses the gfortran compiler, with the following options:
      -Wall -frecord-marker=4

The source files used are as follows:

      lnfl.f util_gfortran.f

lnfl_v2.6_linux_gnu_sgl make in progress ...

/bin/sh: 6: gfortran: not found
make_lnfl:52: recipe for target 'check' failed
make[1]: *** [check] Error 127
makefile.common:213: recipe for target 'linuxGNUsgl' failed
make: *** [linuxGNUsgl] Error 2

Should I get the compiler from somewhere else?

Edit: I got it compiled with a symbolic link, like so:

$ > sudo ln -s /usr/bin/gfortran-4.8 /usr/bin/gfortran
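Error 127 is the shell's "command not found" status: the makefile invokes plain gfortran, and with only the versioned gfortran-4.8 package installed that name did not exist until the symlink was added. A Python 3 sketch for checking what gfortran resolves to on PATH before running make; describe_gfortran is a hypothetical helper.

```python
import os
import shutil


def describe_gfortran(name="gfortran", search_path=None):
    """Return (path found on PATH, fully resolved target) or None if missing.

    search_path defaults to the PATH environment variable, as in shutil.which.
    """
    found = shutil.which(name, path=search_path)
    if found is None:
        return None
    # realpath follows symlinks such as /usr/bin/gfortran -> gfortran-4.8
    return found, os.path.realpath(found)


if __name__ == "__main__":
    print(describe_gfortran())
```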

And this is what the Fit.py output looks like. The lblrtm run still stops, but without the "The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL" note.

/usr/local/lib/python2.7/dist-packages/pysynphot/locations.py:119: UserWarning: Extinction files should be moved to $PYSYN_CDBS/extinction for compatibility with future versions of pysynphot.
  warnings.warn('Extinction files should be moved to '
Parameter       Value       Fitting?    Bounds
-------------   -----       -----       -----
h2o             5.40000E+01 True        1 - 99

Running exec: lblrtm

STOP  LBLRTM EXIT 
1.37user 1.20system 0:02.57elapsed 100%CPU (0avgtext+0avgdata 7124maxresident)k
0inputs+494464outputs (0major+943minor)pagefaults 0swaps
Adjusting wavelength calibration by at most 0.0010865545 nm
Fitting Resolution
Optimal resolution found at R =  66871.4524013
Fraction of points used in X^2 evaluation:  1.0
X^2 =  0.226298100511
kgullikson88 commented 7 years ago

@jlamoure it looks like it is working correctly now with gfortran-4.8, unless it stops somewhere after the text that you included here?

jlamoure commented 7 years ago

@kgullikson88 You are absolutely correct. I assumed "STOP LBLRTM EXIT" was an error.
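Because "STOP  LBLRTM EXIT" turned out to be lblrtm's normal termination message, and the IEEE note appears alongside it rather than instead of it, a script scanning lblrtm logs should check the two separately. A minimal sketch; summarize_lblrtm_output and its return format are hypothetical, and the two strings it looks for are taken from the logs in this thread.

```python
import re

# Matches the gfortran note and captures the flag names on the same line
FPE_NOTE = re.compile(r"floating-point exceptions are signalling:\s*(.*)")


def summarize_lblrtm_output(text):
    """Report whether lblrtm reached its normal STOP and which IEEE flags were noted."""
    match = FPE_NOTE.search(text)
    flags = match.group(1).split() if match else []
    # Collapse whitespace so "STOP  LBLRTM EXIT" and "STOP LBLRTM EXIT" both match
    normalized = " ".join(text.split())
    return {
        "normal_exit": "STOP LBLRTM EXIT" in normalized,
        "ieee_flags": flags,
    }
```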