njoy / NJOY2016

Nuclear data processing with legacy NJOY
https://www.njoy21.io/NJOY2016
Other
96 stars 86 forks source link

Segmentation fault in ACER while processing Photoatomic data #135

Closed jlconlin closed 4 years ago

jlconlin commented 4 years ago

A segmentation fault is encountered when running this input:

acer / Prepare ACE files
20 -26 0 27 28           / Card 1
4 0 1 .31                 / Card 2
'Th from ENDF NJOY 2016'  / Card 3
9000                      / Card 11
stop 

The input tape20 (from ENDF/B-VIII.0) is attached.

jlconlin commented 4 years ago

This may be related to #91.

jlconlin commented 4 years ago

With debugging turned on, the error looks like this:

 njoy 2016.20  30Jul17                                       10/28/19 09:58:34
 *****************************************************************************

 acer...                                                                  0.5s

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7F46A055BE08
#1  0x7F46A055AF90
#2  0x7F46A01A54AF
#3  0x7F46A092C228
#4  0x7F46A09509E6
#5  0x401B10 in MAIN__ at main.f90:?

A little bit of investigation shows that it gets stuck in acepa.f90:104-117:

   enext=emin
   do while (enext.lt.emax)
    write(nsyso,'(/'' enext: '',2e20.10,'' emax: '',2e20.10)') enext, emax
      e=sigfig(enext,7,0)
      if (idis.ne.0) then
         e=sigfig(e,7,-1)
         call gety1(e,enext,idis,s,nin,scr)
         l=l+1
         xss(l)=e
         e=sigfig(e,7,+2)
      endif
      call gety1(e,enext,idis,s,nin,scr)
      l=l+1
      xss(l)=e
   enddo

I don't see a way to get out of this for loop.

whaeck commented 4 years ago

This is probably related to #91

kahlerac commented 4 years ago

Agreed you don't get out of the "do while" loop. "emax" is set in acepho to 1.01e11 but "enext" can never exceed 1.0000e+11 (second item on the third line of the endf input file, tape20). A quick and dirty fix is to reset the acepho emax parameter variable to 9.999e10 but a more rigorous fix is to make sure "emax" isn't ever defaulted to something larger than the maximum possible "enext". Skip

On Mon, Oct 28, 2019 at 2:37 PM Wim Haeck notifications@github.com wrote:

This is probably related to #91 https://github.com/njoy/NJOY2016/issues/91

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/njoy/NJOY2016/issues/135?email_source=notifications&email_token=AEHJISNWMPQRDHICXYQSLILQQ4WNFA5CNFSM4JF6D2O2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECN6LMA#issuecomment-547087792, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEHJISPJHC2E37GLJYT6VSLQQ4WNFANCNFSM4JF6D2OQ .

-- Dr. A. C. (Skip) Kahler Kahler Nuclear Data Services, LLC kahler3ac@gmail.com +1 321 368 3645

whaeck commented 4 years ago

@danrouzen reported this on October 8, 2019 by email:

Hi all:

to sum up my recent experience:

B-VII.1 photo-atomic : NJOY2016 converts endf files to ace files (p files) for all nuclides I tested. Cross sections and other data can be found and read from NJOY output file (“output”); acer visualizer does not work but one can leave without it. (and it seems B-VII.1 files are in strict endf format).

B-VIII.0 photo-atomic : acer of NJOY2016 crashed from the beginning, somewhere in acepa.f90 :

FORMALLY, the code complains about nxss=999000 (that should be a sufficiently large number for reading/storing data)

! main container array for ace data real(kr),dimension(:),allocatable::xss integer,parameter::nxss=999000

“Fortran runtime error: Array element out of bounds: 999001 in (1:999000), dim=1” , from the do loop after “ !--locate and store energy grid of total cross section ”
[ code line 116 in the version I used ]

In the examples I checked, the data block for total cross sections satisfy dim(E_i) < 999000 , and all other data blocks are within this limit, i.e., code reads who knows what from the endf file and comes to its limit.

(btw, NJOY99 gives this message: Fortran runtime error: Array element out of bounds: 8000001 in (1:8000000), dim=1 )

Is any format checker (and possibly a converter, ~ “back to the strict endf”) for B-VIII.0 photo-atomic data files IF all this is just a format issue ?

otherwise, we have this: everybody can take the first nuclide (1-H) and NJOY2016, and
1-H from B-VII.1 is processable to ace (by acer), and all the data blocks are read correctly (as far as I can judge),
BUT 1-H from B-VIII.0 is NOT processable at the moment.

Cheers, Dan

P.S. concerning photo-nuclear data processing with NJ2016, from the endf files to ace files (.u files): it looks OK (but I checked only H-2 and Be-9 from B-VIII.0 at the moment). DR.

whaeck commented 4 years ago

@kahlerac As a test, I just changed the ENDF upper energy of every MF23 entry to 1.1e+11 (from 1e+11). If I do not use atomic relaxation data (the second input tape set to 0), then the file processes properly. However, when I add the atomic relaxation data, I get another problem where I apparently read over the end of the relaxation data ENDF file.

I'll propose a fix for this issue before I try and figure out the other one.

whaeck commented 4 years ago

Fun fact by the way: ENDF/B-VII.1 actually sets the maximum energy in MF1 MT451 to 1e+8 instead of 1e+11 (which is the value used in the MF23 sections).

whaeck commented 4 years ago

OK, I was wrong. The issue is not the upper energy limit, it's sigfig. For reference, this is the snippet given by @jlconlin:

   enext=emin
   do while (enext.lt.emax)
      e=sigfig(enext,7,0)
      if (idis.ne.0) then
         e=sigfig(e,7,-1)
         call gety1(e,enext,idis,s,nin,scr)
         l=l+1
         xss(l)=e
         e=sigfig(e,7,+2)
      endif
      call gety1(e,enext,idis,s,nin,scr)
      l=l+1
      xss(l)=e
   enddo

Let's talk about gety1() first: if we are at the last point, gety1() does the following:

   !--special branch for last point
   300 continue
   y1=a(ln+2)
   xlast=xbig
   xnext=xbig
   return
   end subroutine gety1

where xbig is 1e+12 so xnext (or enext in the code snippet given above) is larger than 1e+11 so the loop will exit properly. That's why ENDF/B-VII.1 passes without an issue (in my previous comment, the change to 1.1e+11 does nothing to change this since I used the ENDF/B-VII.1 file and not the ENDF/B-VIII.0 file).

So, where is the problem exactly? The minimum energy where NJOY begins it's treatment for MF23 MT501 is 1000 eV. The call to sigfig then sets xnext to the next value, which in ENDF/B-VIII.0 is 1000.00023 eV. Inside the do loop, the next energy will be truncated by sigfig(enext, 7, 0). As a result, 1000.00023 eV gets transformed into ... (wait for it) ... 1000 eV. Calling gety1() again will set enext to 1000.00023 eV again, effectively initiating the infinite loop.

Replacing the sigfig call by just setting e=enext or using 9 digits instead of7 as in e=sigfig(enext,9,...) solves the problem.

All of this without the atomic relaxation data of course, which I will try to look into now.

whaeck commented 4 years ago

Now that this infinite loop is solved, the next problem rears it's head: we go over the array size somewhere when using ENDF/B-VIII.0. There is no issue with ENDF/B-VII.1. This happens still without using the relaxation data.

The issue was traced back to reading the MF27 MT502 data (coherent form factors). because of the length of the scr array was only initialised to 1000 words, we went over the size limit (we needed a little less than 2000). ENDF/B-VII.1 posed no problem because it fit in the array. Setting the nwscr value to 10000 fixes the problem.

To avoid this issue in the future, I propose changing the following code:

   call tab1io(nin,0,0,scr,nb,nw)
   l=1
   do while (nb.ne.0)
      l=l+nw
      call moreio(nin,0,0,scr(l),nb,nw)
   enddo

into this:

   call tab1io(nin,0,0,scr,nb,nw)
   l=1+nw
   do while (nb.ne.0)
      if (l.gt.nwscr) call error('acepho',&
              'storage exceeded for the coherent form factors',' ')
      call moreio(nin,0,0,scr(l),nb,nw)
      l=l+nw
   enddo

The same thing for the incoherent scattering function where a similar construct is used. The logic in the above code is changed somewhat because this is the way it is done in other pieces of NJOY (such as GROUPR, RECONR, etc.).

With these changes, NJOY2016 can process the photoatomic data from ENDF/B-VII.1 and ENDF/B-VIII.0 - but still only without the relaxation data. See the branch fix/photoatomic for these changes.

whaeck commented 4 years ago

The remaining issue is actually the one referenced in issue #91: both ENDF/B-VII.1 and ENDF/B-VIII.0 dont process because they read over the end of the relaxation data tape. I'll look at this tomorrow.

whaeck commented 4 years ago

The last issue was also due to the array size not being large enough. I set it to 50000 and added tests in the alax subroutine to detect this in the future:

   do iss=1,nss
      loc(iss)=ll
      if (ll.gt.nwscr) call error('alax',&
              'storage exceeded for the atomic relaxation data',' ')
      call listio(nlax,0,0,a(ll),nb,nw)
      ntr=n2h
      ll=ll+nw
      do while (nb.ne.0)
         if (ll.gt.nwscr) call error('alax',&
                 'storage exceeded for the atomic relaxation data',' ')
         call moreio(nlax,0,0,a(ll),nb,nw)
         ll=ll+nw
      enddo
   enddo

Both ENDF/B-VII.1 and ENDF/B-VIII.0 for Th now process.