scottransom / presto

Open source pulsar search and analysis toolkit
http://www.cv.nrao.edu/~sransom/presto/
GNU General Public License v2.0

.mask file doesn't save when running rfifind #152

Closed: k-perez closed this issue 3 years ago

k-perez commented 3 years ago

Hi, I installed the new v4 PRESTO on my desktop. When I run the rfifind command within my pipeline, it runs smoothly, and I get the "Writing mask data" and "Amount Complete = 100%" output. However, the .mask file does not get saved. Other files such as .rfi and .stats do get saved as expected. Any idea what could be happening? There are no errors reported. Thanks!

scottransom commented 3 years ago

Hmmm. That's strange. I'm definitely not seeing that. Does this happen with the file from the PRESTO tutorial? And also, does it take a particularly long time to run rfifind on your data? I have seen bad things happen with not-well-behaved input data before.

k-perez commented 3 years ago

No, it is not taking a long time to run rfifind on my data, just a few minutes. And yes, this also happens with the tutorial file; I just tested it.

scottransom commented 3 years ago

Can you please post the full output of rfifind after you run it on that test tutorial file?

scottransom commented 3 years ago

Also, definitely make sure that there are no other .mask files in the output directory -- especially if they were written by another person so that you have permissions issues!
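A quick way to rule out the permissions scenario mentioned here is to check the output directory before running rfifind. The Python sketch below is a hypothetical helper (check_output_dir is not part of PRESTO). It assumes a POSIX system (it uses os.getuid) and only reports problems; it does not fix them.

```python
import glob
import os


def check_output_dir(outdir="."):
    """Report whether outdir is writable and flag existing .mask files that
    belong to another user or cannot be overwritten (hypothetical helper)."""
    problems = []
    if not os.access(outdir, os.W_OK):
        problems.append(f"directory not writable: {outdir}")
    uid = os.getuid()  # POSIX only (macOS/Linux)
    for path in sorted(glob.glob(os.path.join(outdir, "*.mask"))):
        st = os.stat(path)
        if st.st_uid != uid:
            problems.append(f"owned by uid {st.st_uid}: {path}")
        elif not os.access(path, os.W_OK):
            problems.append(f"not writable: {path}")
    return problems


if __name__ == "__main__":
    for msg in check_output_dir("."):
        print(msg)
```

An empty list means nothing obvious is in the way; it says nothing about whether rfifind itself finishes.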

k-perez commented 3 years ago

Here is the output. Maybe not very informative since the issue is that the .mask file does not exist. It is also not a directory issue; I've checked this too, and the other files (.inf, .stats, .rfi) are all being saved to the same directory.

Reading SIGPROC filterbank data from 1 file:
  /Users/kperez/Documents/WVU_data/GBT_Lband_PSR.fil

    Number of files = 1
       Num of polns = 2 (summed)
  Center freq (MHz) = 1400
    Num of channels = 96
    Sample time (s) = 7.2e-05       
     Spectra/subint = 2400
   Total points (N) = 531000
     Total time (s) = 38.232        
     Clipping sigma = 6.000
   Invert the band? = False
          Byteswap? = False
     Remove zeroDM? = False

File  Start Spec   Samples     Padding        Start MJD
----  ----------  ----------  ----------  --------------------
1              0      531000           0  53010.48482638889254

Analyzing data sections of length 38400 points (2.7648 sec).
  Prime factors are:  2 2 2 2 2 2 2 2 2 3 5 5 

Writing mask data  to GBT_Lband_PS_rfifind.mask.
Writing  RFI data  to GBT_Lband_PS_rfifind.rfi.
Writing statistics to GBT_Lband_PS_rfifind.stats.

Massaging the data ...

Amount Complete = 100%
mask_file /Users/kperez/Documents/WVU_data/GBT_Lband_PS_rfifind.mask
base_name GBT_Lband_PS
mask_file /Users/kperez/Documents/WVU_data/GBT_Lband_PS_rfifind.mask
time 2.0
nchans 96
tsamp 0.072
chanfrac 0.5
og_fil_file [/Users/kperez/Documents/WVU_data/GBT_Lband_PSR.fil]
Traceback (most recent call last):
  File "/Users/kperez/Documents/WVU_data/Pulsar_pipelineJ1302.py", line 350, in <module>
    rfi_filter(inpath, time_s, timesig, freqsig, chanfrac, intfrac, max_percent, MASKFILE, sp, outpath)
  File "/Users/kperez/Documents/WVU_data/Pulsar_pipelineJ1302.py", line 228, in rfi_filter
    percentage_flagged, percentage_bad_ints = rfi_quality_check.rfi_check(base_name, mask_file, time, nchans, tsamp, chanfrac)
  File "/Users/kperez/Documents/WVU_data/rfi_quality_check.py", line 12, in rfi_check
    a = rfifind_bandpass_on.rfifind(mask_file)
  File "/Users/kperez/Documents/WVU_data/rfifind_bandpass_on.py", line 39, in __init__
    self.read_mask()
  File "/Users/kperez/Documents/WVU_data/rfifind_bandpass_on.py", line 58, in read_mask
    x = open(self.basename+".mask")
FileNotFoundError: [Errno 2] No such file or directory: /Users/kperez/Documents/WVU_data/GBT_Lband_PS_rfifind.mask

scottransom commented 3 years ago

That output definitely helps! It looks like someone has modified the rfifind code to output all of that other stuff, starting with the first mask_file line.

The bottom output should look like:

Amount Complete = 100%
There are 20 RFI instances.

Total number of intervals in the data:  3552

  Number of padded intervals:       96  ( 2.703%)
  Number of  good  intervals:     3327  (93.666%)
  Number of  bad   intervals:      129  ( 3.632%)

  Ten most significant birdies:
#  Sigma     Period(ms)      Freq(Hz)       Number 
----------------------------------------------------
1  6.67      11.5844         86.3233        66      
2  6.53      11.4564         87.2878        59      
3  6.52      11.52           86.8055        66      
4  5.73      8.86154         112.847        1       
5  5.44      11.6494         85.841         26      
6  5.40      11.7818         84.8765        26      
7  5.39      11.7153         85.3588        26      
8  5.26      8.82383         113.329        1       
9  5.03      8.74937         114.294        2       
10 5.02      8.78644         113.812        2       

  Ten most numerous birdies:
#  Number    Period(ms)      Freq(Hz)       Sigma 
----------------------------------------------------
1  230       34.56           28.9352        4.69    
2  131       17.28           57.8704        4.71    
3  120       17.4252         57.3881        4.70    
4  66        11.5844         86.3233        6.67    
5  66        11.52           86.8055        6.52    
6  59        11.4564         87.2878        6.53    
7  26        11.6494         85.841         5.44    
8  26        11.7818         84.8765        5.40    
9  26        11.7153         85.3588        5.39    
10 21        8.71261         114.776        4.97    

Done.

So I suspect that the code has been modified so that it accidentally doesn't write the .mask file any longer!

You can likely check that with git diff, unless those changes have been committed (in which case you would need to diff against the master branch).

scottransom commented 3 years ago

I'd recommend that you run rfifind directly on that file to get just its output. Maybe something like: rfifind -time 1.0 -o test GBT_Lband_PSR.fil (which is how I ran it above).
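When rfifind is driven from a pipeline, as in the traceback above, a crash inside rfifind only surfaces later as a confusing FileNotFoundError. Here is a minimal sketch that checks rfifind's exit status and its expected products right away. It assumes the output naming seen in this thread (<outbase>_rfifind.mask/.rfi/.stats) and uses only the -time and -o flags shown here. run_rfifind, expected_outputs and describe_returncode are hypothetical helper names. On POSIX, Python's subprocess reports a child killed by a signal as a negative return code, so a segfault shows up as -11.

```python
import os
import signal
import subprocess


def expected_outputs(outbase):
    """Products named in rfifind's 'Writing ... to <outbase>_rfifind.*' lines."""
    return [f"{outbase}_rfifind.{ext}" for ext in ("mask", "rfi", "stats")]


def describe_returncode(rc):
    """Readable status; on POSIX a negative code -N means killed by signal N."""
    if rc == 0:
        return "ok"
    if rc < 0:
        try:
            return f"killed by {signal.Signals(-rc).name}"
        except ValueError:
            return f"killed by signal {-rc}"
    return f"exited with status {rc}"


def run_rfifind(filfile, outbase, time_s=1.0, workdir="."):
    """Run rfifind and fail loudly if it crashes or leaves out a product."""
    cmd = ["rfifind", "-time", str(time_s), "-o", outbase, filfile]
    proc = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    status = describe_returncode(proc.returncode)
    paths = [os.path.join(workdir, f) for f in expected_outputs(outbase)]
    missing = [p for p in paths if not os.path.exists(p)]
    if status != "ok" or missing:
        raise RuntimeError(f"rfifind {status}; missing: {missing}\n"
                           f"{proc.stdout[-2000:]}{proc.stderr[-2000:]}")
    return paths
```

With the crash in this thread, run_rfifind would raise "rfifind killed by SIGSEGV" with the .mask listed as missing, instead of failing later in the mask reader.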

k-perez commented 3 years ago

Those other outputs were print statements I added to my pipeline to make sure everything else was working right. Sorry, should've deleted that before posting to avoid confusion. But you're right, there is something else going on that's preventing rfifind from fully running. Running the command above, I get a segmentation fault error.

Assuming the data are SIGPROC filterbank format...
Reading SIGPROC filterbank data from 1 file:
  'GBT_Lband_PSR.fil'

    Number of files = 1
       Num of polns = 2 (summed)
  Center freq (MHz) = 1400
    Num of channels = 96
    Sample time (s) = 7.2e-05       
     Spectra/subint = 2400
   Total points (N) = 531000
     Total time (s) = 38.232        
     Clipping sigma = 6.000
   Invert the band? = False
          Byteswap? = False
     Remove zeroDM? = False

File  Start Spec   Samples     Padding        Start MJD
----  ----------  ----------  ----------  --------------------
1              0      531000           0  53010.48482638889254

Analyzing data sections of length 14400 points (1.0368 sec).
  Prime factors are:  2 2 2 2 2 2 3 3 5 5 

Writing mask data  to 'test_rfifind.mask'.
Writing  RFI data  to 'test_rfifind.rfi'.
Writing statistics to 'test_rfifind.stats'.

Massaging the data ...

Amount Complete = 100%
zsh: segmentation fault  rfifind -time 1.0 -o test GBT_Lband_PSR.fil

scottransom commented 3 years ago

Interesting. I see where the issue might be. What C compiler are you using? And also, on what type of machine are you running?

If I send you a diff, are you able to apply it and re-compile to test?

k-perez commented 3 years ago

I am using gcc version 8.5.0 (MacPorts gcc8 8.5.0_0) on macOS Catalina. All the dependencies were installed using macports as well. And yes, I should be able to do that.

scottransom commented 3 years ago

OK. I actually made a new PRESTO branch. You can either view the commit that I just made and apply the same change to your rfifind.c file, or switch to the new branch and re-compile.

https://github.com/scottransom/presto/tree/rfifind_fix

scottransom commented 3 years ago

Aargh! I accidentally pushed it up to the master branch. So just go ahead and try it there. :-)

k-perez commented 3 years ago

I just edited my rfifind.c file and re-compiled, and I am still getting the same error.

scottransom commented 3 years ago

Hmmm. OK. Can you please run the command in gdb? You should just be able to do gdb rfifind, then run -time 1.0 -o test GBT_Lband_PSR.fil, and when the segfault happens, just do a where and send me everything it prints.
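The same backtrace can also be captured non-interactively, which makes it easier to paste the full output. This sketch only builds the command line. gdb's -batch, -ex and --args options are standard. The lldb equivalents used here (--batch, -o, and -k for a command to run if the target crashes) should be double-checked against lldb --help on your system. debugger_cmd is a hypothetical helper.

```python
import shlex


def debugger_cmd(prog_args, debugger="gdb"):
    """Build a non-interactive debugger command that runs the program and
    prints a backtrace once it stops (for example on a SIGSEGV)."""
    prog_args = list(prog_args)
    if debugger == "gdb":
        # -batch: quit after the -ex commands; --args: program and its arguments follow
        return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args"] + prog_args
    if debugger == "lldb":
        # --batch runs the -o commands; -k bt is only executed if the target crashes
        return ["lldb", "--batch", "-o", "run", "-k", "bt", "--"] + prog_args
    raise ValueError(f"unknown debugger: {debugger}")


if __name__ == "__main__":
    cmd = debugger_cmd(["rfifind", "-time", "1.0", "-o", "test", "GBT_Lband_PSR.fil"])
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

The backtrace only shows file and line numbers if the binaries involved were built with debugging symbols.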

k-perez commented 3 years ago

I ended up using lldb instead, and this is the output I get.

Writing mask data  to 'test_rfifind.mask'.
Writing  RFI data  to 'test_rfifind.rfi'.
Writing statistics to 'test_rfifind.stats'.

Massaging the data ...

Amount Complete = 100%
Process 42801 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7fff00050af2)
    frame #0: 0x0000000100a29740 libgfortran.5.dylib`_gfortran_string_len_trim + 35
libgfortran.5.dylib`_gfortran_string_len_trim:
->  0x100a29740 <+35>: cmpb   $0x20, (%rsi,%rdx)
    0x100a29744 <+39>: leaq   -0x1(%rdx), %rax
    0x100a29748 <+43>: je     0x100a2974f               ; <+50>
    0x100a2974a <+45>: leaq   0x1(%rdx), %rax
Target 0: (rfifind) stopped.

I'll keep trying to get gdb to work meanwhile, in case the above isn't helpful.

scottransom commented 3 years ago

Yeah, that's not so useful. Can you do the equivalent of "where" from that point so it gives you the function and line number it was in?

k-perez commented 3 years ago

Ok, I got gdb to work and this is the output. I did run make makewisdom when installing, with no errors, and have installed twice now to double check.

Writing mask data  to 'test_rfifind.mask'.
Writing  RFI data  to 'test_rfifind.rfi'.
Writing statistics to 'test_rfifind.stats'.

Massaging the data ...

Amount Complete =   0%Warning:  Couldn't open '(null)/lib/fftw_wisdom.txt'
          You should run 'makewisdom'.  See $PRESTO/INSTALL.
Amount Complete = 100%

scottransom commented 3 years ago

It is actually ending after the "Amount Complete = 100%"? And the code hasn't been edited at all?

k-perez commented 3 years ago

It has not been edited, except for the changes to rfifind.c from yesterday.
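The segfault is a separate problem, but the "(null)/lib/fftw_wisdom.txt" warning in the gdb run suggests that the PRESTO environment variable was not set inside that debugger session. "(null)" is what C's printf typically prints for a NULL string. Assuming rfifind builds the path as $PRESTO/lib/fftw_wisdom.txt, as the warning text implies, a quick check from the same shell used to launch gdb could look like this. wisdom_path and check_wisdom are hypothetical helpers.

```python
import os


def wisdom_path():
    """Path where rfifind is assumed to look for FFTW wisdom, or None when
    $PRESTO is unset (the case that prints as '(null)/lib/...')."""
    presto = os.environ.get("PRESTO")
    if presto is None:
        return None
    return os.path.join(presto, "lib", "fftw_wisdom.txt")


def check_wisdom():
    path = wisdom_path()
    if path is None:
        return "PRESTO is not set in this environment"
    if not os.path.isfile(path):
        return f"missing {path}; run makewisdom"
    return f"ok: {path}"


if __name__ == "__main__":
    print(check_wisdom())
```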

scottransom commented 3 years ago

So bizarre. It is seemingly just stopping without finishing the rest of the program!

Can you run gdb, put a breakpoint on the function write_mask, and then run it to see if it gets there? If so, then step through line by line.

It is really hard for me to debug things when I cannot replicate an error.

k-perez commented 3 years ago

I put breakpoints at write_mask, at rfifind_plot (which is the function called right before it), and at a few other places to see where it is breaking. It does not get beyond rfifind_plot, so it never reaches write_mask. Here is the output. Let me know if I can do anything else to get more informative output!

Thread 2 hit Breakpoint 5, 0x000000010003b450 in writeinf ()
(gdb) continue
Continuing.
Analyzing data sections of length 14400 points (1.0368 sec).
  Prime factors are:  2 2 2 2 2 2 3 3 5 5 

Writing mask data  to 'test_rfifind.mask'.
Writing  RFI data  to 'test_rfifind.rfi'.
Writing statistics to 'test_rfifind.stats'.

Massaging the data ...

Amount Complete =   0%Warning:  Couldn't open '(null)/lib/fftw_wisdom.txt'
          You should run 'makewisdom'.  See $PRESTO/INSTALL.
Amount Complete = 100%

Thread 2 hit Breakpoint 7, 0x0000000100018a30 in rfifind_plot ()
(gdb) continue
Continuing.

Thread 2 received signal SIGSEGV, Segmentation fault.
0x0000000100a29740 in ?? ()

scottransom commented 3 years ago

ah-ha! Now we are getting somewhere. After the segfault, can you do a "w" or "where" to see what line it is on and in what file?

k-perez commented 3 years ago

The where command doesn't seem to show any line numbers, but if I do a list, I get this:

Amount Complete =   0%Warning:  Couldn't open '(null)/lib/fftw_wisdom.txt'
          You should run 'makewisdom'.  See $PRESTO/INSTALL.
Amount Complete = 100%

Thread 2 hit Breakpoint 6, 0x0000000100018a30 in rfifind_plot ()
(gdb) continue
Continuing.

Thread 2 received signal SIGSEGV, Segmentation fault.
0x0000000100a29740 in ?? ()
(gdb) l
44  int compare_rfi_sigma(const void *ca, const void *cb);
45  int compare_rfi_numobs(const void *ca, const void *cb);
46  int read_subband_rawblocks(FILE * infiles[], int numfiles, short *subbanddata,
47                             int numsamples, int *padding);
48  void get_subband(int subbandnum, float chandat[], short srawdata[], int numsamples);
49  extern int *ranges_to_ivect(char *str, int minval, int maxval, int *numvals);
50  
51  /* The main program */
52  
53  int main(int argc, char *argv[])
(gdb) where
#0  0x0000000100a29740 in ?? ()
#1  0x0000000100743446 in ?? ()
#2  0x00007ffeefbfe390 in ?? ()
#3  0x00007ffeefbfe38c in ?? ()
#4  0x00007ffeefbfe394 in ?? ()
#5  0x000000010074d933 in ?? ()
#6  0x2020202020202020 in ?? ()
#7  0x0000000020202020 in ?? ()
#8  0x00007ffeefbfe3c0 in ?? ()
#9  0x000000010074e197 in ?? ()
#10 0x000000010004eb62 in ?? ()
#11 0x0000000000000001 in ?? ()
#12 0x00007ffeefbfe3c0 in ?? ()
#13 0x0000000000000004 in ?? ()
#14 0x00007ffeefbfe388 in ?? ()
#15 0x000000010004eb57 in ?? ()
#16 0x00007ffeefbfe38c in ?? ()
#17 0x00000001007429c5 in ?? ()
#18 0x0000000000000001 in ?? ()
#19 0x0000000000000000 in ?? ()

scottransom commented 3 years ago

Did you change the PRESTO makefile at all? It doesn't look like there are debugging symbols compiled into rfifind. When you type "make", are you getting "-g" in each of the gcc command lines?

k-perez commented 3 years ago

Nope, I did not change the Makefile, and I am getting a "-g" in each of the gcc commands.

scottransom commented 3 years ago

Hmmm. OK. One more question about what you pasted above. Once you get the segmentation fault and do the "where", is that all of the output? Or are there things after #19? Also, if that is all, what happens if you do "up" after the where? Each "up" moves you one frame up the call stack, from the current function to the function that called it. If you do enough "up"s, we might be able to figure out where we are in the PRESTO code. The reason there are no line numbers might be that we are trapped in a non-PRESTO library.

k-perez commented 3 years ago

Yes, that is all of the output, and when I do "up", it just prints out each individual frame (#0 0x0000000100a29740 in ?? ()), but the line numbers are still empty.

I've been playing with lldb, and it does seem to be a bit more informative. I added two breakpoints at write_mask and rfifind_plot. Here we can see that it might be a libgfortran issue?

Amount Complete =   0%Warning:  Couldn't open '(null)/lib/fftw_wisdom.txt'
          You should run 'makewisdom'.  See $PRESTO/INSTALL.
Amount Complete = 100%
Process 7682 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100018a30 rfifind`rfifind_plot(numchan=96, numint=37, ptsperint=14400, timesigma=10, freqsigma=4, inttrigfrac=0.300000012, chantrigfrac=0.699999988, dataavg=0x0000000100d04380, datastd=0x0000000100d04580, datapow=0x0000000100d046e0, userchan=0x0000000100c09fe0, numuserchan=0, userints=0x0000000100c09f40, numuserints=0, idata=0x00007ffeefbff200, bytemask=0x0000000100d04820, oldmask=0x00007ffeefbfece0, newmask=0x00007ffeefbfed50, rfivect=0x0000000100d04970, numrfi=20, rfixwin=0, rfips=0, xwin=0) at rfifind_plot.c:78:1
   75                     mask * oldmask, mask * newmask,
   76                     rfi * rfivect, int numrfi, int rfixwin, int rfips, int xwin)
   77   /* Make the beautiful multi-page rfifind plots */
-> 78   {
   79       int ii, jj, ct, loops = 1;
   80       float *freqs, *chans, *times, *ints;
   81       float *avg_chan_avg, *std_chan_avg, *pow_chan_avg;
Target 0: (rfifind) stopped.
(lldb) thread continue
Resuming thread 0x5e2ff in process 7682
Process 7682 resuming
Process 7682 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7fff00050af2)
    frame #0: 0x0000000100a29740 libgfortran.5.dylib`_gfortran_string_len_trim + 35
libgfortran.5.dylib`_gfortran_string_len_trim:
->  0x100a29740 <+35>: cmpb   $0x20, (%rsi,%rdx)
    0x100a29744 <+39>: leaq   -0x1(%rdx), %rax
    0x100a29748 <+43>: je     0x100a2974f               ; <+50>
    0x100a2974a <+45>: leaq   0x1(%rdx), %rax
Target 0: (rfifind) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x7fff00050af2)
  * frame #0: 0x0000000100a29740 libgfortran.5.dylib`_gfortran_string_len_trim + 35
    frame #1: 0x0000000100743446 libpgplot5.dylib`grtrim_ + 24
    frame #2: 0x000000010074d933 libpgplot5.dylib`pgmtxt_ + 83
    frame #3: 0x0000000100729316 libcpgplot5.dylib`cpgmtxt + 96
    frame #4: 0x000000010001aa1d rfifind`rfifind_plot(numchan=96, numint=37, ptsperint=14400, timesigma=10, freqsigma=<unavailable>, inttrigfrac=<unavailable>, chantrigfrac=<unavailable>, dataavg=0x0000000100d04380, datastd=0x0000000100d04580, datapow=0x0000000100d046e0, userchan=0x0000000100c09fe0, numuserchan=1, userints=0x0000000100c09f40, numuserints=0, idata=0x00007ffeefbff200, bytemask=0x0000000100d04820, oldmask=0x00007ffeefbfece0, newmask=0x00007ffeefbfed50, rfivect=0x0000000100d04970, numrfi=20, rfixwin=0, rfips=0, xwin=0) at rfifind_plot.c:450:9
    frame #5: 0x000000010004aa0d rfifind`main(argc=<unavailable>, argv=<unavailable>) at rfifind.c:468:9
    frame #6: 0x00007fff739d1cc9 libdyld.dylib`start + 1
    frame #7: 0x00007fff739d1cc9 libdyld.dylib`start + 1

scottransom commented 3 years ago

That's good progress! Thanks! It looks like something bad is happening in a PGPLOT cpgmtxt call. So here is something to try: can you try running with "-xwin" and see if you get a plot on the screen? And if so, please take a screenshot of it. I want to carefully check to see if all the labels and text look OK.

paulray commented 3 years ago

cpgmtxt is known to cause segfaults with certain gcc versions: https://trac.macports.org/ticket/57726

k-perez commented 3 years ago

I do not get a plot on the screen, but I am using gcc8, so maybe that is the issue? Let me try re-installing with gcc7.

scottransom commented 3 years ago

good grief. That certainly seems like the issue!

k-perez commented 3 years ago

Yes, that was the issue! Thanks so much for helping! For the record, I had pgplot for gcc11. Installing pgplot for gcc7 fixed it.
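The crash was in libpgplot5.dylib calling into libgfortran.5.dylib, so one way to spot this kind of compiler mismatch is to list the libraries each binary was linked against with macOS's otool -L. Run it on rfifind and on the PGPLOT dylibs. Here is a minimal sketch. parse_otool_L, fortran_related and linked_libs are hypothetical helpers, and the library paths on any given machine depend on how MacPorts laid them out.

```python
import subprocess
import sys


def parse_otool_L(text):
    """Parse `otool -L` output into linked library install names.
    The first line names the file being inspected and is skipped."""
    libs = []
    for line in text.splitlines()[1:]:
        line = line.strip()
        if line:
            libs.append(line.split(" (compatibility")[0])
    return libs


def fortran_related(libs):
    """Keep only the gfortran runtime and PGPLOT entries."""
    return [lib for lib in libs if "gfortran" in lib or "pgplot" in lib]


def linked_libs(path):
    out = subprocess.run(["otool", "-L", path], capture_output=True,
                         text=True, check=True).stdout
    return parse_otool_L(out)


if __name__ == "__main__":
    # Pass the paths of rfifind and of the PGPLOT dylibs on the command line
    for path in sys.argv[1:]:
        print(path)
        for lib in fortran_related(linked_libs(path)):
            print("   ", lib)
```

If rfifind and libpgplot point at libgfortran builds from different gcc versions, that is a hint that they were built with different compilers.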

scottransom commented 3 years ago

Great! Glad you are set now! It is pretty sad that PGPLOT is unmaintained but still so useful!