seticore dedoppler misses hits in a GBT Pulsar file

texadactyl commented 2 years ago

Compute node: blpc0
Directory: /datax/scratch/texadactyl/gbt_pulsar
GBT Pulsar file: blc17_guppi_57991_49318_DIAG_PSR_J0332+5434_0008.rawspec.0000.h5 (578 MB)

python3 run_ts.py

find_doppler.0  INFO     Spectra 0 1st 3 values: [800914.25 830376.5  932874.25]
find_doppler.0  INFO     Spectra 1 1st 3 values: [808807.4375 924503.25   908501.8125]
find_doppler.0  INFO     Top hit found! SNR 126.461868, Drift Rate -0.102043, index 166373
find_doppler.0  INFO     Top hit found! SNR 139.640518, Drift Rate -0.102043, index 882201
turbo_seti FindDoppler.search E.T. is 80.4 seconds

turbo_seti output .dat file:

# -------------------------- o --------------------------
# File ID: blc17_guppi_57991_49318_DIAG_PSR_J0332+5434_0008.rawspec.0000.h5 
# -------------------------- o --------------------------
# Source:DIAG_PSR_J0332+5434
# MJD: 57991.570810185185   RA: 3h32m59.184s    DEC: 54d34m43.68s
# DELTAT:  18.253611    DELTAF(Hz):  -2.793968  max_drift_rate:  10.000000  obs_length:  54.760833
# --------------------------
# Top_Hit_#     Drift_Rate  SNR     Uncorrected_Frequency   Corrected_Frequency     Index   freq_start  freq_end    SEFD    SEFD_freq   Coarse_Channel_Number   Full_number_of_hits     
# --------------------------
000001   -0.102043  126.461868     6688.500003     6688.500003  166373     6688.501098     6688.498911  0.0       0.000000  0   2332    
000002   -0.102043  139.640518     6686.500003     6686.500003  882201     6686.501098     6686.498910  0.0       0.000000  0   2332

bash run_sc.sh

welcome to seticore, version 0.1.5
running in dedoppler mode.
loading input from ./blc17_guppi_57991_49318_DIAG_PSR_J0332+5434_0008.rawspec.0000.h5
dedoppler parameters: max_drift=10.00 min_drift=0.0010 snr=25.00
writing output to seticore.dat
dedoppler elapsed time 6s

seticore output .dat file:

# -------------------------- o --------------------------
# File ID: seticore.dat 
# -------------------------- o --------------------------
# Source:DIAG_PSR_J0332+5434
# MJD: 57991.570810185185   RA: 3.549773    DEC: 54.578800
# DELTAT:  18.253611    DELTAF(Hz):  -2.793968  max_drift_rate:  10.000000  obs_length:  54.760833
# --------------------------
# Top_Hit_#     Drift_Rate  SNR     Uncorrected_Frequency   Corrected_Frequency     Index   freq_start  freq_end    SEFD    SEFD_freq   Coarse_Channel_Number   Full_number_of_hits 
(none)

watutil -i blc17_guppi_57991_49318_DIAG_PSR_J0332+5434_0008.rawspec.0000.h5

--- File Info ---
DIMENSION_LABELS : [b'time' b'feed_id' b'frequency']
        az_start :                              0.0
       data_type :                                1
            fch1 :                6688.96484375 MHz
            foff :      -2.7939677238464355e-06 MHz
           ibeam :                               -1
      machine_id :                               20
          nbeams :                                1
           nbits :                               32
          nchans :                         67108864
            nfpc :                          1048576
            nifs :                                1
     rawdatafile : blc17_guppi_57991_49318_DIAG_PSR_J0332+5434_0008.0000.raw
     source_name :              DIAG_PSR_J0332+5434
         src_dej :                      54:34:43.68
         src_raj :                      3:32:59.184
    telescope_id :                                6
           tsamp :               18.253611007999982
   tstart (ISOT) :          2017-08-26T13:41:58.000
    tstart (MJD) :               57991.570810185185
        za_start :                              0.0

Num ints in file :                                3
      File shape :                 (3, 1, 67108864)
--- Selection Info ---
Data selection shape :                 (3, 1, 67108864)
Minimum freq (MHz) :                6501.464846543968
Maximum freq (MHz) :                    6688.96484375

texadactyl commented 2 years ago

On the brighter side, seticore did succeed in reproducing turbo_seti top hit results for an FRB file. Directory: /datax/scratch/texadactyl/gbt_frb on blpc0 File: blc13_guppi_57991_49836_DIAG_FRB121102_0010.rawspec.0000.h5 (593 MB)

--- File Info ---
DIMENSION_LABELS : [b'time' b'feed_id' b'frequency']
        az_start :                              0.0
       data_type :                                1
            fch1 :                7438.96484375 MHz
            foff :      -2.7939677238464355e-06 MHz
           ibeam :                               -1
      machine_id :                               20
          nbeams :                                1
           nbits :                               32
          nchans :                         67108864
            nfpc :                          1048576
            nifs :                                1
     rawdatafile : blc13_guppi_57991_49836_DIAG_FRB121102_0010.0000.raw
     source_name :                   DIAG_FRB121102
         src_dej :                      33:08:52.44
         src_raj :                      5:31:58.632
    telescope_id :                                6
           tsamp :               18.253611007999982
   tstart (ISOT) :          2017-08-26T13:50:36.000
    tstart (MJD) :                57991.57680555555
        za_start :                              0.0

Num ints in file :                                3
      File shape :                 (3, 1, 67108864)
--- Selection Info ---
Data selection shape :                 (3, 1, 67108864)
Minimum freq (MHz) :                7251.464846543968
Maximum freq (MHz) :                    7438.96484375

lacker commented 2 years ago

Interesting. I tweaked the parameters a bit and discovered that this is coming from a difference in drift rate calculation. Seticore thinks these hits have a drift rate of zero, where turboseti thinks these hits have a nonzero drift rate, so turboseti shows them while seticore filters them out. When I set snr=5 and min_drift=0 you see that they find the same stuff:

lacker@freedom:/d/tex$ ./run_ts.sh 
<snipped junk>
find_doppler.0  INFO     Top hit found! SNR 126.461868, Drift Rate -0.102043, index 166373
find_doppler.0  INFO     Top hit found! SNR 139.640518, Drift Rate -0.102043, index 882201
find_doppler.30 INFO     Top hit found! SNR 7.410137, Drift Rate -0.051021, index 384478
find_doppler.39 INFO     Top hit found! SNR 8.666421, Drift Rate 0.102043, index 749826
find_doppler.60 INFO     Top hit found! SNR 9.245283, Drift Rate -0.102043, index 417684
find_doppler.60 INFO     Top hit found! SNR 5.121273, Drift Rate 0.306128, index 418475
find_doppler.60 INFO     Top hit found! SNR 11.463472, Drift Rate -0.204085, index 441634
Search time:  0.57 min
lacker@freedom:/d/tex$ ./run_sc.sh 
<snipped junk>
hit: coarse channel = 0, index = 166374, snr = 186.89049, drift rate = -0.00000 (0 bins)
hit: coarse channel = 0, index = 882202, snr = 210.63504, drift rate = -0.00000 (0 bins)
hit: coarse channel = 30, index = 384478, snr = 9.82641, drift rate = -0.00000 (0 bins)
hit: coarse channel = 39, index = 749826, snr = 8.67138, drift rate = 0.10204 (-2 bins)
hit: coarse channel = 60, index = 417685, snr = 10.49819, drift rate = -0.00000 (0 bins)
hit: coarse channel = 60, index = 418475, snr = 5.12319, drift rate = 0.30613 (-6 bins)
hit: coarse channel = 60, index = 441634, snr = 11.46777, drift rate = -0.20409 (4 bins)
dedoppler elapsed time 5s
lacker@freedom:/d/tex$

So the root issue is that either turboseti or seticore is calculating drift rates wrong here. It might be a weird edge case since there are 3 timesteps and some of the code assumes powers of 2.

lacker commented 2 years ago

Here's what the area around the two hits in question looks like:

Screenshot from 2022-07-21 16-41-04

I think seticore is doing the right thing here. These really are hits with zero drift. Maybe turboseti is throwing out hits with zero drift before it deduplicates, and seticore is throwing out hits with zero drift after it deduplicates?

texadactyl commented 2 years ago

turbo_seti filters out by min/max drift rate before calling hitsearch in the main loop of a given coarse channel. At the end of the main loop, when tophitsearch is called, the max snr values are selected.

So, in effect, turbo_seti throws out outlier drift rates before any further dedoppler analysis. Not defending the apparent design, just explaining it. Who knows what the original turbo_seti developer had in mind?

texadactyl commented 2 years ago

It is odd that seticore treated a min_drift_rate of 0.0010 as zero.

lacker commented 2 years ago

It doesn't treat 0.001 as zero - that's why in the original run, with min_drift=0.001, seticore filters out the zero-drift hits. The output I pasted in https://github.com/lacker/seticore/issues/16#issuecomment-1192029675 is from a run with min_drift=0 so that it shows the nondrifting hits as well.

texadactyl commented 2 years ago

I see your point. I must have been tired when I wrote the last comment.

"either turboseti or seticore is calculating drift rates wrong" So, my obvious question is which methodology is correct? turbo_seti or seticore?

lacker commented 2 years ago

All right, I'm going to call this "working as intended". There's just a slight difference in turboseti and seticore behavior here, if this slight difference becomes an issue we can reopen the debate

lacker / seticore

seticore dedoppler misses hits in a GBT Pulsar file #16