texadactyl closed 2 years ago
On the brighter side, seticore did succeed in reproducing turbo_seti top hit results for an FRB file.
Directory: /datax/scratch/texadactyl/gbt_frb on blpc0
File: blc13_guppi_57991_49836_DIAG_FRB121102_0010.rawspec.0000.h5 (593 MB)
--- File Info ---
DIMENSION_LABELS : [b'time' b'feed_id' b'frequency']
az_start : 0.0
data_type : 1
fch1 : 7438.96484375 MHz
foff : -2.7939677238464355e-06 MHz
ibeam : -1
machine_id : 20
nbeams : 1
nbits : 32
nchans : 67108864
nfpc : 1048576
nifs : 1
rawdatafile : blc13_guppi_57991_49836_DIAG_FRB121102_0010.0000.raw
source_name : DIAG_FRB121102
src_dej : 33:08:52.44
src_raj : 5:31:58.632
telescope_id : 6
tsamp : 18.253611007999982
tstart (ISOT) : 2017-08-26T13:50:36.000
tstart (MJD) : 57991.57680555555
za_start : 0.0
Num ints in file : 3
File shape : (3, 1, 67108864)
--- Selection Info ---
Data selection shape : (3, 1, 67108864)
Minimum freq (MHz) : 7251.464846543968
Maximum freq (MHz) : 7438.96484375
Interesting. I tweaked the parameters a bit and discovered that this comes from a difference in drift rate calculation. Seticore thinks these hits have a drift rate of zero, whereas turboseti thinks they have a nonzero drift rate, so turboseti shows them while seticore filters them out. When I set snr=5 and min_drift=0, you can see that they find the same stuff:
lacker@freedom:/d/tex$ ./run_ts.sh
<snipped junk>
find_doppler.0 INFO Top hit found! SNR 126.461868, Drift Rate -0.102043, index 166373
find_doppler.0 INFO Top hit found! SNR 139.640518, Drift Rate -0.102043, index 882201
find_doppler.30 INFO Top hit found! SNR 7.410137, Drift Rate -0.051021, index 384478
find_doppler.39 INFO Top hit found! SNR 8.666421, Drift Rate 0.102043, index 749826
find_doppler.60 INFO Top hit found! SNR 9.245283, Drift Rate -0.102043, index 417684
find_doppler.60 INFO Top hit found! SNR 5.121273, Drift Rate 0.306128, index 418475
find_doppler.60 INFO Top hit found! SNR 11.463472, Drift Rate -0.204085, index 441634
Search time: 0.57 min
lacker@freedom:/d/tex$ ./run_sc.sh
<snipped junk>
hit: coarse channel = 0, index = 166374, snr = 186.89049, drift rate = -0.00000 (0 bins)
hit: coarse channel = 0, index = 882202, snr = 210.63504, drift rate = -0.00000 (0 bins)
hit: coarse channel = 30, index = 384478, snr = 9.82641, drift rate = -0.00000 (0 bins)
hit: coarse channel = 39, index = 749826, snr = 8.67138, drift rate = 0.10204 (-2 bins)
hit: coarse channel = 60, index = 417685, snr = 10.49819, drift rate = -0.00000 (0 bins)
hit: coarse channel = 60, index = 418475, snr = 5.12319, drift rate = 0.30613 (-6 bins)
hit: coarse channel = 60, index = 441634, snr = 11.46777, drift rate = -0.20409 (4 bins)
dedoppler elapsed time 5s
lacker@freedom:/d/tex$
So the root issue is that either turboseti or seticore is calculating drift rates wrong here. It might be a weird edge case since there are 3 timesteps and some of the code assumes powers of 2.
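For reference, the nonzero drift rates in both logs are consistent with a simple conversion from drift bins to Hz/s using the header values above. The sketch below is not taken from either code base. It assumes drift rate = bins × foff / (number of integrations × tsamp), a formula chosen only because it reproduces the logged numbers, and the function name is hypothetical.

```python
# Header values copied from the file info above.
FOFF_MHZ = -2.7939677238464355e-06   # channel width (negative: descending frequency)
TSAMP_S = 18.253611007999982         # seconds per integration
NUM_INTS = 3                         # "Num ints in file"


def drift_rate_hz_per_s(drift_bins, foff_mhz=FOFF_MHZ, tsamp_s=TSAMP_S,
                        num_ints=NUM_INTS):
    """Convert a drift in frequency bins to Hz/s.

    Assumption: the observation length is num_ints * tsamp_s. This is a
    guess that happens to match the logged drift rates, not a statement
    about how turbo_seti or seticore compute it internally.
    """
    obs_length_s = num_ints * tsamp_s
    return drift_bins * foff_mhz * 1e6 / obs_length_s


if __name__ == "__main__":
    # Bin counts taken from the seticore output above.
    for bins in (-2, -6, 4):
        print(f"{bins:3d} bins -> {drift_rate_hz_per_s(bins):.5f} Hz/s")
```

Under that assumption, one bin of drift is about 0.051 Hz/s, so the two tools agree on the scale of the drift rates. Where they disagree is on which drift bin wins for the strongest hits.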
Here's what the area around the two hits in question looks like:
I think seticore is doing the right thing here. These really are hits with zero drift. Maybe turboseti is throwing out hits with zero drift before it deduplicates, and seticore is throwing out hits with zero drift after it deduplicates?
turbo_seti filters out by min/max drift rate before calling hitsearch in the main loop of a given coarse channel. At the end of the main loop, when tophitsearch is called, the max snr values are selected.
So, in effect, turbo_seti throws out outlier drift rates before any further dedoppler analysis. Not defending the apparent design, just explaining it. Who knows what the original turbo_seti developer had in mind?
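To make the ordering question concrete, here is a toy model of the two orderings discussed above. It is not code from turbo_seti or seticore. The function names and the candidate list are hypothetical, with SNR and drift values loosely borrowed from the first hit in the logs purely for illustration.

```python
def filter_then_pick(candidates, min_drift):
    """Drop candidates below min_drift first, then keep the max-SNR survivor."""
    kept = [c for c in candidates if abs(c["drift"]) >= min_drift]
    return max(kept, key=lambda c: c["snr"]) if kept else None


def pick_then_filter(candidates, min_drift):
    """Keep the max-SNR candidate first, then drop it if below min_drift."""
    best = max(candidates, key=lambda c: c["snr"])
    return best if abs(best["drift"]) >= min_drift else None


# Hypothetical candidates for one signal: the same frequency region scored
# at two different drift rates (illustrative numbers only).
candidates = [
    {"drift": 0.0, "snr": 186.89},
    {"drift": -0.102043, "snr": 126.46},
]

if __name__ == "__main__":
    print(filter_then_pick(candidates, min_drift=0.001))
    print(pick_then_filter(candidates, min_drift=0.001))
```

With min_drift=0.001, the first ordering reports the weaker drifting candidate, while the second reports nothing because the strongest candidate has zero drift. With min_drift=0, both orderings in this toy return the zero-drift candidate, so it only illustrates how the orderings can diverge and does not by itself explain every line of the logs.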
It is odd that seticore treated a min_drift_rate of 0.0010 as zero.
It doesn't treat 0.001 as zero - that's why in the original run, with min_drift=0.001, seticore filters out the zero-drift hits. The output I pasted in https://github.com/lacker/seticore/issues/16#issuecomment-1192029675 is from a run with min_drift=0 so that it shows the nondrifting hits as well.
I see your point. I must have been tired when I wrote the last comment.
"either turboseti or seticore is calculating drift rates wrong" So, my obvious question is: which methodology is correct, turbo_seti or seticore?
All right, I'm going to call this "working as intended". There's just a slight difference between turboseti and seticore behavior here. If it becomes an issue, we can reopen the debate.
python3 run_ts.py
turbo_seti output .dat file:
bash run_sc.sh
seticore output .dat file:
watutil -i blc17_guppi_57991_49318_DIAG_PSR_J0332+5434_0008.rawspec.0000.h5