I can get pycbc_single_template to work with the following patch, which takes into account the padding in selecting the right data segment:
@@ -35,7 +35,7 @@ def subtract_template(stilde, template, snr, trigger_time, flow):
stilde -= pycbc.waveform.utils.apply_fseries_time_shift(inverse, dt)
return stilde
-def select_segments(fname, anal_name, data_name, ifo, time):
+def select_segments(fname, anal_name, data_name, ifo, time, pad_data):
anal_segs = events.select_segments_by_definer(fname, anal_name, ifo)
data_segs = events.select_segments_by_definer(fname, data_name, ifo)
@@ -69,7 +69,7 @@ def select_segments(fname, anal_name, data_name, ifo, time):
# largest overlap with anal_time
overlap = None
for start, end in zip(s2, e2):
- if start > anal_time[0] or end < anal_time[1]:
+ if start + pad_data > anal_time[0] or end - pad_data < anal_time[1]:
# The analysis time must lie within the data time, otherwise
# this clearly is not the right data segment!
continue
@@ -181,7 +181,7 @@ if opt.inspiral_segments:
# trig start/end times
anal_seg, data_seg = select_segments(opt.inspiral_segments,
opt.data_analyzed_name, opt.data_read_name,
- ifo, opt.trigger_time)
+ ifo, opt.trigger_time, opt.pad_data)
opt.trig_start_time = anal_seg[0]
opt.trig_end_time = anal_seg[1]
opt.gps_start_time = data_seg[0] + opt.pad_data
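For context, here is a minimal standalone sketch of the containment-plus-overlap logic the patched select_segments would implement. The function pick_data_segment and its argument layout are made up for illustration, and pad_data = 8 is inferred from the job arguments quoted further down:

# Minimal sketch of the patched selection logic; not the actual
# pycbc_single_template code.
def pick_data_segment(data_segs, anal_time, pad_data):
    """data_segs: list of (start, end); anal_time: (start, end) analysis window."""
    best, best_overlap = None, None
    for start, end in data_segs:
        # The analysis time must lie within the *unpadded* part of the data
        # segment, otherwise this cannot be the right data segment.
        if start + pad_data > anal_time[0] or end - pad_data < anal_time[1]:
            continue
        overlap = min(end, anal_time[1]) - max(start, anal_time[0])
        if best_overlap is None or overlap > best_overlap:
            best, best_overlap = (start, end), overlap
    return best

# With the two segments from this issue, only the longer one survives the
# padded containment check:
print(pick_data_segment([(1163059103.0, 1163059839.0),
                         (1163059103.0, 1163059840.0)],
                        (1163059471.0, 1163059832.0), 8))
# -> (1163059103.0, 1163059840.0)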
Ian, is this the correct approach or is there another way this error should be addressed?
Thanks @titodalcanton. I thought I had tested edge cases here (and even edited the segment module to ensure that all segments get parsed here!), but it seems I've missed something.
What you say makes sense, but I need some more time to look over and check this is the correct issue. Not sure I'll get that done today though as I'm flying back to Europe later.
As you seem to have a solution can you move this to a pull request?
OK, one more data point. These are the two inspiral jobs producing the almost-identical data segments, and they have the following parameters:
--gps-start-time 1163059111
--gps-end-time 1163059831
--trig-start-time 1163059111
--trig-end-time 1163059471
and
--gps-start-time 1163059111
--gps-end-time 1163059832
--trig-start-time 1163059471
--trig-end-time 1163059832
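For reference, a quick check of these numbers (plain arithmetic on the values quoted above, nothing taken from the workflow itself):

# Spans implied by the option values above.
jobs = {"job 1": ((1163059111, 1163059831), (1163059111, 1163059471)),
        "job 2": ((1163059111, 1163059832), (1163059471, 1163059832))}
for name, (gps, trig) in jobs.items():
    print(name, "gps span:", gps[1] - gps[0], "s, trig span:", trig[1] - trig[0], "s")
# job 1 gps span: 720 s, trig span: 360 s
# job 2 gps span: 721 s, trig span: 361 s
# The two trigger windows tile [1163059111, 1163059832] with no gap and no overlap.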
These look sane to me, so it seems the issue is purely in how select_segments() handles this tricky case; I'm going to make a PR with my patch.
@titodalcanton please tag it with the 1.6.1 milestone and I can get a new release made.
Wait. These inputs make no sense. You say we have two data read entries: (1163059103.0, 1163059839.0) and (1163059103.0, 1163059840.0). Why are they like this? If the second job is reading all necessary data, there shouldn't be two jobs here at all.
I tried to document in comments my reasons for the choices being made here:
https://github.com/ligo-cbc/pycbc/blob/master/bin/pycbc_single_template#L38
I'm not really sure adding pad-data here is needed. The data read in should already include the padding (regardless of what arguments actually go to pycbc_inspiral), and the data analysed should be accurate.
So I do not think padding is the issue.
So is the issue instead that we have two inspiral jobs where one would be sufficient? That was my other guess...
If the lock goes between 1163059103.0 and 1163059840.0 (737 seconds) this should have been segmented as two inspiral jobs. However, each job should have read in 512 + (pad-data) * 2 seconds. What are the DATA_ANALYSED segments for this case, and can you confirm that this is the span of the lock?
The change you made does make sense though, and I'll approve it, but the existing code should have worked.
From the daily page https://ldas-jobs.ligo-wa.caltech.edu/~detchar/summary/day/20161113/ I see that indeed there is such a segment of H1:DMT-ANALYSIS_READY:1,
1 1163059103.0 1163059840.0 737.0
From Tom's H1L1-INSP_SEGMENTS-1161961217-1242000.xml I see
segment_definer:segment_def_id:2 1163059103 1163059839
segment_definer:segment_def_id:2 1163059103 1163059840
where seg def 2 corresponds to H1's DATA_ANALYSED.
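To put numbers on the question above (pad_data = 8 is inferred from the job arguments quoted earlier, so treat it as an assumption):

# Compare the expected per-job read length (512 s plus padding at both ends, as
# stated above) with the spans of the two segments listed just above.
pad_data = 8
expected_read = 512 + 2 * pad_data             # 528 s
for start, end in [(1163059103, 1163059839), (1163059103, 1163059840)]:
    print("segment span:", end - start, "s; expected per-job read:", expected_read, "s")
# segment span: 736 s; expected per-job read: 528 s
# segment span: 737 s; expected per-job read: 528 s
# Both entries span (almost) the whole 737 s lock rather than ~528 s each.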
Thanks for the zeal, GitHub, but reopening as this needs further investigation.
I've been following this up now, and there is an underlying problem here. The symptom is that PyCBC is only analysing segments longer than 720 s, rather than the 520 s expected. The issue is in the functions that determine valid_time for the pycbc_inspiral executable. I need to explore this a bit more before proposing a solution. This wasn't noticed in testing because we previously used a 256 s segment length. The fix in #1314 was still needed; that edge case was always possible (it happened if the lock length was 1 s longer than the minimum length).
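Tying the 720 s figure back to the numbers above (my arithmetic, again assuming pad_data = 8, not anything taken from the workflow code):

# The segment spans quoted earlier, minus 2 * 8 s of padding, reproduce the gps
# spans of the two jobs and match the ">= 720 s" symptom.
for start, end in [(1163059103, 1163059839), (1163059103, 1163059840)]:
    print((end - start) - 2 * 8, "s analysed")
# 720 s analysed
# 721 s analysed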
@spxiwh I will go ahead and tag 1.6.1 with @titodalcanton's fix and then we can release 1.6.2 once you have fixed the underlying issue.
@duncan-brown Okay, but we should fix this before starting O2 analyses (and be aware of that!). I'll try and get a solution today or tomorrow.
@spxiwh noted: https://github.com/ligo-cbc/pycbc/releases/tag/v1.6.1
Tom's run with the latest hyperbank [1] failed on pycbc_single_template followups for some IMBH injections. Here's one of the failing commands:
And the error:
The inspiral segments file contains two DATA_ANALYSED segments compatible with the --trigger-time, differing only by one second at the end: (1163059103.0, 1163059839.0) and (1163059103.0, 1163059840.0). In pycbc_single_template, select_segments() chooses the shorter one, which then leads to the error when trying to segment the data.
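For illustration, here is why the original check could not tell the two segments apart (a sketch using the numbers above, not the actual select_segments code, and taking the second job's trig window as the analysis time): both segments fully contain that window, so they have identical overlap with it, and nothing forces the longer one to be chosen.

# Both candidates pass the unpadded containment check and have the same overlap
# with the analysis window, so the choice between them is effectively arbitrary.
anal_time = (1163059471.0, 1163059832.0)
for start, end in [(1163059103.0, 1163059839.0), (1163059103.0, 1163059840.0)]:
    contains = start <= anal_time[0] and end >= anal_time[1]
    overlap = min(end, anal_time[1]) - max(start, anal_time[0])
    print((start, end), "contains:", contains, "overlap:", overlap)
# Both segments print contains: True and overlap: 361.0.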
I'm not sure whether this is a bug in the inspiral segments or in select_segments() (or neither), but I'm opening this to document what I found.
[1] https://www.atlas.aei.uni-hannover.de/~tdent/LSC/er10/tuning/preh1_vlfc_thorne_0p6fp/5._injections/5.14_followup_of_missed_IMBHB_INJ/