PyCBC Live: inconsistent template parameters in different detectors

titodalcanton commented 4 years ago

During a replay analysis running with version 1.15.5, PyCBC Live uploaded a candidate with different template parameters in different detectors. Some of the template parameters reported in the foreground group (and XML file) do match (e.g. lower frequency cutoff, template ID, spin2z) but others are different (e.g. masses, duration, template hash, spin1z). The "wrong" parameters appear to correspond to a template which is actually present in the bank, but with a different template ID.

Edit: wrong version quoted.

ahnitz commented 4 years ago

This is very weird. Do you have any reproducing case @titodalcanton? Also what are the respective template_id numbers of the two templates?

ahnitz commented 4 years ago

@titodalcanton Also, can you share what's in the foreground group, in its entirety? That might clarify where in the code this could have been introduced

ahnitz commented 4 years ago

Some places to think about which might help understand what could be going on. Maybe some of these can be eliminated as likely based on what your file output looks like. 1) The id and params are originals attached to each template object https://github.com/gwastro/pycbc/blob/master/pycbc/waveform/bank.py#L594 2) In the MF code, the id and template params are put into the results dict in the same place https://github.com/gwastro/pycbc/blob/master/pycbc/filter/matchedfilter.py#L1711 3) The results from MF are passed to the multiring single buffer based on their template_id https://github.com/gwastro/pycbc/blob/master/pycbc/events/coinc.py#L975 4) coincidence is done by template and the foreground group is populated using the template_id https://github.com/gwastro/pycbc/blob/master/pycbc/events/coinc.py#L1082

ahnitz commented 4 years ago

@titodalcanton The fact that you are seeing different parameters in by IFO means that at (4) when it tries to look up the singles triggers for the template id and trigger_ids of the coinc, it finds those results in the singles buffer. So (1) either the singles buffer is somehow mixing up templates into the wrong buffer at some point or (2) it was inserted into the singles buffer with the wrong tmeplate_id.

This rules out the coincident code itself and the coincbuffer from being likely culprits.

titodalcanton commented 4 years ago

Thanks for the thinking and pointers @ahnitz. Unfortunately this is O3 data being replayed, I cannot share a lot. @bhooshan-gadre and I are looking into this more. We have not searched for other cases yet. The template IDs are 188996 (the one actually quoted in the candidate) and 169544 (the one containing the parameters that mysteriously got used instead).

titodalcanton commented 4 years ago

This was initially noticed by @lpsinger, by the way.

ahnitz commented 4 years ago

@titodalcanton The only reason I asked for the template ids was to check for a bit flip (very unlikely, but it was a bit of hope). In any case, the two numbers aren't different by just a bit flip.

ahnitz commented 4 years ago

@titodalcanton To understand your original comment though, when you say that some parameters do match between the two ifos, do you mean that the original templates also had some matching parameters? Or do you mean that the parameters even within an ifo don't correspond to the same template?

If the former, how many templates share this parameter? i.e. is it an artefact of the bank construction that templates share some parameters? (maybe this is a clue as to how they got mixed up?)

If it's the latter, i.e. somehow the parameters in one ifo come from two different templates, then color me very confused, though likely this is useful information as well.

titodalcanton commented 4 years ago

Yup, I had also checked the binary representations of the IDs thinking about a memory corruption issue, but they seem different enough.

I mean that the two templates have some parameters in common (spin2z and f_lower) instead of being completely different. This seems an odd coincidence to me. I wondered about some kind of hash collision issue, but the hashes are different so I am not sure how it would lead to this issue. I have not carefully checked the bank to answer your question, but it is a good one.

titodalcanton commented 4 years ago

This was the only such case in ~400 coincs generated during the replay. I am now scanning a much larger set.

titodalcanton commented 4 years ago

There is another case from O3, so 2 cases in ~15000 coincs.

ahnitz commented 4 years ago

@titodalcanton Any common symptoms to that case? i.e. the parameters, etc.

ahnitz commented 4 years ago

@titodalcanton Depending on how this is occuring, it's possible it is also happening for background coincs. If that's the case, there may be a way to catch this with a shorter test. At a rate of 2 / 15000, there should be some examples in the background set after running for a couples hours. The easiest thing here would be to do a 5 hour run and have it set to dump the entire MultiRingBuffer and CoincBuffer to disk (maybe with a pickle) so that it can be inspected at the end.

titodalcanton commented 4 years ago

Here are some common features of the two observed cases:

The spin2z parameters are zero in all detectors.
The "different" template is assigned to one of the detectors forming the initial double coinc, not the followup detector.
The "different" template parameters are not too different: the chirp mass is within 20% in both cases.

lpsinger commented 4 years ago

My spider sense says to look for off by one errors in findchirp_cluster_over_window_cython or functions that handle its integer return value.

ahnitz commented 4 years ago

@titodalcanton Bullet(1) how common in a spin2z=0 template in this bank?

Bullet (2) doesn't mean much, as it just uses the template_id to regenerate the template.

Bullet (3) might mean something. Given your bank density how likely is this given random templates? If it's pretty unlikely, it might mean the templates have to be from the same template group (i.e in the filtering code). The templates are grouped by duration, which might end up with being close'ish in chirp mass as well.

ahnitz commented 4 years ago

@titodalcanton Actually, something I forgot to ask earlier. Are the two templates in both the cases you've found processed by the same worker? Or would they normally be processed by different ones?

spxiwh commented 4 years ago

spin2z = 0 is very common with the new bank generator.

ahnitz commented 4 years ago

My spider sense says to look for off by one errors in findchirp_cluster_over_window_cython or functions that handle its integer return value.

That function isn't actually used in pycbc live, but your point about 'off-by-1' errors is generically something one can look for. Hopefully, where the problem is can be pinpointed a bit more first.

bhooshan-gadre commented 4 years ago

Just so as people know, I didn't find any problematic coinc file in 39970 coinc files generated with injections. I will have to look at the hdf foreground to be sure. That is 0 per 40k for injections which much less than what @titodalcanton found.

titodalcanton commented 4 years ago

As Ian said, spin2z = 0 is very common, 77% of the templates have that.

In both cases, the two templates are assigned to different workers.

titodalcanton commented 4 years ago

There is another common thing about the two cases I found: they both happened after switching to Python 3. I am now checking the O2 triggers to see if we can find a case in Python 2.

Edit: no cases found in O2.

titodalcanton commented 4 years ago

Here is a potential candidate for a Python 2-vs-3 difference:

https://github.com/gwastro/pycbc/blob/master/pycbc/filter/matchedfilter.py#L1517

grabs will be a int array in Py2, and a float array in Py3. This will propagate to the arange() a few lines below, leading to different sequences. Eventually, chunks is converted to an int32 array anyway, but I am not sure if this will necessarily happen to produce the same result with Py2 and Py3.

@ahnitz in any case, do you see any reason not to change line #L1517 to an explicit floor division?

titodalcanton commented 4 years ago

I ran a test over the entire results of the O3 replay to verify that at least the single triggers saved to the HDF files have template_id and template parameters consistent with the bank. In fact, for many triggers, they do not. The rate of failures seems approximately 1 trigger out of several 8-s chunks, although some chunks have more than 1 incorrect trigger. This should be easier to debug than hunting for the occasional coincidence. I did not manage to reproduce this in the current unit test, probably due to the small bank, but I will add it as a check anyway. I am now checking the O3 analysis to hopefully pinpoint when the problem started; so far it seems it was not there at the beginning of O3.

ahnitz commented 4 years ago

@titodalcanton Do you see this issue on MDC data? If there is short fake noise case that can demonstrate the issue, I'd be happy to try to reproduce and look into it further.

titodalcanton commented 4 years ago

@titodalcanton Do you see this issue on MDC data? If there is short fake noise case that can demonstrate the issue, I'd be happy to try to reproduce and look into it further.

Not yet, but see #3278. It may just be a matter of running a longer test or with more templates.

titodalcanton commented 4 years ago

Yup, looking at O3 triggers, this does appear to have started when we switched to Python 3.

titodalcanton commented 4 years ago

Ok, I managed to reproduce this with a slightly modified version of the unit test. The key is using a production-level template bank (I used the O3 bank). The check fails right away after a few chunks.

titodalcanton commented 4 years ago

@ahnitz I found that I can reproduce this issue when running examples/live/run.sh with the full O3 bank, but not with the full O2 bank. So I started checking if there were obvious differences in the two banks, and my attention was caught by the template hashes.

Could this bug be explained by a hash collision here? Apparently the O3 bank generates 12 such collisions on Python 3 (maybe because of this?) and I see no collisions on Python 2.

ahnitz commented 4 years ago

@titodalcanton If there is a hash collision, that would cause a lot of problems, and well could explain this. Can you provide a set of numbers that has hash collisions in python3? It's somewhat confusing how this is possible unless python3 is no longer considering the full tuple when generating a hash (I'll quickly check that).

ahnitz commented 4 years ago

@titodalcanton @spxiwh BTW, I'm adding an offline tag due to the potential for a hash collision. If this is occurring it will also cause problems in the offline analysis if it is moved to python3.

ahnitz commented 4 years ago

@titodalcanton Have you calculated the hash value for each template to see if there is a collision?

ahnitz commented 4 years ago

@titodalcanton In investigating python2 vs 3 hash differences, there are some alarming ones.

There are two changes in hash() function between Python 2.7 and Python 3.4

Adoptions of SipHash
Default enabling of Hash randomization

The 'randomization' could cause a problem if it wasn't consistent between nodes. I'd have thought though that you might see much worse effects in this case.

titodalcanton commented 4 years ago

@titodalcanton Have you calculated the hash value for each template to see if there is a collision?

Yes, that is how I found out (well, that and by adding an assert here).

ahnitz commented 4 years ago

@titodalcanton This may be a possible solution for pycbc live assuming the hash collision is the explanation. https://github.com/gwastro/pycbc/pull/3291. We avoid hashing altogether. A different solution is needed for the offline analysis as I believe a similar situation is possible there.

titodalcanton commented 4 years ago

I confirm that the two O3 coincident triggers that originally raised this issue both have template parameters that produce a hash collision.

titodalcanton commented 4 years ago

After applying #3291, I ran PyCBC Live over O3 replay data for ~14 hours and I found zero cases of single triggers with inconsistent template parameters. I consider this issue fixed by #3291 and I am closing it.

gwastro / pycbc

PyCBC Live: inconsistent template parameters in different detectors #3263