get_fd_from_td should handle higher mode waveforms more gracefully

ahnitz commented 3 years ago

For higher mode waveforms which start their signals at a consistent frequency the get_fd_from_td function does not apply a tapering that is correct. This occurs because there are sharp turn-ons at the start of each mode, but the tapering only sucessfully gets applied to the earliest starting mode.

spxiwh commented 1 year ago

At least within lalsimulation, I've made a case that FD higher-order mode waveforms should (by default) use a consistent duration rather than consistent f_lower. It's https://git.ligo.org/lscsoft/lalsuite/-/issues/704 here, but not sure if this is viewable or not.

ahnitz commented 1 year ago

@spxiwh It is viewable. People outside just can't comment, so I'll add my thoughts here.

I think either choice has some issues. I think there are two use cases which drive either choice and are at somewhat odds.

(1) Doing injections: For injections, having a consistent start time is more convenient since you can ensure that some stretch of data contains the signals you desire.

(2) For doing an analysis (search, PE, etc): You care about if you are extracting all of the required signal-to-noise / likelihood. This is driven by the lower frequency cutoff in many cases. The exception is if the higher modes are sufficiently long that one or more of them runs off the end of the data, or there are say computational or convenience reasons for limiting the length of signals.

I'm honestly not sure what the right default is here. I don't see extremely strong arguments in either direction.

Note that if FD waveforms did choose the convention you propose then we'd have the reverse problem as suggested in this PR, e.g. tapering would now need to be mode-by-mode for FD waveforms, which is nontrivial (or we'd need to transform to time domain to do the tapering). At the moment, if you directly generate an FD waveform, you can apply sensible tapering directly in the frequency domain. It's the conversion from time that is difficult.

spxiwh commented 1 year ago

Agreed with most of that.

One of the main points of the issue I point to is that lalsuite expects waveforms to do 1, and it's own filtering etc. is assuming this. If a waveform does not do (1) it is a problem, and could lead to invalid waveforms. I'll also add that sometimes in PE, (1) is desired, to limit the amount of data in play and because power in HoM is generally coming at higher frequencies.

But I also agree that we do need a clean interface to do (2).

I do have a general concern about reproducibility of injections (especially with FD approximants), and would like a way that we can just "ask lalsuite to do that", rather than doing the FD conditioning ourselves. I'll try and discuss more with waveform folks about this though, and report back.

ahnitz commented 1 year ago

Well for injections of FD approximants, I actually lean towards all modes starting at a constant frequency. The tapering really needs to be applied before transforming to the time domain, and it can only be done easily for each mode if they all start at the same frequency. The problem is that for an fd waveform, if you convert to time domain, then apply some kind of tapering, it is largely too late to avoid gibbs junk.

The easiest for injections is that it as a constant frequency (for fd) and a constant time (for td). Anything else requires separation of modes to properly handle.

As for your later point, keep in mind that we will always want a consistent filtering / tapering / conversion scheme within pycbc. If lalsuite as a provider of a single source of waveforms (we have others!) has its own mechanism, it's fine to expose that too, as long as we don't lose the more general abilities as default.

gwastro / pycbc

get_fd_from_td should handle higher mode waveforms more gracefully #3794