Swap TGapsPercent to Gaps

rhiannonlynne commented 1 year ago

The obvious question would be - why drop the TGapsPercent metric? So, before doing that, I would like to see a comment from Eric Bellm @ebellm about this replacement. I do understand that the TGapsPercent metric can be confusing and doesn't always properly illustrate what is going on, so I do see the advantages to changing it. The GapsMetric is likely a better way to analyze a similar thing. However, I also think that we're likely not capturing all of the timescales of interest that Eric was trying to explore, when using 7 hours and 1 day / 24 hours only, so I'd like to get his feedback in particular.

(it's possible that 7 hours and 24 hours fill the requirements just fine -- I'm not the expert on this time domain).

yoachim commented 1 year ago

We still have the Tgaps histograms for each filter. I think that's the metric that is useful for checking a wide range of timescales.

ebellm commented 1 year ago

I left some small comments on the PR, but let me make a more general comment about the potential replacement of TGapsPercentMetric with GapsMetric.

TGapsPercentMetric takes the difference of consecutive observing times at a location, histograms the difference, and then calculates the percentage between a given max and min timescale (2-24 hours, by default).
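For illustration, the calculation described above can be sketched in a few lines. This is a minimal sketch, not the rubin_sim code: the function name `tgaps_percent`, the inclusive window edges, and times given in days are all assumptions, and the in-window gaps are counted directly rather than through an explicit histogram.

```python
def tgaps_percent(times_days, min_hours=2.0, max_hours=24.0):
    """Percentage of consecutive-visit gaps that fall in [min_hours, max_hours].

    Hypothetical stand-in for the idea behind TGapsPercentMetric;
    window edges are treated as inclusive (an assumption).
    """
    times = sorted(times_days)
    if len(times) < 2:
        return 0.0
    # Differences between consecutive observing times, converted to hours.
    gaps_hours = [(b - a) * 24.0 for a, b in zip(times, times[1:])]
    in_window = sum(1 for g in gaps_hours if min_hours <= g <= max_hours)
    return 100.0 * in_window / len(gaps_hours)


# Four visits: gaps of 3 h, 21 h, and about 0.5 h.
example_times = [0.0, 0.125, 1.0, 1.02]
print(tgaps_percent(example_times))
```

Two of the three consecutive gaps land in the default 2-24 hour window, so the value is about 66.7. Because the denominator is the total number of gaps, the result depends on everything else the survey did at that location.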

GapsMetric bins the observing times and counts how many times there are observations separated by a given gap timescale +/- a width which is presently that timescale/4.
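A rough sketch of the same idea, for comparison. It deliberately swaps the binning for a direct pairwise check, so it is not the GapsMetric implementation: the name `count_gap_matches`, the `frac_width` parameter, and the choice to count each visit at most once are assumptions made for illustration.

```python
def count_gap_matches(times_days, gap_hours, frac_width=0.25):
    """Count visits that have at least one later visit gap_hours*(1 +/- frac_width) away.

    Simplified, non-binned stand-in for the idea behind GapsMetric (an assumption,
    not the rubin_sim algorithm). Visits need not be consecutive.
    """
    times = sorted(times_days)
    lo = gap_hours * (1.0 - frac_width) / 24.0  # window edges in days
    hi = gap_hours * (1.0 + frac_width) / 24.0
    matches = 0
    for i, t0 in enumerate(times):
        # Any later visit in the window counts; intervening visits are ignored.
        if any(lo <= t1 - t0 <= hi for t1 in times[i + 1:]):
            matches += 1
    return matches


# Visits at 0 h, 7 h, and 24 h.
example_times = [0.0, 7.0 / 24.0, 1.0]
print(count_gap_matches(example_times, gap_hours=7.0))
print(count_gap_matches(example_times, gap_hours=24.0))
```

Each call finds one match here: the 0 h and 7 h visits for the 7 hour gap, and the 0 h and 24 h visits for the 24 hour gap, even though those two are not consecutive.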

pros: GapsMetric is likely less sensitive to dense sampling effects (e.g., in DDFs) than TGapsPercentMetric: it doesn't care how many observations there are because it doesn't look at consecutive observations, it just checks if any observations are separated by the relevant spacing.

cons: the GapsMetric binned implementation means that the width of the tolerance needs to be a fixed fraction of the gap timescale. So the Gap_7 and Gap_24 metrics look for separations of 5.25-8.75 and 18-30 hours, respectively. This is substantially less flexible than the arbitrary timescales allowed by TGapsPercentMetric and makes it potentially less useful. In the ApJS paper I used timescales of a) 2-14 and b) 14-38 hours to capture a) timescales larger than solar-system pair spacing but within the same night and b) next-night revisits. While I suspect the GapsMetric values would generally correlate with the TGapsPercentMetric values, the edges do matter--in particular, improvements in revisits at 3-4 hour timescales wouldn't show up in the Gap_7 metric. Generally the longest within-night separations are going to provide the most value, so perhaps it's not a big loss, but if I'm a fast transients person mainly focused on WFD it would merit a further look.
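The window arithmetic above is easy to check. A minimal sketch, assuming the tolerance is exactly timescale/4 on either side; `gap_window` and `in_window` are hypothetical helpers, not rubin_sim functions.

```python
def gap_window(gap_hours, frac_width=0.25):
    """Return the (low, high) separation window in hours for one gap timescale."""
    half_width = gap_hours * frac_width
    return (gap_hours - half_width, gap_hours + half_width)


def in_window(separation_hours, window):
    """True if a separation falls inside the (low, high) window, edges included."""
    low, high = window
    return low <= separation_hours <= high


gap_7 = gap_window(7.0)
gap_24 = gap_window(24.0)
same_night = (2.0, 14.0)  # the 2-14 hour range quoted from the ApJS paper

revisit_hours = 3.5  # a revisit in the 3-4 hour range
print(gap_7, gap_24)
print(in_window(revisit_hours, gap_7), in_window(revisit_hours, same_night))
```

This reproduces the 5.25-8.75 and 18-30 hour windows, and shows a 3.5 hour revisit falling outside the Gap_7 window while sitting inside the 2-14 hour range.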

yoachim commented 1 year ago

I think I agree with all of that.

The major reason for the switch is that TGapsPercentMetric is a measure of how optimized a survey is for a timescale, which is subtly different from how well a given timescale has been observed. The problem is that when the SCOC compares simulations, they treat TGapsPercentMetric as a science metric proxy, even though it doesn't behave like one. My prime example is what happens if I run the survey for an extra year. Depending on how I do the observations, TGapsPercentMetric could go up or down. So the SCOC will look at the results and say "oh no, we can't approve observing for an 11th year because that hurts science on 3hr timescales!". GapsMetric is limited in that it forces folks to pick a few timescales, but the algorithm is entropy-like so that adding more observations means the metric value can only increase or stay constant.
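A toy version of the extra-observations example, as a sketch only. Both functions are simplified stand-ins written for this illustration (a percentage of consecutive gaps, and a count of visits with a partner at the target spacing), not the rubin_sim metrics, and the visit times are made up.

```python
def percent_in_window(times_days, min_hours=2.0, max_hours=24.0):
    """Percentage of consecutive gaps inside the window (relative measure)."""
    times = sorted(times_days)
    gaps = [(b - a) * 24.0 for a, b in zip(times, times[1:])]
    if not gaps:
        return 0.0
    return 100.0 * sum(min_hours <= g <= max_hours for g in gaps) / len(gaps)


def count_with_partner(times_days, gap_hours, frac_width=0.25):
    """Number of visits with a later visit at gap_hours +/- 25% (count measure)."""
    times = sorted(times_days)
    lo = gap_hours * (1.0 - frac_width) / 24.0
    hi = gap_hours * (1.0 + frac_width) / 24.0
    return sum(
        any(lo <= t1 - t0 <= hi for t1 in times[i + 1:])
        for i, t0 in enumerate(times)
    )


# Made-up visits: one pair separated by 3 hours.
baseline = [0.0, 0.125]
# "Extra" observing: three closely spaced visits ten days later.
extended = baseline + [10.0, 10.01, 10.02]

print(percent_in_window(baseline), percent_in_window(extended))
print(count_with_partner(baseline, 3.0), count_with_partner(extended, 3.0))
```

The percentage drops from 100 to 25 once the extra visits are added, even though the original 3 hour pair is still in the data, while the count stays at 1. In this simplified form the count cannot decrease when visits are added, because every existing pair is still present.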

ebellm commented 1 year ago

That makes a lot of sense, @yoachim--I appreciate how a relative metric could be confusing in that regard.

lsst / rubin_sim

Swap TGapsPercent to Gaps #331