Change logical_intervals() default complete_intervals from True to False

taldcroft commented 1 year ago

Description

The utility function cheta.utils.logical_intervals() has a poor choice of default with complete_intervals=True, with only myself to blame. This means that any True intervals that extend to the start or stop of interval are dropped. This behavior is good and necessary in certain situations (like kadi events where you don't want to store partial/incomplete events), but most of the time you expect to get all the intervals within the data.

This default setting cost me a couple hours debugging in https://github.com/sot/kadi/pull/261 finding an issue that depended oddly on the start / stop times.

Interface impacts

This changes a default that may impact code. For the most part I would expect that this will fix latent bugs where the code is actually assuming complete_intervals=False, but we'll need to advertise this.

Impacts to code in the `sot` repository.

I have a directory that has a clone of every sot repository, and did the following audit to identify a

➜  sot_repos find . -name '*.py' | grep -v "/build/" | grep -v "./eng_archive/Ska" \
   | xargs grep -A 5 "logical_intervals(" | grep -v "/build/"

# DON'T CARE but OK with complete_intervals=False
./gyro_bias/kalman_threshold/search_scripts/get_low_kalstr.py:    lowkals = dat.logical_intervals('<=', '2 ')
./gyro_bias/kalman_threshold/search_scripts/get_low_kalstr.py-    lowkals = lowkals[lowkals['duration'] < 120]  # too-long intervals are spurious
./gyro_bias/kalman_threshold/search_scripts/get_low_kalstr.py-    bad = lowkals['duration'] > 60
./gyro_bias/kalman_threshold/search_scripts/get_low_kalstr.py-    lowkals = lowkals[bad]
./gyro_bias/kalman_threshold/search_scripts/get_low_kalstr.py-    for interval in lowkals:
./gyro_bias/kalman_threshold/search_scripts/get_low_kalstr.py-        ds = events.dwells.filter(start=interval['tstart'], stop=interval['tstop'])

# CHANGE: could split an interval in two
./kalman_watch/kalman_watch/low_kalman_mon.py:    lowkals = logical_intervals(
./kalman_watch/kalman_watch/low_kalman_mon.py-        dat["aokalstr"].times,
./kalman_watch/kalman_watch/low_kalman_mon.py-        (dat["aokalstr"].vals.astype(int) <= 1)
./kalman_watch/kalman_watch/low_kalman_mon.py-        & (dat["aoacaseq"].vals == "KALM")
./kalman_watch/kalman_watch/low_kalman_mon.py-        & (dat["aopcadmd"].vals == "NPNT"),
./kalman_watch/kalman_watch/low_kalman_mon.py-        max_gap=10.0,

# CORRECT with complete_intervals=False (intervals not written to file, incomplete
# intervals should be displayed on plot)
./kalman_watch/kalman_watch/kalman_perigee_mon.py:            ints_low = logical_intervals(self.data["times"], vals <= n_kalstr)
./kalman_watch/kalman_watch/kalman_perigee_mon.py-            ints_low = ints_low[ints_low["duration"] > dur_limit]
./kalman_watch/kalman_watch/kalman_perigee_mon.py-
./kalman_watch/kalman_watch/kalman_perigee_mon.py-            for int_low in ints_low:
./kalman_watch/kalman_watch/kalman_perigee_mon.py-                p0 = int_low["tstart"] - self.perigee.cxcsec
./kalman_watch/kalman_watch/kalman_perigee_mon.py-                p1 = int_low["tstop"] - self.perigee.cxcsec

# CHANGE: this is important
./kadi/kadi/events/models.py:            states = utils.logical_intervals(times, bools, max_gap=MAX_GAP)
./kadi/kadi/events/models.py-        except (IndexError, ValueError):
./kadi/kadi/events/models.py-            if event_time_fuzz is None:
./kadi/kadi/events/models.py-                logger.warn(
./kadi/kadi/events/models.py-                    "Warning: No telemetry available for {}".format(cls.__name__)
./kadi/kadi/events/models.py-                )

# DON'T CARE (code being replaced)
./validate_states/validate_states.py:    dith_disa_states = logical_intervals(tlm['date'], tlm['aodithen'] == 'DISA')
./validate_states/validate_states.py-    for state in dith_disa_states:
./validate_states/validate_states.py-        # Index back into telemetry for each of these constant dither disable states
./validate_states/validate_states.py-        idx0 = np.searchsorted(tlm['date'], state['tstart'], side='left')
./validate_states/validate_states.py-        idx1 = np.searchsorted(tlm['date'], state['tstop'], side='right')
./validate_states/validate_states.py-        # If any samples have aca calibration flag, mark interval for exclusion.

# CORRECT with complete_intervals=False (JEAN?)
./aca_weekly_report/aca_weekly_report/report.py:        intervals = logical_intervals(dat['AOKALSTR'].times,
./aca_weekly_report/aca_weekly_report/report.py-                                      low_kal,
./aca_weekly_report/aca_weekly_report/report.py-                                      max_gap=10)
./aca_weekly_report/aca_weekly_report/report.py-        kalman_data['consec_lt_two'] = np.max(intervals['duration'])
./aca_weekly_report/aca_weekly_report/report.py-    return kalman_data
./aca_weekly_report/aca_weekly_report/report.py-

# DON'T CARE: Local logical_intervals()
./aca_status_flags/analysis_plots.py:    low = logical_intervals(dat['times'], tlm_n_kalman, '<=', 1)
./aca_status_flags/analysis_plots.py:    low = logical_intervals(dat['times'], pred_n_kalman, '<=', 1)
./aca_status_flags/analysis_plots.py:def logical_intervals(times, vals, op, val):
./aca_status_flags/analysis_plots.py-    """Determine contiguous intervals during which the logical comparison
./aca_status_flags/analysis_plots.py-    expression "MSID.vals op val" is True.  Allowed values for ``op``"""

# CORRECT with complete_intervals=False
./aca_status_flags/analysis_plots.py:      manvs = dat.logical_intervals('==', 'NEND')
./aca_status_flags/analysis_plots.py-      manvs['duration']
./aca_status_flags/analysis_plots.py-
./aca_status_flags/analysis_plots.py-    :param vals: input values
./aca_status_flags/analysis_plots.py-    :param op: logical operator, one of ==  !=  >  <  >=  <=
./aca_status_flags/analysis_plots.py-    :param val: comparison value

# DON'T CARE
./safemode_2015264/plot_fish.py:    # ok = logical_intervals(roll.times, np.abs(roll.midvals) < 20.1)
./safemode_2015264/plot_fish.py-    # roll.select_intervals(ok)
./safemode_2015264/plot_fish.py-    # pitch.select_intervals(ok)
./safemode_2015264/plot_fish.py-
./safemode_2015264/plot_fish.py:    off_nom = logical_intervals(roll.times, np.abs(roll.midvals) > 3)
./safemode_2015264/plot_fish.py-    roll_off_nom = roll.select_intervals(off_nom, copy=True)
./safemode_2015264/plot_fish.py-    pitch_off_nom = pitch.select_intervals(off_nom, copy=True)
./safemode_2015264/plot_fish.py-
./safemode_2015264/plot_fish.py-if 'greta' not in globals():
./safemode_2015264/plot_fish.py-    greta = Table.read('greta_values.dat', format='ascii.basic')

Testing

Unit tests

[x] Mac

Independent check of unit tests by Jean

[x] Linux

Functional tests

No functional testing.

jeanconn commented 1 year ago

For the aca_weekly_report, I think technically it would need to change to match current code behavior, but I can't see that it matters for that reporting.

jeanconn commented 1 year ago

With regard to your question @javierggt , "Isn't intervals["val"][-1] always True in here? In other words: the last interval is always dropped?" I'm not clear on that. For state_intervals, when that second argument is bools, the intervals are going to end up as a table of all the intervals but annotated such that "val" is True or False based on it if that row matched the condition. So I figured if we are in the non-matching state at the end, intervals["val"][-1] is false.

taldcroft commented 1 year ago

Demonstrating what @jeanconn said with code, the final True interval is not always dropped.

In [15]: times = np.arange(100, 105)

In [18]: vals = np.array([1, 0, 1, 0, 0], dtype=bool)

In [19]: logical_intervals(times, vals, complete_intervals=True)
Out[19]: 
<Table length=1>
      datestart              datestop       duration  tstart  tstop 
        str21                 str21         float64  float64 float64
--------------------- --------------------- -------- ------- -------
1998:001:00:00:38.316 1998:001:00:00:39.316      1.0   101.5   102.5

sot / cheta