pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

It would be nice to be able to aggregate not just by rolling windows in time but by boolean expression bounded windows #16367

Open mmcdermott opened 5 months ago

mmcdermott commented 5 months ago

Description

I have a use case in polars where I need to summarize a dataframe of medical patient data by computing the sums of columns over dynamically sized windows of data. Each window is bounded at one endpoint by a row for which a given boolean expression evaluates to True, and at the other endpoint by a row in the dataframe, potentially with a row-based or temporal offset. This is useful in my context because, in addition to computing summary statistics over, say, the last 7 days of a patient's data, I may also want to compute summary statistics over the period of a patient's data up until the start of their hospital admission, or up until a prior diagnosis of a certain kind, etc. The offset is useful when I want to summarize, for example, only up to the start of an admission that occurred more than 24 hours ago.

It is possible to do this efficiently by manipulating cumulative sums, with a bit of event-timestamp fudging to handle offsets correctly. In particular, I have an implementation in Python leveraging the polars Python API here: https://github.com/justin13601/ESGPTTaskQuerying/blob/33c6ebbb62ce1b211ae39fb4ee84c4d7c09ec025/src/esgpt_task_querying/aggregate.py#L370C1-L1014C6

Note that the code linked above not only contains a working implementation in Python, but also an extensive docstring with doctests outlining how my current version behaves, which should make this request more concrete.

But it seems to me that this capability would be useful in other settings as well, and could be a good core capability within polars. I suspect the internal implementation would look similar to how group_by_rolling is implemented, so I thought I would post about it here in case others have a similar request (or in case there is a more natural way to do this that I'm missing).
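For the simplest flavor of this (summing each column from the most recent prior boundary row, inclusive, through the current row, per group, with no offset), a workaround with existing polars primitives can be sketched as follows; the frame and column names here are purely illustrative:

```
import polars as pl

df = pl.DataFrame({
    "id": [1, 1, 1, 1, 1],
    "value": [10, 20, 30, 40, 50],
    "boundary": [False, True, False, False, True],
})

out = df.with_columns(
    # every True starts a new window; rows before the first boundary fall in window 0
    window_id=pl.col("boundary").cum_sum().over("id"),
).with_columns(
    # running sum from the most recent boundary row (inclusive) through the current row
    value_since_boundary=pl.col("value").cum_sum().over("id", "window_id"),
)
```

This only handles one specific configuration (window bounded on the left by the boundary row, inclusive at both ends) and has no notion of an offset (rows before the first boundary would also need to be masked out, e.g. where window_id == 0), which is why the linked implementation resorts to the more general cumulative-sum manipulation.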

MarcoGorelli commented 5 months ago

hi @mmcdermott - to expedite resolution, could you please include a simple snippet which shows something like the input you have and the output you expect?

mmcdermott commented 5 months ago

I'm starting with medical data, where I have columns "id", "timestamp", ...data_columns..., and boundary_column, where boundary_column is a boolean. Grouped by id, I would like to aggregate over data_columns between each row of my dataframe and either the immediately subsequent or prior row where boundary_column is True, subject to some potential temporal offset evaluated over the "timestamp" column.
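The mechanism the function below builds on can be sketched, for the simplest case (sum from the most recent prior boundary row, inclusive, through the current row, with no offset), roughly like this; this is a toy sketch with made-up column names, not the linked implementation itself:

```
from datetime import datetime, timedelta

import polars as pl

df = pl.DataFrame({
    "subject_id": [1, 1, 1],
    "timestamp": [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 3)],
    "x": [1, 2, 3],
    "is_boundary": [False, True, False],
}).with_columns(x_cumsum=pl.col("x").cum_sum().over("subject_id"))

# One helper row per boundary event, carrying the cumulative sum just *before* the boundary and a
# timestamp nudged 1 microsecond earlier, so that after sorting and forward-filling, the boundary row
# itself (and everything after it) sees that value.
boundary = df.filter(pl.col("is_boundary")).select(
    "subject_id",
    (pl.col("timestamp") - timedelta(microseconds=1)).alias("timestamp"),
    (pl.col("x_cumsum") - pl.col("x")).alias("x_cumsum_at_boundary"),
)

out = (
    pl.concat([df, boundary], how="diagonal")
    .sort("subject_id", "timestamp")
    .with_columns(pl.col("x_cumsum_at_boundary").forward_fill().over("subject_id"))
    .filter(pl.col("x").is_not_null())  # drop the injected helper rows again
    .with_columns(x_since_boundary=pl.col("x_cumsum") - pl.col("x_cumsum_at_boundary"))
)
```

Shifting the helper rows' timestamps by a temporal offset (beyond the 1µs nudge) is, roughly, how the offset variants are handled; the function below generalizes this to both directions and all the closed options.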

Here is the API I would roughly like, realized through a function that implements this capability for a limited set of aggregations in pure Python, with doctest examples in the docstring. This function can be found here: https://github.com/justin13601/ACES/blob/174d6b6522e7220b1a65f2279f9411a089873ede/src/aces/aggregate.py#L370

def boolean_expr_bound_sum(
    df: pl.DataFrame,
    boundary_expr: pl.Expr,
    mode: str,
    closed: str,
    offset: timedelta = timedelta(0),
) -> pl.DataFrame:
    """Sums all columns of ``df`` between each row plus an offset and the next per-subject satisfying event.

    TODO: Use https://hypothesis.readthedocs.io/en/latest/quickstart.html to test this function.

    Performs a boolean-expression-bounded summation over the columns of ``df``. The logic of this is as
    follows.
      - If ``mode`` is ``bound_to_row``, then that means that the _left_ endpoint of the
        window must correspond to an instance where ``boundary_expr`` is `True` and the _right_ endpoint will
        correspond to a row in the dataframe.
      - If ``mode`` is ``row_to_bound``, then that means that the _right_ endpoint of the
        window must correspond to an instance where ``boundary_expr`` is `True` and the _left_ endpoint will
        correspond to a row in the dataframe.
      - If ``closed`` is ``'none'``, then neither the associated left row endpoint nor the associated right
        row endpoint will be included in the summation.
      - If ``closed`` is ``'both'``, then both the associated left row endpoint and the associated right
        row endpoint will be included in the summation.
      - If ``closed`` is ``'left'``, then only the associated left row endpoint will be included in the
        summation.
      - If ``closed`` is ``'right'``, then only the associated right row endpoint will be included in the
        summation.
      - The output dataframe will have the same number of rows and order as the input dataframe. Each row will
        correspond to the aggregation over the matching window that uses that row as the "row" side of the
        terminating aggregation (regardless of whether that corresponds to the left or the right endpoint, and
        regardless of whether that endpoint would actually be included in the calculation based on
        ``closed``). The associated ``boundary_expr`` side of the endpoint will be the nearest possible
        ``boundary_expr`` endpoint that produces a non-empty set of rows over which to aggregate. Note that
        this may depend on the value of ``closed`` -- e.g., a row with ``boundary_expr`` being `True` can
        produce a one-element set of rows to aggregate if ``closed`` is ``'both'``, but this is not possible
        if ``closed`` is something else, and in that case that same row may instead rely on a different row to
        fill its ``boundary_expr`` endpoint when evaluated as a "row endpoint" during calculation.
      - Offset further modifies this logic by applying a temporal offset of fixed size to the "row" endpoint.
        All other logic stays the same.

    In particular, suppose that we have the following rows and boolean boundary expression evaluations (for
    a single subject):
```
Rows:                  [0,      1,      2,      3,      4,    5,      6]
Boundary Expression:   [False,  True,   False,  True,   True, False,  False]
```
Then, we would aggregate the following rows under the following specified conditions:
```
mode         | closed | aggregate_groups
bound_to_row |  both  | [],     [1],    [1, 2], [3],    [4],  [4, 5], [4, 5, 6]
bound_to_row |  left  | [],     [],     [1],    [1, 2], [3],  [4],    [4, 5]
bound_to_row |  right | [],     [],     [2],    [2, 3], [4],  [5],    [5, 6]
bound_to_row |  none  | [],     [],     [],     [2],    [],   [],     [5]
-------------|--------|------------------------------------------------------
row_to_bound |  both  | [0, 1], [1],    [2, 3], [3],    [4],  [],     []
row_to_bound |  left  | [0],    [],     [2],    [],     [],   [],     []
row_to_bound |  right | [1],    [2, 3], [3],    [4],    [],   [],     []
row_to_bound |  none  | [],     [2]**,  [],     [],     [],   [],     []
```

How to think about this? The ``closed`` parameter controls where we place the endpoints of the window
relative to the bounding rows. Consider the case marked with ** above, where the row is row 1 (indexing
from 0), we are in a ``row_to_bound`` setting, and we have ``closed = "none"``. Here the left endpoint
(the row, as we are in ``row_to_bound``) is not included, so our boundary on the left falls between rows
1 and 2. Our boundary on the right falls just to the left of the next row for which the boundary
expression is True. In this case, that means between rows 2 and 3 (as row 1 lies to the left of our left
boundary, the next qualifying boundary row is row 3). In contrast, if ``closed`` were ``"left"``, then
row 1 itself would be a possible bound row to use on the right, and thus, by virtue of our right endpoint
being open, we would include no rows.

Args:
    df: The dataframe to be aggregated. The input must be sorted in ascending order by
        timestamp within each subject group. It must contain the following columns:
          - A column ``subject_id`` which contains the subject ID.
          - A column ``timestamp`` which contains the timestamp at which the event contained in any given
            row occurred.
          - A set of other columns that will be summed over the specified windows.
    boundary_expr: A boolean expression which can be evaluated over the passed dataframe, and defines rows
        that can serve as valid left or right boundaries for the aggregation step. The precise window over
        which the dataframe will be aggregated will depend on ``mode`` and ``closed``. See above for an
        explanation of the various configurations possible.
    mode: The aggregation mode; either ``bound_to_row`` or ``row_to_bound``. See above for how this
        determines which endpoint is the boundary event and which is the row.
    closed: Which of the window's endpoints are included in the summation; one of ``'both'``, ``'none'``,
        ``'left'``, or ``'right'``. See above.
    offset: A temporal offset (a ``datetime.timedelta``) applied to the "row" endpoint of each window.
        Defaults to no offset.

Returns:
    The dataframe that has been aggregated in an event-bound manner according to the specified
    ``boundary_expr``, ``mode``, ``closed``, and ``offset``. This aggregation means the following:
      - The output dataframe will contain the same number of rows and be in the same order (in terms of
        ``subject_id`` and ``timestamp``) as the input dataframe.
      - The column names will also be the same as the input dataframe, plus there will be two additional
        columns: ``timestamp_at_start``, which contains the timestamp of the start of the aggregation
        window corresponding to each row, and ``timestamp_at_end``, which contains the timestamp of the
        end of that window (or null if no such aggregation window exists).
      - The values in the predicate columns of the output dataframe will contain the sum of the values
        over the permissible row ranges given the input parameters. See above for an explanation.

Examples:
    >>> import polars as pl
    >>> _ = pl.Config.set_tbl_width_chars(150)
    >>> from datetime import datetime
    >>> df = pl.DataFrame({
    ...     "subject_id": [1, 1, 1, 2, 2, 2, 2, 2],
    ...     "timestamp": [
    ...         # Subject 1
    ...         datetime(year=1989, month=12, day=1,  hour=12, minute=3),
    ...         datetime(year=1989, month=12, day=3,  hour=13, minute=14), # HAS EVENT BOUND
    ...         datetime(year=1989, month=12, day=5,  hour=15, minute=17),
    ...         # Subject 2
    ...         datetime(year=1989, month=12, day=2,  hour=12, minute=3),
    ...         datetime(year=1989, month=12, day=4,  hour=13, minute=14),
    ...         datetime(year=1989, month=12, day=6,  hour=15, minute=17), # HAS EVENT BOUND
    ...         datetime(year=1989, month=12, day=8,  hour=16, minute=22),
    ...         datetime(year=1989, month=12, day=10, hour=3,  minute=7),  # HAS EVENT BOUND
    ...     ],
    ...     "idx":  [0, 1, 2, 3, 4, 5, 6, 7],
    ...     "is_A": [1, 0, 1, 1, 1, 1, 0, 0],
    ...     "is_B": [0, 1, 0, 1, 0, 1, 1, 1],
    ...     "is_C": [0, 1, 0, 0, 0, 1, 0, 1],
    ... })
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "bound_to_row",
    ...     "both",
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-05 15:17:00 ┆ 1    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-06 15:17:00 ┆ 2    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-08 16:22:00 ┆ 2    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 1    ┆ 1    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "bound_to_row",
    ...     "none",
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-05 15:17:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-06 15:17:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-08 16:22:00 ┆ 1    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-10 03:07:00 ┆ 1    ┆ 2    ┆ 1    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "bound_to_row",
    ...     "left",
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-05 15:17:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-06 15:17:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-08 16:22:00 ┆ 2    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 0    ┆ 0    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "bound_to_row",
    ...     "right",
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-05 15:17:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-06 15:17:00 ┆ 1    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-08 16:22:00 ┆ 1    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-10 03:07:00 ┆ 1    ┆ 3    ┆ 2    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "row_to_bound",
    ...     "both",
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ 1989-12-01 12:03:00 ┆ 1989-12-03 13:14:00 ┆ 1    ┆ 1    ┆ 1    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ 1989-12-02 12:03:00 ┆ 1989-12-04 13:14:00 ┆ 2    ┆ 1    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-06 15:17:00 ┆ 1989-12-10 03:07:00 ┆ 1    ┆ 3    ┆ 2    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-08 16:22:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 1    ┆ 1    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "row_to_bound",
    ...     "none",
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ 1989-12-01 12:03:00 ┆ 1989-12-03 13:14:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ 1989-12-02 12:03:00 ┆ 1989-12-04 13:14:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-10 03:07:00 ┆ 1    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-06 15:17:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 1    ┆ 0    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-08 16:22:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "row_to_bound",
    ...     "left",
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ 1989-12-01 12:03:00 ┆ 1989-12-03 13:14:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ 1989-12-02 12:03:00 ┆ 1989-12-04 13:14:00 ┆ 1    ┆ 1    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-10 03:07:00 ┆ 2    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-06 15:17:00 ┆ 1989-12-10 03:07:00 ┆ 1    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-08 16:22:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 1    ┆ 0    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "row_to_bound",
    ...     "right",
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ 1989-12-01 12:03:00 ┆ 1989-12-03 13:14:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ 1989-12-02 12:03:00 ┆ 1989-12-04 13:14:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-06 15:17:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-08 16:22:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 0    ┆ 0    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> #### WITH OFFSET ####
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "bound_to_row",
    ...     "both",
    ...     offset = timedelta(days=3),
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-04 12:03:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-06 13:14:00 ┆ 1    ┆ 1    ┆ 1    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-08 15:17:00 ┆ 1    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-05 12:03:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-07 13:14:00 ┆ 2    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-09 15:17:00 ┆ 2    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-10 03:07:00 ┆ 1989-12-11 16:22:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 1989-12-13 03:07:00 ┆ 0    ┆ 1    ┆ 1    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "bound_to_row",
    ...     "left",
    ...     offset = timedelta(days=3),
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-04 12:03:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-06 13:14:00 ┆ 1    ┆ 1    ┆ 1    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ 1989-12-03 13:14:00 ┆ 1989-12-08 15:17:00 ┆ 1    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-05 12:03:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-07 13:14:00 ┆ 2    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-09 15:17:00 ┆ 2    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-10 03:07:00 ┆ 1989-12-11 16:22:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 1989-12-13 03:07:00 ┆ 0    ┆ 1    ┆ 1    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "bound_to_row",
    ...     "none",
    ...     timedelta(days=-3),
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-05 16:22:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-07 03:07:00 ┆ 1    ┆ 1    ┆ 1    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "bound_to_row",
    ...     "right",
    ...     offset = timedelta(days=-3),
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-05 16:22:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-04 13:14:00 ┆ 1989-12-07 03:07:00 ┆ 1    ┆ 1    ┆ 1    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "row_to_bound",
    ...     "both",
    ...     offset = timedelta(days=3),
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ 1989-12-05 12:03:00 ┆ 1989-12-10 03:07:00 ┆ 1    ┆ 3    ┆ 2    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-07 13:14:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-09 15:17:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "row_to_bound",
    ...     "left",
    ...     offset = timedelta(days=3),
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ 1989-12-05 12:03:00 ┆ 1989-12-10 03:07:00 ┆ 1    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-07 13:14:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 1    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-09 15:17:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ null                ┆ null                ┆ 0    ┆ 0    ┆ 0    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "row_to_bound",
    ...     "none",
    ...     offset = timedelta(days=-3),
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ 1989-11-28 12:03:00 ┆ 1989-12-03 13:14:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ 1989-11-30 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ 1989-12-02 15:17:00 ┆ 1989-12-03 13:14:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ 1989-11-29 12:03:00 ┆ 1989-12-04 13:14:00 ┆ 1    ┆ 1    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-01 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 1    ┆ 1    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-03 15:17:00 ┆ 1989-12-04 13:14:00 ┆ 0    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-05 16:22:00 ┆ 1989-12-10 03:07:00 ┆ 1    ┆ 2    ┆ 1    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-07 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 1    ┆ 0    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
    >>> boolean_expr_bound_sum(
    ...     df,
    ...     pl.col("idx").is_in([1, 4, 7]),
    ...     "row_to_bound",
    ...     "right",
    ...     offset = timedelta(days=-3),
    ... ).drop("idx")
    shape: (8, 7)
    ┌────────────┬─────────────────────┬─────────────────────┬─────────────────────┬──────┬──────┬──────┐
    │ subject_id ┆ timestamp           ┆ timestamp_at_start  ┆ timestamp_at_end    ┆ is_A ┆ is_B ┆ is_C │
    │ ---        ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---  ┆ ---  ┆ ---  │
    │ i64        ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ i64  ┆ i64  ┆ i64  │
    ╞════════════╪═════════════════════╪═════════════════════╪═════════════════════╪══════╪══════╪══════╡
    │ 1          ┆ 1989-12-01 12:03:00 ┆ 1989-11-28 12:03:00 ┆ 1989-12-03 13:14:00 ┆ 1    ┆ 1    ┆ 1    │
    │ 1          ┆ 1989-12-03 13:14:00 ┆ 1989-11-30 13:14:00 ┆ 1989-12-03 13:14:00 ┆ 1    ┆ 1    ┆ 1    │
    │ 1          ┆ 1989-12-05 15:17:00 ┆ 1989-12-02 15:17:00 ┆ 1989-12-03 13:14:00 ┆ 0    ┆ 1    ┆ 1    │
    │ 2          ┆ 1989-12-02 12:03:00 ┆ 1989-11-29 12:03:00 ┆ 1989-12-04 13:14:00 ┆ 2    ┆ 1    ┆ 0    │
    │ 2          ┆ 1989-12-04 13:14:00 ┆ 1989-12-01 13:14:00 ┆ 1989-12-04 13:14:00 ┆ 2    ┆ 1    ┆ 0    │
    │ 2          ┆ 1989-12-06 15:17:00 ┆ 1989-12-03 15:17:00 ┆ 1989-12-04 13:14:00 ┆ 1    ┆ 0    ┆ 0    │
    │ 2          ┆ 1989-12-08 16:22:00 ┆ 1989-12-05 16:22:00 ┆ 1989-12-10 03:07:00 ┆ 1    ┆ 3    ┆ 2    │
    │ 2          ┆ 1989-12-10 03:07:00 ┆ 1989-12-07 03:07:00 ┆ 1989-12-10 03:07:00 ┆ 0    ┆ 2    ┆ 1    │
    └────────────┴─────────────────────┴─────────────────────┴─────────────────────┴──────┴──────┴──────┘
"""
if mode not in ("bound_to_row", "row_to_bound"):
    raise ValueError(f"mode {mode} invalid!")
if closed not in ("both", "none", "left", "right"):
    raise ValueError(f"closed {closed} invalid!")

if offset != timedelta(0):
    if offset > timedelta(0):
        left_inclusive = False
        if mode == "row_to_bound":
            # Here, we'll be taking cumsum_at_bound - cumsum_at_row - aggd_over_offset
            if closed in ("left", "both"):
                right_inclusive = False
            else:
                right_inclusive = True
        else:
            # Here, we'll be taking cumsum_at_row - cumsum_at_bound + aggd_over_offset
            if closed in ("right", "both"):
                right_inclusive = True
            else:
                right_inclusive = False
    else:
        right_inclusive = False
        if mode == "row_to_bound":
            # Here, we'll be taking cumsum_at_bound - cumsum_at_row + aggd_over_offset
            if closed in ("left", "both"):
                left_inclusive = True
            else:
                left_inclusive = False
        else:
            # Here, we'll be taking cumsum_at_row - cumsum_at_bound - aggd_over_offset
            if closed in ("right", "both"):
                left_inclusive = False
            else:
                left_inclusive = True

    aggd_over_offset = aggregate_temporal_window(
        df,
        TemporalWindowBounds(
            left_inclusive=left_inclusive,
            window_size=offset,
            right_inclusive=right_inclusive,
            offset=None,
        ),
    )

cols = [c for c in df.columns if c not in {"subject_id", "timestamp"}]

cumsum_cols = {c: pl.col(c).cum_sum().over("subject_id").alias(f"{c}_cumsum_at_row") for c in cols}
df = df.with_columns(*cumsum_cols.values())

cumsum_at_boundary = {c: pl.col(f"{c}_cumsum_at_row").alias(f"{c}_cumsum_at_boundary") for c in cols}

# We need to adjust `cumsum_at_boundary` to appropriately include or exclude the boundary event.
if mode == "bound_to_row" and closed in ("left", "both"):
    cumsum_at_boundary = {
        c: (expr - pl.col(c)).alias(f"{c}_cumsum_at_boundary") for c, expr in cumsum_at_boundary.items()
    }
elif mode == "row_to_bound" and closed not in ("right", "both"):
    cumsum_at_boundary = {
        c: (expr - pl.col(c)).alias(f"{c}_cumsum_at_boundary") for c, expr in cumsum_at_boundary.items()
    }

timestamp_offset = pl.col("timestamp") - offset
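# The boundary helper rows built below are keyed by this shifted timestamp: subtracting `offset` aligns
# each real row with boundaries measured relative to its offset endpoint, and the 1-microsecond nudge
# breaks exact ties so that a boundary coinciding with a row's (offset) timestamp falls on the side
# implied by `closed`.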
if mode == "bound_to_row":
    if closed in ("left", "both"):
        timestamp_offset -= timedelta(seconds=1e-6)
    else:
        timestamp_offset += timedelta(seconds=1e-6)

    fill_strategy = "forward"
    sum_exprs = {
        c: (
            pl.col(f"{c}_cumsum_at_row")
            - pl.col(f"{c}_cumsum_at_boundary").fill_null(strategy=fill_strategy).over("subject_id")
        ).alias(c)
        for c in cols
    }
    if (closed in ("left", "none") and offset <= timedelta(0)) or offset < timedelta(0):
        # If we either don't include the right endpoint due to the closed value and the lack of a positive
        # offset or we have a negative offset, we want to remove the right endpoint's counts from the
        # cumsum difference.
        sum_exprs = {c: expr - pl.col(c) for c, expr in sum_exprs.items()}
else:
    if closed in ("right", "both"):
        timestamp_offset += timedelta(seconds=1e-6)
    else:
        timestamp_offset -= timedelta(seconds=1e-6)

    fill_strategy = "backward"
    sum_exprs = {
        c: (
            pl.col(f"{c}_cumsum_at_boundary").fill_null(strategy=fill_strategy).over("subject_id")
            - pl.col(f"{c}_cumsum_at_row")
        ).alias(c)
        for c in cols
    }
    if (closed in ("left", "both") and offset <= timedelta(0)) or offset < timedelta(0):
        # If we either do include the left endpoint due to the closed value and the lack of a positive
        # offset, or we have a negative offset, we want to add in the left endpoint's (the row's) counts,
        # as they will always be included.
        sum_exprs = {c: expr + pl.col(c) for c, expr in sum_exprs.items()}

at_boundary_df = df.filter(boundary_expr).select(
    "subject_id",
    pl.col("timestamp").alias("timestamp_at_boundary"),
    timestamp_offset.alias("timestamp"),
    *cumsum_at_boundary.values(),
    pl.lit(False).alias("is_real"),
)
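# `is_real` distinguishes original rows from the injected boundary helper rows, which are dropped again
# after the fill below.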

with_at_boundary_events = (
    pl.concat([df.with_columns(pl.lit(True).alias("is_real")), at_boundary_df], how="diagonal")
    .sort(by=["subject_id", "timestamp"])
    .select(
        "subject_id",
        "timestamp",
        pl.col("timestamp_at_boundary").fill_null(strategy=fill_strategy).over("subject_id"),
        *sum_exprs.values(),
        "is_real",
    )
    .filter("is_real")
    .drop("is_real")
)
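# At this point each real row carries the timestamp of its matched boundary event and, under the original
# column names, the cumulative-sum difference over its window; any offset corrections are applied below.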

if mode == "bound_to_row":
    st_timestamp_expr = pl.col("timestamp_at_boundary")
    end_timestamp_expr = pl.when(pl.col("timestamp_at_boundary").is_not_null()).then(
        pl.col("timestamp") + offset
    )
else:
    st_timestamp_expr = pl.when(pl.col("timestamp_at_boundary").is_not_null()).then(
        pl.col("timestamp") + offset
    )
    end_timestamp_expr = pl.col("timestamp_at_boundary")

if offset == timedelta(0):
    return with_at_boundary_events.select(
        "subject_id",
        "timestamp",
        st_timestamp_expr.alias("timestamp_at_start"),
        end_timestamp_expr.alias("timestamp_at_end"),
        *(pl.col(c).cast(PRED_CNT_TYPE).fill_null(0).alias(c) for c in cols),
    )

if mode == "bound_to_row" and offset > timedelta(0):

    def agg_offset_fn(c: str) -> pl.Expr:
        return pl.col(c) + pl.col(f"{c}_in_offset_period")

elif mode == "bound_to_row" and offset < timedelta(0):

    def agg_offset_fn(c: str) -> pl.Expr:
        return pl.col(c) - pl.col(f"{c}_in_offset_period")

elif mode == "row_to_bound" and offset > timedelta(0):

    def agg_offset_fn(c: str) -> pl.Expr:
        return pl.col(c) - pl.col(f"{c}_in_offset_period")

elif mode == "row_to_bound" and offset < timedelta(0):

    def agg_offset_fn(c: str) -> pl.Expr:
        return pl.col(c) + pl.col(f"{c}_in_offset_period")

else:
    raise ValueError(f"mode {mode} and offset {offset} invalid!")

return with_at_boundary_events.join(
    aggd_over_offset,
    on=["subject_id", "timestamp"],
    how="left",
    suffix="_in_offset_period",
).select(
    "subject_id",
    "timestamp",
    st_timestamp_expr.alias("timestamp_at_start"),
    end_timestamp_expr.alias("timestamp_at_end"),
    *(agg_offset_fn(c).cast(PRED_CNT_TYPE, strict=False).fill_null(0).alias(c) for c in cols),
)