spatialtopology / spacetop-prep

code for transferring data and preprocessing
MIT License

[BUG] utils.preprocess.identify_boundary misses boundaries if physio starts or ends mid-scan #50

Open Michael-Sun opened 8 months ago

Michael-Sun commented 8 months ago

Which module is this from?

physio utils.preprocess.identify_boundary

What is the issue?

If the physio recording started or ended mid-scan, identify_boundary misses those boundaries, because the signal never shifts from 0 to 1 (at the start) or from 1 to 0 (at the end) within the recording.

What was your expected behavior?

Correctly identify those boundaries. Perhaps log a report that this was detected.

How can we reproduce this?

Self-explanatory
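
For readers who want something concrete, here is a minimal sketch of the failure on synthetic data. The column name `trigger` and the toy values are made up for illustration; only the two comparison expressions come from identify_boundary itself. The leading NaNs stand in for the rolling-average warm-up.

```python
import numpy as np
import pandas as pd

# Toy binary channel (hypothetical values): two NaN warm-up samples,
# then the recording begins inside an event and also ends inside one.
signal = [np.nan, np.nan, 1, 1, 0, 0, 1, 1, 0, 1, 1]
df = pd.DataFrame({"trigger": signal})

col = df["trigger"]
# Same transition rules as identify_boundary: rising edge = start, falling edge = stop.
start = df[col > col.shift(1)].index.values.tolist()
stop = df[col < col.shift(1)].index.values.tolist()

print("start:", start)
print("stop:", stop)
```

The first event (samples 2 to 4) gets a stop but no start, because comparing 1 against NaN is False, and the last event (samples 9 to 10) gets a start but no stop, because nothing follows it. The two lists end up the same length, so the mismatch is easy to overlook.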

Any additional context?

The solution is the following changes to identify_boundary:

def identify_boundary(df, binary_col):
    """
    Function used to extract onsets of the beginning of an event ("start")
    and end of an event ("stop"). The function identifies transitions of
    events and saves both "start" and "stop" of an event.

    Parameters
    ----------
    df: pandas dataframe
        acquisition file loaded into pandas using nk.read_acqknowledge.
    binary_col: str
        name of the column in df that holds the binary (0/1) signal

    Returns
    -------
    boundaries: dictionary
        contains onsets of the beginning of an event ("start") and end of an event ("stop")
    """
    # Rising edges (0 -> 1) mark starts, falling edges (1 -> 0) mark stops.
    start = df[df[binary_col] > df[binary_col].shift(1)].index.values.tolist()
    stop = df[df[binary_col] < df[binary_col].shift(1)].index.values.tolist()

    ## ADDITIONS BY MICHAEL SUN:
    # Check if the first data point is part of an event.
    # Since there's a rolling average window of the sampling rate (2000), the first non-NaN observation is at time 2000.
    # NaNs are filled only after the edges above are computed, so the NaN-to-1 step is not also counted as a start.
    # Note that this overwrites the column in the caller's dataframe.
    df[binary_col] = df[binary_col].fillna(0)  # Replace NaNs for cleaner execution of identify_boundary.
    # The length guard avoids an IndexError on recordings shorter than the window.
    if len(df) > 2000 and df[binary_col].iloc[2000] > 0:
        # The recording started during an ongoing event.
        # Add the first index (0) to the start list
        start = [0] + start
        print('WARNING: The physio recording started mid-scan for the first run detected.')

    # Check if the last data point is part of an event
    if df[binary_col].iloc[-1] > 0:
        # The recording ended during an ongoing event.
        # Add the last index to the stop list
        stop = stop + [len(df) - 1]
        print('WARNING: The physio recording ended mid-scan for the last run detected.')

    boundaries = {'start': start,
                  'stop': stop}

    return boundaries
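
One possible refinement, sketched here under assumptions rather than taken from the repository: instead of the hard-coded offset of 2000, look up the first non-NaN sample with pandas' `first_valid_index`, and report through the standard `logging` module instead of `print`. The function name `identify_boundary_edges` and the logger are hypothetical, and this version leaves the input dataframe unmodified.

```python
import logging

import numpy as np
import pandas as pd

logger = logging.getLogger(__name__)


def identify_boundary_edges(df, binary_col):
    """Hypothetical variant of identify_boundary that also handles
    recordings starting or ending in the middle of an event."""
    signal = df[binary_col]

    # Rising edges mark starts, falling edges mark stops (same rule as the original).
    start = df[signal > signal.shift(1)].index.tolist()
    stop = df[signal < signal.shift(1)].index.tolist()

    # Assumption: the first non-NaN sample tells us whether the recording
    # began inside an event, whatever the rolling-window length was.
    first_valid = signal.first_valid_index()  # None if the column is all NaN
    if first_valid is not None and signal.loc[first_valid] > 0:
        start = [df.index[0]] + start
        logger.warning("Physio recording started mid-scan for the first run detected.")

    # A NaN in the last sample compares as False, so it is treated as "no event".
    if len(signal) > 0 and signal.iloc[-1] > 0:
        stop = stop + [df.index[-1]]
        logger.warning("Physio recording ended mid-scan for the last run detected.")

    return {"start": start, "stop": stop}


# Usage on toy data (hypothetical values): starts and ends inside an event.
toy = pd.DataFrame({"trigger": [np.nan, np.nan, 1, 1, 0, 0, 1, 1, 0, 1, 1]})
result = identify_boundary_edges(toy, "trigger")
print(result)
```

Whether the padded start should be the first row of the file or the first valid sample depends on how downstream code converts indices to time, so treat that choice as open.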