Method to split multi-run physio files using a BIDS dataset or DICOM directory

To minimize user input and to work around missing physio triggers, it would be nice to have a function that splits and labels physio files using scan information from either the converted BIDS dataset or the raw DICOMs.

Related to #36.

Detailed Description

The method I'm proposing would compare complete information from the actual scan data to potentially incomplete information from a multi-run physio file. It would use the difference in time between each scan's onset and each physio trigger period's onset to determine which scan matches up with each trigger period. From there, the timing of the scans can be used to split the physio file, even in the case of missing trigger periods. Moreover, because the relative timing within each file is what matters, this could be used even when (1) scan timing is anonymized, (2) scanner and physio computer clocks are mismatched, or (3) the physio file lacks absolute timing.

Context / Motivation

In my experience, BioPac sometimes fails to register trigger signals from PsychoPy or E-Prime (or possibly the behavioral tasks simply fail to send the trigger signals properly). Either way, my own physio files can have missing scan-associated trigger signals, or trigger signals that go on longer than the scan (e.g., when the task crashes or is stopped early). In these cases, the proposed method could be used instead of manually editing the physio files. It would also require minimal information from the user.

Possible Implementation

The scans.tsv file in BIDS datasets tends to have an acq_time field with the date and time of the start of each scan, down to the second. The JSON metadata file associated with each scan also tends to have an AcquisitionTime field with the start time of the scan, down to the millisecond. DICOMs generally carry the same information, so they could be scraped in cases where the converted dataset isn't available. From any of these sources, we can get the onsets of all of the scans relative to one another.
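As a rough sketch of that first step, the snippet below reads acq_time from a scans.tsv and converts it to onsets in seconds relative to the earliest scan. The function name `relative_scan_onsets` and the example filenames are made up for illustration, and it assumes acq_time is an ISO 8601 timestamp (rows with a missing or "n/a" value are skipped).

```python
import csv
import io
from datetime import datetime


def relative_scan_onsets(scans_tsv):
    """Return (filename, onset in seconds) pairs, sorted by acquisition time.

    scans_tsv is an open file-like object for a BIDS scans.tsv file.
    Onsets are relative to the earliest scan that has a usable acq_time.
    """
    reader = csv.DictReader(scans_tsv, delimiter="\t")
    rows = []
    for row in reader:
        acq_time = row.get("acq_time")
        if acq_time in (None, "", "n/a"):
            continue  # no timing information for this scan
        rows.append((row["filename"], datetime.fromisoformat(acq_time)))
    if not rows:
        raise ValueError("no scans with an acq_time value were found")
    rows.sort(key=lambda item: item[1])
    first = rows[0][1]
    return [(name, (when - first).total_seconds()) for name, when in rows]


# Hypothetical scans.tsv content, used here only to exercise the function.
example_tsv = io.StringIO(
    "filename\tacq_time\n"
    "func/sub-01_task-rest_run-01_bold.nii.gz\t2019-05-01T10:05:00\n"
    "anat/sub-01_T1w.nii.gz\t2019-05-01T10:00:00\n"
    "func/sub-01_task-rest_run-02_bold.nii.gz\t2019-05-01T10:13:20\n"
)
onsets = relative_scan_onsets(example_tsv)
for name, onset in onsets:
    print(f"{onset:7.1f}  {name}")
```

Because only differences between timestamps are used, this should still work if the dates have been shifted for anonymization, as long as the shift is the same for every scan in the session.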

We can also take a multi-run physio file and extract the relative onsets of each of the recorded trigger periods. As long as there are a couple of successful trigger periods, and the onsets of those periods are accurate, we can compare the timing of the physio trigger periods to the timing of the scans. Assuming that there is variability in the timing between scans (i.e., the scans don't all start exactly X minutes after the previous one), we can determine the offset that should be applied to the scan times in order to get accurate physio periods.
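One way the offset search could look is sketched below. This is not the linked proof of concept, just a brute-force illustration: every pairing of a trigger period with a scan gives a candidate offset, and the candidate that lines up the most trigger onsets with scan onsets (within a tolerance) wins. The function name `match_triggers_to_scans`, the `tol` parameter, and the example numbers are all assumptions.

```python
import numpy as np


def match_triggers_to_scans(scan_onsets, trigger_onsets, tol=1.0):
    """Find the shift that best aligns scan onsets with trigger-period onsets.

    Both inputs are onsets in seconds, each relative to its own arbitrary
    zero point, and each must contain at least one value. Returns the offset
    to add to the scan onsets and, for each trigger period, the index of the
    matching scan (-1 if no scan falls within tol seconds).
    """
    scan_onsets = np.asarray(scan_onsets, dtype=float)
    trigger_onsets = np.asarray(trigger_onsets, dtype=float)

    # Pairing any trigger period with any scan gives one candidate offset.
    candidates = (trigger_onsets[:, None] - scan_onsets[None, :]).ravel()

    best = None
    for offset in candidates:
        # Distance from each trigger onset to every shifted scan onset.
        dist = np.abs(trigger_onsets[:, None] - (scan_onsets + offset)[None, :])
        nearest = dist.argmin(axis=1)
        error = dist.min(axis=1)
        matched = error <= tol
        # Prefer more matched trigger periods, then a smaller total error.
        score = (int(matched.sum()), -float(error[matched].sum()))
        if best is None or score > best[0]:
            best = (score, float(offset), np.where(matched, nearest, -1))
    return best[1], best[2].tolist()


# Four scans, but the trigger period for the second scan is missing.
scan_onsets = [0.0, 400.0, 900.0, 1500.0]
trigger_onsets = [120.2, 1020.1, 1620.3]
offset, matches = match_triggers_to_scans(scan_onsets, trigger_onsets)
print(offset, matches)
```

In this example the three surviving trigger periods are assigned to the first, third, and fourth scans, and the estimated offset can then be used to place the second scan in the physio recording even though it has no trigger period. The sketch does not stop two trigger periods from matching the same scan, and it would be ambiguous if the scans were evenly spaced, which is the variability assumption above.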

Plus, if we use the scans.tsv file, we can just grab the actual filenames for the different scans when we split the physio!

See here for a very basic proof of concept.

NOTE: The only part I'm still struggling with is inferring the durations of the scans for splitting the physio files. If we restrict this function to functional data, we can just load up each scan's TR and number of volumes, and then multiply those to get the duration. If we use the dicom directory, it could be a bit more difficult.
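For the functional-only case, the duration calculation and the split itself could be sketched as follows. The function names and example values are hypothetical; the TR would come from RepetitionTime in each scan's JSON sidecar, and the number of volumes could be read from the image header (for example, the fourth dimension of the image shape when loading with nibabel).

```python
import numpy as np


def functional_duration(repetition_time, n_volumes):
    """Duration of a functional scan in seconds (TR times number of volumes)."""
    return repetition_time * n_volumes


def split_physio(physio, sampling_rate, scan_onsets, durations, offset):
    """Cut one segment per scan out of a multi-run physio recording.

    physio is an array with time along the first axis, sampling_rate is in
    Hz, scan_onsets and durations are in seconds, and offset is the shift
    that moves scan onsets into the physio file's time frame.
    """
    physio = np.asarray(physio)
    segments = []
    for onset, duration in zip(scan_onsets, durations):
        start = int(round((onset + offset) * sampling_rate))
        stop = int(round((onset + offset + duration) * sampling_rate))
        # Clip to the recording in case a scan runs past either end.
        start = min(max(start, 0), len(physio))
        stop = min(max(stop, 0), len(physio))
        segments.append(physio[start:stop])
    return segments


# Toy recording: 100 s at 10 Hz, where each sample holds its own index.
physio = np.arange(1000)
durations = [functional_duration(2.0, 10), functional_duration(2.0, 15)]
segments = split_physio(
    physio, sampling_rate=10, scan_onsets=[0.0, 40.0],
    durations=durations, offset=5.0,
)
print([len(segment) for segment in segments])
```

In practice some padding before and after each scan would probably be worth adding, but that is left out here to keep the arithmetic easy to follow.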

physiopy / phys2bids