Add operation to extend event files based on other tsv files

@neuromechanist @VisLab

We want to add a remodeler operation that adds data to an events tsv file based on another tabular file.

Specifically, this would be of interest for stimulus metadata that is available in a tabular file. For more details on how this might be organized, see the discussion at https://github.com/bids-standard/bids-specification/issues/153.

Based on the organization there are two types of extensions:

Column only extension
Row and column extension

For now I think this could be two operations, but with the different options we can also split it up further.

Column only extension

The column only extension would apply to still stimuli. Extension would happen from the stimuli.tsv file. If a value in the events.tsv stim_file column matches a value in the stimuli.tsv column, other column values in that row of the stimuli.tsv are copied to the events.tsv.

There are many ways we can make this more generic:

Specify an input tsv
Specify input column names, possibly different names between files
Specify a mapping between the column values in one file to the other so that column values do not necessarily have to match in the original files.
Specify which columns from the external file to copy to the events file
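To make the options above concrete, here is a minimal sketch of a column-only extension using only the standard library. The function name `extend_columns`, its parameters, the example file contents, and the `n/a` fill value for unmatched rows are all assumptions for illustration; none of this is an existing remodeler or HedTools API.

```python
import csv
import io


def read_tsv(text):
    """Parse TSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def extend_columns(events, lookup, event_key, lookup_key,
                   copy_columns=None, value_map=None, fill="n/a"):
    """Copy columns from the matching lookup row into each event row.

    Hypothetical parameters:
      event_key / lookup_key: column names to match on (may differ per file).
      copy_columns: lookup columns to copy (default: all except lookup_key).
      value_map: optional mapping from event values to lookup values.
    """
    value_map = value_map or {}
    index = {row[lookup_key]: row for row in lookup}
    if copy_columns is None:
        copy_columns = [c for c in lookup[0] if c != lookup_key] if lookup else []
    extended = []
    for event in events:
        key = value_map.get(event[event_key], event[event_key])
        match = index.get(key, {})
        new_row = dict(event)
        for column in copy_columns:
            # Events without a matching lookup row get the fill value.
            new_row[column] = match.get(column, fill)
        extended.append(new_row)
    return extended


# Made-up example data; real files would be read from disk.
events = read_tsv(
    "onset\tduration\tstim_file\n"
    "1.0\t0.5\tface01.png\n"
    "2.0\t0.5\thouse01.png\n"
    "3.0\t0.5\tn/a\n"
)
stimuli = read_tsv(
    "stim_file\tcategory\tluminance\n"
    "face01.png\tface\t0.42\n"
    "house01.png\thouse\t0.37\n"
)
extended = extend_columns(events, stimuli, "stim_file", "stim_file",
                          copy_columns=["category"])
```

Here only `category` is copied, so `luminance` stays out of the events, and the third event (no stimulus) gets `n/a`.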

Column and row extension

The column and row extension would apply to stimuli with temporal extent. Extension would happen from an x_stimulus.tsv file, where x matches the value in the stim_file column of the events file. The x_stimulus.tsv file should contain onset and duration columns. The onset value of the matching event row would be added to the onsets in the stimulus file. Then the rows and columns from the stimulus file would be added to the events file, and the result would be ordered by onset. Column values from the original matching rows might also need to be copied to the new events.

More generic:

Specify a column, column value to tsv file mapping
Specify an interval of the external tsv file (e.g., someone shows part of a movie but uses the annotations provided for the full movie)[^1]
Specify which columns from the stimulus.tsv to copy
Specify which columns from the event file pass on their value from the parent event to the newly added rows
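The row and column extension described above could be sketched roughly as follows. This is only an illustration: `extend_rows`, its parameters, and the example data are hypothetical, and the dictionary `stimulus_tables` stands in for the already-loaded x_stimulus.tsv files, keyed by the stim_file value x.

```python
def extend_rows(events, stimulus_tables, file_column="stim_file",
                copy_columns=None, inherit_columns=(), fill="n/a"):
    """Add rows from per-stimulus tables to the events, shifted to event time.

    Hypothetical parameters:
      stimulus_tables: maps a stim_file value to the rows of its stimulus table.
      copy_columns: stimulus columns to copy (default: all but onset/duration).
      inherit_columns: event columns whose value the parent event passes on.
    """
    merged = []
    for event in events:
        merged.append(dict(event))
        for ann in stimulus_tables.get(event.get(file_column), []):
            # Shift the annotation onset by the onset of the parent event.
            new_row = {"onset": float(event["onset"]) + float(ann["onset"]),
                       "duration": float(ann["duration"])}
            columns = copy_columns
            if columns is None:
                columns = [c for c in ann if c not in ("onset", "duration")]
            for column in columns:
                new_row[column] = ann.get(column, fill)
            for column in inherit_columns:
                new_row[column] = event.get(column, fill)
            merged.append(new_row)
    # Give every row the same columns, in order of first appearance.
    all_columns = []
    for row in merged:
        for column in row:
            if column not in all_columns:
                all_columns.append(column)
    merged = [{c: row.get(c, fill) for c in all_columns} for row in merged]
    merged.sort(key=lambda row: float(row["onset"]))
    return merged


# Made-up example: the same movie is shown twice, with a button press between.
events = [
    {"onset": 0.0, "duration": 10.0, "stim_file": "movie_a", "trial_type": "show"},
    {"onset": 4.0, "duration": 0.0, "stim_file": "n/a", "trial_type": "button"},
    {"onset": 20.0, "duration": 10.0, "stim_file": "movie_a", "trial_type": "show"},
]
stimulus_tables = {
    "movie_a": [
        {"onset": 2.0, "duration": 1.0, "annotation": "car"},
        {"onset": 6.0, "duration": 2.0, "annotation": "dog"},
    ],
}
merged = extend_rows(events, stimulus_tables, inherit_columns=["stim_file"])
```

In this sketch the new rows inherit `stim_file` from their parent event, while `trial_type` is not inherited and is filled with `n/a`.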

[^1]: We can also say that in this case people should just crop the stimulus.tsv file appropriately. I like having one way of telling people how to do some things so that things stay consistent.

1) The underlying infrastructure for remapping column values (column extension) is already in place with the KeyMap class in the HedTools. Implementation requires a wrapper and the handling of the additional columns.

2) Column and row extension is really timeline merging and probably should be labeled as such. There are three types: a) Onsets and durations are given explicitly. This may need an option to add an overall constant to align the beginnings of the timelines, and an option for whether to use the destination duration column to indicate durations. b) Same as a), but with a new row explicitly added for each offset. c) Only onsets are given, so it is a mapping of single time points. We would probably need to indicate which column is the start-time column, which is the end-time column, and which is the duration column. This way it could also work on Elan files and NWB files in the future.
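As a rough illustration of the three cases, the sketch below normalizes an external timeline to onset/duration form before any merging. Everything here is an assumption made for the example: the function `normalize_timeline`, its parameters, the `marker` column used to label offset rows, and the choice to keep the duration on the onset row in case b).

```python
def normalize_timeline(rows, start_column, end_column=None, duration_column=None,
                       shift=0.0, add_offset_rows=False):
    """Convert rows with configurable time columns to onset/duration rows.

    Hypothetical parameters:
      shift: overall constant added to every start time to align timelines.
      add_offset_rows: if True, also emit an explicit row at each offset (case b).
    With neither end_column nor duration_column, rows are single points (case c).
    """
    time_columns = (start_column, end_column, duration_column)
    normalized = []
    for row in rows:
        start = float(row[start_column])
        if duration_column is not None:
            duration = float(row[duration_column])
        elif end_column is not None:
            duration = float(row[end_column]) - start
        else:
            duration = 0.0
        onset = start + shift
        rest = {k: v for k, v in row.items() if k not in time_columns}
        if add_offset_rows:
            normalized.append({"onset": onset, "duration": duration,
                               "marker": "onset", **rest})
            if duration > 0:
                normalized.append({"onset": onset + duration, "duration": 0.0,
                                   "marker": "offset", **rest})
        else:
            normalized.append({"onset": onset, "duration": duration, **rest})
    normalized.sort(key=lambda r: r["onset"])
    return normalized


# Made-up rows with begin/end times, shifted by 10 s to align with the events.
annotations = [
    {"begin": 1.5, "end": 2.5, "label": "gesture"},
    {"begin": 0.5, "end": 0.75, "label": "speech"},
]
intervals = normalize_timeline(annotations, "begin", end_column="end", shift=10.0)
with_offsets = normalize_timeline(annotations, "begin", end_column="end",
                                  shift=10.0, add_offset_rows=True)
points = normalize_timeline(annotations, "begin", shift=10.0)
```

Once an external file is in this form, it could be merged with the same row-insertion logic regardless of how its time columns were originally named.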

(Sent in reply to the original post by Monique Denissen, Fri, Dec 15, 2023, 3:10 AM.)

hed-standard / hed-python

Add operation to extend event files based on other tsv files #810
