mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.7k stars 1.31k forks source link

ENH: make use of time-stamps optional in `mne.Annotations.to_data_frame()` #12274

Closed dengemann closed 10 months ago

dengemann commented 10 months ago

Describe the new feature or enhancement

Hi folks!

I find the .to_data_frame() method for annotations very practical but I realize that the default behavior of having onsets in timestamps is not always what is practical from a perspective of data analysis where you might want to keep selecting data based on times in seconds (float).

I was wondering how you would feel about including an enhancment that would make following snippet of code work:

annot = mne.Annotations(onset=[0, 10, 20], duration=[2] * 3, description='test')
annot_df = annot.to_data_frame(self, onset_as_time_stamp=False)
assert annot_df['onset'] == annot.onset

Describe your proposed implementation

I would make conversion to time-stamps optional.

Describe possible alternatives

I cannot think of another implementation. One can of course post-hoc edit the data frame but the point of a convenience method is to be convenient.

Additional context

This can be helpful when analyzing continuous data in a fashion that does not involve epoching, e. g., in richly annotated long-term EEG recordings.

agramfort commented 10 months ago

Would allowing to export both timestamps and indices help?

like pandas allows you with to_csv to include the dataframe index or not.

Message ID: @.***>

drammock commented 10 months ago

@dengemann would it work for you to borrow the API from Raw.to_data_frame here?

time_format : str | None
    Desired time format. If ``None``, no conversion is applied, and time values
    remain as float values in seconds. If ``'ms'``, time values will be rounded
    to the nearest millisecond and converted to integers. If ``'timedelta'``,
    time values will be converted to :class:`pandas.Timedelta` values. 
    If ``'datetime'``, time values will be converted to :class:`pandas.Timestamp`
    values, relative to ``raw.info['meas_date']`` and offset by ``raw.first_samp``.
    "Default is ``None``.

The default would need to be different (at least initially) to avoid breaking compatibility, but otherwise it seems sensible to offer the same options for both Raw and Annotations.

cbrnr commented 10 months ago

Why not export both as Alex suggested? It's just an additional column, no parameter needed IMO.

drammock commented 10 months ago

Why not export both as Alex suggested? It's just an additional column, no parameter needed IMO.

@dengemann is asking for time in seconds, @agramfort is suggesting index (which I take to mean sample number). They are not the same.

cbrnr commented 10 months ago

I thought sample number is the current behavior? You can use time in seconds as the index, but I'd use a plain old column.

dengemann commented 10 months ago

I think adding a columm is a good idea too. Also OK to reuse our old mixin API. My main point was to preserve the onset information in seconds as that is very handy.

On Thu, 7 Dec 2023 at 21:50, Clemens Brunner @.***> wrote:

I thought sample number is the current behavior? You can use time in seconds as the index, but I'd use a plain old column.

— Reply to this email directly, view it on GitHub https://github.com/mne-tools/mne-python/issues/12274#issuecomment-1846093316, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOR7CSZXBK3UQXPCN3FRGDYIITZHAVCNFSM6AAAAABALB4AT2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBWGA4TGMZRGY . You are receiving this because you were mentioned.Message ID: @.***>

drammock commented 10 months ago

I thought sample number is the current behavior?

Nope:

In [4]: annot_df
Out[4]: 
                onset  duration description
0 1970-01-01 00:00:00       2.0        test
1 1970-01-01 00:00:10       2.0        test
2 1970-01-01 00:00:20       2.0        test

You can use time in seconds as the index, but I'd use a plain old column.

I would also vote for putting the time data in a column (not the index) and letting the user set whatever index they want. Where we differ, I think, is that I prefer not to create multiple representations of time in multiple columns. I'd rather give the user one time column and let them pick which representation they want (like we do for Raw).

cbrnr commented 10 months ago

If me move the current index into a new column time and use a default integer range as the new index (which corresponds to sample numbers), we have both versions in one data frame and hence do not need a new parameter.

drammock commented 10 months ago

I don't understand your suggestion... The current index is 0,1,2 (sequential integers) not sample number of the onset. So what does it gain us if we "move the current index into a new column time"?

cbrnr commented 10 months ago

I misunderstood the initial problem. 👍 for your suggestion to add a time_format parameter then.

agramfort commented 10 months ago

I misunderstood the initial problem. 👍 for your suggestion to add a time_format parameter then.

+1 I also had not understood the initial issue