Open agramfort opened 2 years ago
yes we should support that! no strong feelings on defaults.
On Mon, 17 Jan 2022 at 19:20, Alexandre Gramfort @.***> wrote:
here is an example of a script I just had to write for a collaborator:
from pathlib import Path
import pandas as pd
import mne

sample_dir = Path(mne.datasets.sample.data_path())
sample_fname = sample_dir / 'MEG' / 'sample' / 'sample_audvis_raw.fif'

raw = mne.io.read_raw_fif(sample_fname, preload=True)
raw.crop(tmax=10)

df = raw.to_data_frame()
df = df.set_index("time")

index = pd.date_range(start=raw.info['meas_date'],
                      periods=len(df) + raw.first_samp,
                      freq=f'{1e3 / raw.info["sfreq"]:0.6f}ms')
df.index = index[raw.first_samp:]
what I have in mind is that we can do
raw.to_data_frame(time_format='date')
to get the time as datetime64. Also I wonder why time is not set as index by default, but it's more a matter of taste.
@hoechenberger @dengemann @drammock what do you think?
a datetime index would certainly make sense.
This is already supported. Quoting the docstring of the time_format parameter:
If 'datetime', time values will be converted to :class:pandas.Timestamp values, relative to raw.info['meas_date'] and offset by raw.first_samp.
Setting time as index automatically is possible by passing index='time'.
If I run your snippet up through raw.crop(tmax=10) and then:
In [5]: raw.to_data_frame(time_format='datetime', index='time')
Out[5]:
channel MEG 0113 MEG 0112 MEG 0111 ... EEG 059 EEG 060 EOG 061
time ...
2002-12-03 19:01:53.676070829+00:00 96.435548 -48.217774 101.074222 ... 38.854217 65.839113 285.661012
2002-12-03 19:01:53.677735789+00:00 0.000000 -28.930664 63.171389 ... 40.751037 68.002565 283.699953
2002-12-03 19:01:53.679400749+00:00 0.000000 -9.643555 75.805667 ... 40.995788 68.177980 280.431520
2002-12-03 19:01:53.681065709+00:00 125.366213 19.287110 101.074222 ... 41.179352 68.587282 279.124147
2002-12-03 19:01:53.682730669+00:00 163.940432 0.000000 0.000000 ... 39.343719 67.242433 281.738893
... ... ... ... ... ... ... ...
2002-12-03 19:02:03.669161407+00:00 -19.287110 -38.574219 -176.879889 ... 44.299926 62.857057 265.396730
2002-12-03 19:02:03.670826367+00:00 -19.287110 -9.643555 -113.708500 ... 46.013183 64.552736 267.357790
2002-12-03 19:02:03.672491327+00:00 -28.930664 9.643555 25.268556 ... 50.418701 68.061036 273.240968
2002-12-03 19:02:03.674156288+00:00 -28.930664 9.643555 37.902833 ... 52.621460 69.405885 275.202028
2002-12-03 19:02:03.675821248+00:00 -77.148438 -9.643555 138.977056 ... 52.437896 69.522829 271.279909
[6007 rows x 376 columns]
🎉
hum indeed, but the way it's done now leads to:
df.index.freq == None
what I suggested above keeps the sampling frequency, as it gives:
In [33]: df.index.freq
Out[33]: <1664960 * Nanos>
what do you think @drammock ?
yeah, currently there is no freq because it's implemented by converting times to a timedelta, then adding that to the meas_date.
if you think having .freq is important I don't object to changing the implementation
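To illustrate the difference between the two approaches without needing MNE, here is a minimal pandas-only sketch. The start date, sampling rate, and sample times are hypothetical stand-ins for raw.info['meas_date'], raw.info['sfreq'], and raw.times; this is not the actual MNE implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins (assumptions, not values read from a real recording)
meas_date = pd.Timestamp("2002-12-03 19:01:10", tz="UTC")
sfreq = 1000.0  # Hz, chosen so that one sample is exactly 1 ms
times = np.arange(5) / sfreq  # sample times in seconds

# Offset approach: convert seconds to timedeltas, then add the start date.
# The resulting DatetimeIndex carries no frequency information.
by_offset = pd.to_timedelta(times, unit="s") + meas_date

# Range approach: build a regularly spaced index, which keeps its frequency.
by_range = pd.date_range(start=meas_date, periods=len(times), freq="1ms")

print(by_offset.freq)
print(by_range.freq)
```

The first index reports no freq, while the one built with date_range keeps the step it was created with.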
@agramfort I took a look at (something similar to) your implementation. The main problem is that your way of doing it necessarily risks rounding error when converting 1 / sfreq to an integer number of nanoseconds (not a problem if sfreq is a nice integer like 1000, but for the sample dataset you see the issue).
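A rough sketch of where that rounding error comes from, using two hypothetical sampling rates (neither is taken from the sample dataset). The helper name step_rounding_error_ns is made up for illustration.

```python
import numpy as np


def step_rounding_error_ns(sfreq):
    """Return (rounded step in ns, rounding error per sample in ns)."""
    exact_ns = 1e9 / sfreq               # exact sample period in nanoseconds
    rounded_ns = int(np.rint(exact_ns))  # what an integer-nanosecond freq can hold
    return rounded_ns, exact_ns - rounded_ns


# Hypothetical rates: one that divides 1e9 evenly and one that does not
for sfreq in (1000.0, 600.0):
    step, err = step_rounding_error_ns(sfreq)
    print(f"sfreq={sfreq} Hz: step={step} ns, error per sample={err:+.3f} ns")
```

The per-sample error is at most half a nanosecond, but it has the same sign at every step, so it adds up over the length of the recording.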
For your snippet of 10s of data, the last sample time is off by 1488 nanoseconds:
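import numpy as np
from pandas import date_range, to_timedelta

_, times = raw[:]
# current implementation: seconds -> timedelta, added to the measurement date
main = to_timedelta(times + raw.first_time, unit='s') + raw.info['meas_date']
# alternative: regular date_range with the step rounded to whole nanoseconds
alternative = date_range(
    start=raw.info['meas_date'] + to_timedelta(raw.first_time, unit='s'),
    periods=len(times),
    freq=f'{np.rint(1e9 / raw.info["sfreq"]).astype(int)}N')
diff = main[-1] - alternative[-1]
diff.isoformat()
# 'P0DT0H0M0.000001488S'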
This means that for a 60-minute recording the last sample is off by 0.53568 milliseconds (more than half a millisecond). To me that seems too much.
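The extrapolation above can be checked with a few lines of arithmetic. This sketch assumes the drift grows linearly with recording length, and takes the 1488 ns figure from the 10 s snippet as given.

```python
# Figures from the thread: 1488 ns of drift at the end of a 10 s snippet
drift_ns = 1488
snippet_s = 10
recording_s = 60 * 60  # a 60-minute recording, in seconds

# Assumption: the rounding error accumulates linearly with duration
drift_ms = drift_ns * (recording_s / snippet_s) / 1e6
print(f"{drift_ms:.5f} ms")  # 0.53568 ms
```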
hum... I need to think... but I get your point.