Default units in raw.to_data_frame()

chapochn commented 1 week ago

Description of the problem

It is not a bug per se, but I feel it is a quite misleading default. Getting the data through "get_data" and "to_data_frame()" give different output values by default, because to_data_frame converts to a "standard" unit. I am not sure if it is intended? My suggestion would be to not have such behavior by default. Although maybe you want to_data_frame() to be coherent with plotting rather than "get_data". I would think it makes more sense to be coherent with "get_data". Thoughts?

Steps to reproduce

import numpy as np
import mne

n=1
X = np.ones((1, n))
chs = ['ch1']
info = mne.create_info(chs, 5, ch_types='eeg')
raw = mne.io.RawArray(X, info)

df = raw.to_data_frame(index='time')
data = raw.get_data()
print(df.values, data)

Link to data

No response

Expected results

I would have expected to get: [[1.]] [[1]]

Actual results

[[1000000.]] [[1.]]

I can get the same result for both by using: df = raw.to_data_frame(index='time', scalings=1)

Additional information

Core ├☑ mne 1.7.1 (latest release) ├☑ numpy 1.26.4 (OpenBLAS 0.3.27 with 12 threads) ├☑ scipy 1.14.0 └☑ matplotlib 3.8.4Backend MacOSX is interactive backend. Turning interactive mode on. (backend=MacOSX) Numerical (optional) ├☑ sklearn 1.5.1 ├☑ numba 0.60.0 ├☑ nibabel 5.2.1 ├☑ nilearn 0.10.4 ├☑ dipy 1.9.0 ├☑ openmeeg 2.5.11 ├☑ pandas 2.2.2 ├☑ h5io 0.2.3 ├☑ h5py 3.11.0 └☐ unavailable cupy Visualization (optional) ├☑ pyvista 0.43.10 (OpenGL 4.1 Metal - 88.1 via Apple M2 Pro) ├☑ pyvistaqt 0.11.1 objc[24995]: Class QT_ROOT_LEVEL_POOL__THESE_OBJECTS_WILL_BE_RELEASED_WHEN_QAPP_GOES_OUT_OF_SCOPE is implemented in both /Users/.../Programs/miniconda3/envs/mne/lib/libQt5Core.5.15.8.dylib (0x17f9e12e0) and /Users/.../Programs/miniconda3/envs/mne/lib/libQt6Core.6.7.2.dylib (0x3302bd558). One of the two will be used. Which one is undefined. objc[24995]: Class KeyValueObserver is implemented in both /Users/.../Programs/miniconda3/envs/mne/lib/libQt5Core.5.15.8.dylib (0x17f9e1308) and /Users/.../Programs/miniconda3/envs/mne/lib/libQt6Core.6.7.2.dylib (0x3302bd580). One of the two will be used. Which one is undefined. objc[24995]: Class RunLoopModeTracker is implemented in both /Users/.../Programs/miniconda3/envs/mne/lib/libQt5Core.5.15.8.dylib (0x17f9e1358) and /Users/.../Programs/miniconda3/envs/mne/lib/libQt6Core.6.7.2.dylib (0x3302bd5d0). One of the two will be used. Which one is undefined. ├☑ vtk 9.3.0 ├☑ qtpy 2.4.1 Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)

(it crashes here)

agramfort commented 1 week ago

I think it was a very conscious decision to have to_data_frame return values as plotted and with numbers closer to 1...

but I agree it can be felt a bit too magic.

note that this has been like this for years and you're the first person to report that it's a problem. I would therefore not jump on the conclusion that this needs to be fixed.

Message ID: @.***>

hoechenberger commented 1 week ago

@chapochn Thanks for this suggestion!

I have to agree with @agramfort on this one, the purpose of the DataFrame export is different from numpy array extraction IMHO

@sappelhoff @cbrnr WDYT

cbrnr commented 1 week ago

I also think that this is not a bug but intended behavior. It is also documented (https://mne.tools/stable/generated/mne.io.Raw.html#mne.io.Raw.to_data_frame), but maybe we could highlight this behavior more clearly at the top of the to_data_frame docstring or in a Notes section (instead of just in the description of the scalings parameter)?

sappelhoff commented 1 week ago

I agree with Alex, Richard, and Clemens. This behavior is as intended and IMHO also helpful/useful. Highlighting this behavior more clearly could be a valuable addition to the documentation.

chapochn commented 1 week ago

An added documentation to clarify the intention of this function sounds great. Otherwise, it wasn't really clear to me that it is not just a more convenient get_data function (I mainly work with panda DataFrames for my own analysis)

cbrnr commented 1 week ago

@chapochn would you be interested in submitting a PR?

chapochn commented 1 week ago

I could write the intention, but TBH, I don't understand the intention... For me, it's just confusing to get another data than what is actually in the data... If it is needs to be plotted, plots can handle scaling the data, so I really don't see the necessity.

chapochn commented 1 week ago

The other confusing thing is that in scaling it says:

Scaling factor applied to the channels picked. If None, defaults to dict(eeg=1e6, mag=1e15, grad=1e13) — i.e., converts EEG to µV, magnetometers to fT, and gradiometers to fT/cm.

But it doesn't cover all the other types of data that are possible. I am using ecog, it's not clear what kind of scaling it does. What about ekg channels, misc, etc.

chapochn commented 1 week ago

I see these are all the units that are covered in Raw:

Proper units of measure: V: eeg, eog, seeg, dbs, emg, ecg, bio, ecog T: mag T/m: grad M: hbo, hbr Am: dipole AU: misc

So would be good to explicitly say what is the conversion for each unit. And if it is only based on the original unit, or also on the modality. I.e., are eeg, eog, seeg, dbs, emg, ecg, bio, ecog all the same conversion, or if it depends.

I believe these signals, although all in V, can have different orders of magnitude

larsoner commented 1 week ago

It's likely that the scaling values and related channel types pull from mne.defaults somewhere that we could document in our glossary if we don't already. They are probably used elsewhere anyway.

chapochn commented 1 week ago

Yes, it's here:

The set of channels considered “data channels” in MNE contains the following types (together with scale factors for plotting)

https://mne.tools/stable/documentation/glossary.html#term-data-channels and here:

The set of channels considered “non-data channels” in MNE contains the following types (together with scale factors for plotting):

https://mne.tools/stable/documentation/glossary.html#term-non-data-channels

Should probably then add "and for DataFrame export"

And it could be cross-linked to the places where these scalings are used.

mne-tools / mne-python