mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.72k stars 1.32k forks source link

BUG: Scaling when reading with brainvision #5003

Closed sappelhoff closed 6 years ago

sappelhoff commented 6 years ago

When reading BrainVision Recorder files, the EEG values for my data seemed off by 1e6 decimal points. BV data is usually saved in microVolts, so at first I thought that perhaps MNE is only working in Volts and there was an implicit scaling - however, there is no documentation about this. Furthermore, the docstring of the mne.io.read_raw_brainvision function explicitly states the resulting unit to be microvolts.

I tried to find an error in the code, but it was too convoluted for me to get an idea. It could be here:

https://github.com/mne-tools/mne-python/blob/58e59df8aa99aab046eced1374c80eff290c4f2a/mne/io/brainvision/brainvision.py#L492

... but it could also be at the place where raw.load_data() is implemented ... ?

I have made a short example showing the problem (see example files in attached .zip folder: example_files.zip). It's weird that for my controlled example, I find the scaling to be 1e7 instead of 1e6 - but I cannot find the reason for that either ...

In any case, the point is made and help would be appreciated!

import numpy as np
import mne

# WRITING BVR DATA
# ----------------

# Note: The .vhdr and .vmrk files were edited manually to fit the testing purpose
# Settings
fname_eeg = 'test_unit_scaling.eeg'
fs = 1000  # Hz sampling freq
n_time = 1  # seconds of recording
n_timepts = int(n_time * fs)  # timepoints of simulated data
chans = ['Cz', 'Pz', 'Oz']  # channel names in .vhdr file
n_chans = len(chans)

# Simulate some data between -50 and 50 microVolts
data = np.random.randint(-50, 50, (n_chans, n_timepts))

# We need to save in a multiplexed format ... so reshape as follows:
# Preallocate the data
data_multiplexed = np.zeros(data.ravel().shape)

# Make indices into preallocated array for each channel
indices = {}
for chan_i in range(n_chans):
    indices[chan_i] = np.arange(chan_i, len(data_multiplexed), n_chans)

# Put channel data in their place
for chan_i in range(n_chans):
    data_multiplexed[indices[chan_i]] = data[chan_i, :]

# Make datatype to int16
data_multiplexed = np.asarray(data_multiplexed, dtype='int16')

# Write as binary file
data_multiplexed.tofile(fname_eeg)

# READING DATA WITH MNE
# ---------------------
# One "unscaled" version where scale = 1.
raw_noscale = mne.io.read_raw_brainvision('test_unit_scaling.vhdr')

# One scaled version ... scale = 1e7
raw_scale = mne.io.read_raw_brainvision('test_unit_scaling.vhdr', scale=1e7)

# COMPARE DATA
# ------------
mne_data_noscale = raw_noscale.get_data()[:n_chans,:]
mne_data_scale = raw_scale.get_data()[:n_chans, :]

# If the following does not raise an AssertionError, ...
# mne data is scaled after reading ... by factor 1e7
np.testing.assert_array_almost_equal(mne_data_noscale, data*1e-7)
np.testing.assert_array_almost_equal(mne_data_scale, data)
teonbrooks commented 6 years ago

hey @sappelhoff, the reader assumes that the data are in microvolts, so the default scale=1. the data are stored in Volts in the Raw object. within the ChannelInfo in .vhdr file, there's an associated unit for each channel. we take that info to determine what the proper scale. if that is info is missing, we use the scale as a failsafe that is user-defined.

teonbrooks commented 6 years ago

also, the reason it is 1e7 instead of 1e6 is because the resolution is 0.1uV (10e-1 * 10 e-6).

sappelhoff commented 6 years ago

Hi @teonbrooks, thanks for the quick reply and the hint as to why I found 1e-7 instead of the more sensible 1e-6 ...

What you are saying is that it is expected behavior to have the unit "Volts" in the Raw object - irrespective of the original data format. Could you point me to where this is documented either in a docstring or on the website? Perhaps I missed it.

In case I did not miss it, I think it would be helpful to add this information somewhere either in documentation or even as a key in Raw.info (though that would of course be more substantial)

agramfort commented 6 years ago

yes we store everything in international system units: EEG in V, mag in T, grad in T/m it simplifies things I think.

it could certainly be better documented in :

http://martinos.org/mne/stable/auto_tutorials/plot_creating_data_structures.html and/or http://martinos.org/mne/stable/manual/io.html

sappelhoff commented 6 years ago

@agramfort, thanks for the links. I see now what I probably missed: https://github.com/mne-tools/mne-python/blob/58e59df8aa99aab046eced1374c80eff290c4f2a/tutorials/plot_creating_data_structures.py#L80-L89

We could add a "Note" in a blue box on the Importing Data into MNE doc saying something like:

Note: Irrespective of the units used in your manufacturer's file format, MNE will only use the following units for any Raw object and make conversions accordingly:

  • V: eeg, eog, seeg, emg, ecg, bio, ecog
  • T: mag
    • T/m: grad
    • M: hbo, hbr
    • Am: dipole
    • AU: misc
agramfort commented 6 years ago

+1 PR welcome