Closed shawnrhoads closed 1 year ago
Based on the above, here are my potential edits to the function (marked with ### SAR EDIT ###
). I think the main issue is point 3 when there are multiple different sampling rates. (I also added a new save_path
flag). I could submit a PR, but I'm not very certain about some of these.
def make_mne(load_path=None, elec_data=None, format='edf', save_path=None):
"""
Make a mne object from the data and electrode files, and save out the photodiode.
Following this step, you can indicate bad electrodes manually.
Parameters
----------
load_path : str
path to the neural data
elec_data : pandas df
dataframe with all the electrode localization information
format : str
how was this data collected? options: ['edf', 'nlx]
Returns
-------
mne_data : mne object
mne object
"""
# 1) load the data:
if format=='edf':
edf_file = glob(f'{load_path}/*.edf')[0]
mne_data = mne.io.read_raw_edf(edf_file, preload=True)
elif format =='nlx':
### SAR EDIT ###
pattern = re.compile(r"_\d{4}\.ncs") # regex pattern to match "_0000.ncs" to "_9999.ncs"
ncs_files = [x for x in glob(f'{load_path}/*.ncs') if re.search(pattern, x)]
lfp = np.array([])
sr = []
ch_name = []
ch_type = 'seeg'
### SAR EDIT ###
for chan_path in ncs_files:
# This is leftover from Iowa - let's see how channels are named in MSSM data
chan_name = chan_path.split('/')[-1][:-4]
ch_num = int(chan_name[-4:]) ### SAR EDIT ### <-- need to confirm with Salman what this is doing
try:
fdata = nlx_utils.load_ncs(chan_path)
except IndexError:
print(f'No data in channel {chan_name}')
continue
### SAR EDIT ###
if lfp.size == 0:
lfp = fdata['data']
else:
lfp = np.vstack([lfp, fdata['data']])
sr.append(fdata['sampling_rate'])
ch_name.append(str(ch_num))
### SAR EDIT ###
unix_time = fdata['time']
### SAR NOTE: this fails if sr has more than two values
info = mne.create_info(ch_name, np.unique(sr), ch_type)
mne_data = mne.io.RawArray(lfp, info)
if format=='edf':
# The electrode names read out of the edf file do not always match those
# in the pdf (used for localization). This could be error on the side of the tech who input the labels,
# or on the side of MNE reading the labels in. Usually there's a mixup between lowercase 'l' and capital 'I'.
# Sometimes, there's electrodes on the pdf that are NOT in the MNE data structure... let's identify those as well.
new_mne_names, _, _ = match_elec_names(mne_data.ch_names, elec_data.label)
# Rename the mne data according to the localization data
new_name_dict = {x:y for (x,y) in zip(mne_data.ch_names, new_mne_names)}
mne_data.rename_channels(new_name_dict)
right_seeg_names = [i for i in mne_data.ch_names if i.startswith('r')]
left_seeg_names = [i for i in mne_data.ch_names if i.startswith('l')]
sEEG_mapping_dict = {f'{x}':'seeg' for x in left_seeg_names+right_seeg_names}
mne_data.set_channel_types(sEEG_mapping_dict)
# 3) Identify line noise
mne_data.info['line_freq'] = 60
# Notch out 60 Hz noise and harmonics
mne_data.notch_filter(freqs=(60, 120, 180, 240))
# 4) Save out the photodiode channel separately
### SAR EDIT ###
if save_path is None:
mne_data.save(f'{load_path}/photodiode.fif', picks='dc1', overwrite=True)
else:
mne_data.save(f'{save_path}/photodiode.fif', picks='dc1', overwrite=True)
### SAR EDIT ###
# 5) Clean up the MNE data
bads = detect_bad_elecs(mne_data, sEEG_mapping_dict)
mne_data.info['bads'] = bads
return mne_data
(Related to point 1): Some more troubleshooting now leads me to believe the ch_name
list should append the full file prefix, not the integer at the end, so maybe something more like this is better:
ch_name.append(chan_name[:-5])
Hey,
I finally dug into this. Basically, a lot of filetype ('.edf' vs. '.ncs') and recording site ('MSSM' vs. other) specification had to be added, with the added complication of dealing with cases where sampling rate differs across channels.
I've added a branch to this issue to work on this. The main change is in the make_mne function below. Try this out and let me know if it works - I tried it only with MS022 so far.
def make_mne(load_path=None, elec_data=None, format='edf', site='MSSM', overwrite=True, **kwargs):
"""
Make a mne object from the data and electrode files, and save out the photodiode.
Following this step, you can indicate bad electrodes manually.
Parameters
----------
load_path : str
path to the neural data
elec_data : pandas df
dataframe with all the electrode localization information
format : str
how was this data collected? options: ['edf', 'nlx]
site: str
where was the data collected? options: ['UI', 'MSSM'].
TODO: add site specificity for UC Davis
overwrite: bool
whether to overwrite existing data for this person if it exists
kwargs: dict
dictionary containing lists of different types of channel names
Returns
-------
mne_data : mne object
mne object
"""
# OPTIONAL: Set specific channel names that you might need:
eeg_names = kwargs['eeg_names']
resp_names = kwargs['resp_names']
ekg_names = kwargs['ekg_names']
photodiode_name = kwargs['photodiode_name']
seeg_names = kwargs['seeg_names']
if site == 'MSSM':
if not eeg_names: # If no input, assume the standard EEG montage at MSSM
eeg_names = ['fp1', 'f7', 't3', 't5', 'o1', 'f3', 'c3', 'p3', 'fp2', 'f8', 't4', 't6', 'o2', 'f4', 'c4', 'p4', 'fz', 'cz', 'pz']
# 1) load the data:
if format=='edf':
# MAKE SURE ALL THE EDF CHANNELS HAVE THE SAME SR! See: https://github.com/mne-tools/mne-python/issues/10635
if not photodiode_name:
photodiode_name = 'dc1'
# EDF data always comes from MSSM AFAIK. Modify this if that changes.
# This is a big block of data. Have to load first, then split out the sEEG and photodiode downstream.
edf_file = glob(f'{load_path}/*.edf')[0]
mne_data = mne.io.read_raw_edf(edf_file, preload=True)
# The electrode names read out of the edf file do not always match those
# in the pdf (used for localization). This could be error on the side of the tech who input the labels,
# or on the side of MNE reading the labels in. Usually there's a mixup between lowercase 'l' and capital 'I'.
# Sometimes, there's electrodes on the pdf that are NOT in the MNE data structure... let's identify those as well.
new_mne_names, _, _ = match_elec_names(mne_data.ch_names, elec_data.label)
# Rename the mne data according to the localization data
new_name_dict = {x:y for (x,y) in zip(mne_data.ch_names, new_mne_names)}
mne_data.rename_channels(new_name_dict)
if not seeg_names:
seeg_names = [i for i in mne_data.ch_names if ((i.startswith('l')) | (i.startswith('r')))]
sEEG_mapping_dict = {f'{x}':'seeg' for x in seeg_names}
mne_data.set_channel_types(sEEG_mapping_dict)
mne_data.info['line_freq'] = 60
# Notch out 60 Hz noise and harmonics
mne_data.notch_filter(freqs=(60, 120, 180, 240))
# Save out the photodiode channel separately
mne_data.save(f'{load_path}/photodiode.fif', picks=photodiode_name, overwrite=overwrite)
# drop EEG and EKG channels
drop_chans = list(set(mne_data.ch_names)^set(seeg_names))
mne_data.drop_channels(drop_chans)
bads = detect_bad_elecs(mne_data, sEEG_mapping_dict)
mne_data.info['bads'] = bads
mne_data.save(f'{load_path}/lfp_data.fif', picks=seeg_names, overwrite=overwrite)
elif format =='nlx':
# This is a pre-split data. Have to specifically load the sEEG and photodiode individually.
signals = []
srs = []
ch_name = []
ch_type = []
if site == 'MSSM':
# per Shawn, MSSM data seems to sometime have a "_0000.ncs" to "_9999.ncs" appended to the end of real data
pattern = re.compile(r"_\d{4}\.ncs") # regex pattern to match "_0000.ncs" to "_9999.ncs"
ncs_files = [x for x in glob(f'{load_path}/*.ncs') if re.search(pattern, x)]
# just in case this changes in the future:
if len(ncs_files) == 0:
ncs_files = glob(f'{load_path}/*.ncs')
if not seeg_names:
seeg_names = [x.split('/')[-1].replace('.ncs','') for x in glob(f'{load_path}/[R,L]*.ncs')]
else:
if not seeg_names:
seeg_names = [x.split('/')[-1].replace('.ncs','').split('_')[0] for x in glob(f'{load_path}/[R,L]*.ncs') if re.search(pattern, x)]
elif site == 'UI':
# here, the filenames are not informative. We have to get subject-specific information from the experimenter
ncs_files = glob(f'{load_path}/LFP*.ncs')
if not seeg_names:
raise ValueError('no seeg channels specified')
else:
# standardize to lower
seeg_names = [x.lower() for x in seeg_names]
sEEG_mapping_dict = {f'{x}':'seeg' for x in seeg_names}
for chan_path in ncs_files:
chan_name = chan_path.split('/')[-1].replace('.ncs','')
# strip the file type off the end if needed
if '_' in chan_name:
chan_name = chan_name.split('_')[0]
try:
fdata = nlx_utils.load_ncs(chan_path)
except IndexError:
print(f'No data in channel {chan_name}')
continue
# surface eeg
if eeg_names:
eeg_names = [x.lower() for x in eeg_names]
if chan_name.lower() in eeg_names:
ch_type.append('eeg')
if resp_names:
resp_names = [x.lower() for x in resp_names]
if chan_name.lower() in resp_names:
ch_type.append('bio')
if ekg_names:
ekg_names = [x.lower() for x in ekg_names]
if chan_name.lower() in ekg_names:
ch_type.append('ecg')
if seeg_names:
if chan_name.lower() in seeg_names:
ch_type.append('seeg')
elif chan_name.lower()[0] == 'u':
# microwire data
ch_type.append('seeg')
signals.append(fdata['data'])
srs.append(fdata['sampling_rate'])
ch_name.append(chan_name)
if len(ch_type) < len(ch_name):
ch_type.append('misc')
print(f'Unidentified data type in {chan_name}')
if np.unique(srs).shape[0] == 1:
# all the sampling rates match:
info = mne.create_info(ch_name, np.unique(srs), ch_type)
mne_data = mne.io.RawArray(signals, info)
else:
## Now we have to account for differing sampling rates. This will only really happen in the case of data where ANALOGUE channels
## are recorded at a much higher sampling rate, or with micro channels. Find the lowest sample rate, and downsample everything to that.
## I generally don't like this but it should be OK. Make sure that you identify synchronization times AFTER downsampling the analogue channel, and not before:
## https://gist.github.com/larsoner/01642cb3789992fbca59
target_sr = np.min(srs)
mne_data_resampled = []
for sr in np.unique(srs):
ch_ix = np.where(srs==sr)[0].astype(int)
info = mne.create_info([x for ix, x in enumerate(ch_name) if ix in ch_ix], sr, [x for ix, x in enumerate(ch_type) if ix in ch_ix])
mne_data_temp = mne.io.RawArray([x for ix, x in enumerate(signals) if ix in ch_ix], info)
if sr != target_sr:
# resample down to one sample rate
mne_data_temp.resample(sfreq=target_sr, npad='auto', n_jobs=-1)
mne_data_resampled.append(mne_data_temp)
else:
mne_data = mne_data_temp
#Because of the resampling, the end timings might not match perfectly:https://github.com/mne-tools/mne-python/issues/8257
if mne_data_resampled[0].tmax > mne_data.tmax:
mne_data_resampled[0].crop(tmin=0, tmax=mne_data.tmax)
elif mne_data_resampled[0].tmax < mne_data.tmax:
mne_data.crop(tmin=0, tmax=mne_data_resampled[0].tmax)
mne_data.add_channels(mne_data_resampled)
mne_data.info['line_freq'] = 60
# Notch out 60 Hz noise and harmonics
mne_data.notch_filter(freqs=(60, 120, 180, 240))
new_name_dict = {x:x.replace(" ", "").lower() for x in mne_data.ch_names}
mne_data.rename_channels(new_name_dict)
# Save out the photodiode channel separately
if not photodiode_name:
raise ValueError('no photodiode channel specified')
else:
mne_data.save(f'{load_path}/photodiode.fif', picks=photodiode_name, overwrite=overwrite)
# Save out the respiration channels separately
if resp_names:
mne_data.save(f'{load_path}/respiration_data.fif', picks=resp_names, overwrite=overwrite)
# Save out the EEG channels separately
if eeg_names:
mne_data.save(f'{load_path}/scalp_eeg_data.fif', picks=eeg_names, overwrite=overwrite)
# Save out the EEG channels separately
if ekg_names:
mne_data.save(f'{load_path}/ekg_data.fif', picks=ekg_names, overwrite=overwrite)
drop_chans = list(set([x.lower() for x in mne_data.ch_names])^set(seeg_names))
mne_data.drop_channels(drop_chans)
bads = detect_bad_elecs(mne_data, sEEG_mapping_dict)
mne_data.info['bads'] = bads
mne_data.save(f'{load_path}/lfp_data.fif', picks=seeg_names, overwrite=overwrite)
return mne_data
I ran into a few issues trying to load the
.ncs
files from neuralynx into an mne object. Attaching my notes and potential solutions.1.
lfp
,sr
, andch_name
are not defined on line 652:https://github.com/seqasim/LFPAnalysis/blob/bacc1f801f03bf3747746d8421946f80d32bee8c/LFPAnalysis/lfp_preprocess_utils.py#L652-L654
Potential solution: add the following before the
nsc_files
loop, assuminglfp
is a numpy array and the rest are lists (which is whatmne.create_info()
andmne.io.RawArray()
require:Then lines 652-654 should be replaced with something like (but see point 3 below for issue with
lfp
when channels are different sizes):2. Line 642 globs the all
.ncs
files inload_path
but these might not match the pattern needed for thech_num
assignment:https://github.com/seqasim/LFPAnalysis/blob/bacc1f801f03bf3747746d8421946f80d32bee8c/LFPAnalysis/lfp_preprocess_utils.py#L646
Potential solution 1: If
ch_name
should be referring to files that contain the pattern_0000.nsc
to_9999.nsc
(assuming this because this value is being converted to an integer), then maybe add something like this in place of line 642:If the case, then would also need to change line 646 to
ch_num = int(chan_name[-4:])
Potential solution 2: If
ch_name
should be referring to the base name of each file (without the.nsc
extension), then addchan_name.replace(".ncs","")
and remove theint()
function3.
mne.create_info()
fails when there are multiple unique values insr
andch_type
is not defined:https://github.com/seqasim/LFPAnalysis/blob/bacc1f801f03bf3747746d8421946f80d32bee8c/LFPAnalysis/lfp_preprocess_utils.py#L656-L657
Should
ch_type='seeg'
here?Also, when there are multiple unique values in
sr
, appending tolfp
fails in the loop because axis dimensions must be consistent.