skjerns / AutoSleepScorer

An open-source sleep stage classification Python package
GNU Affero General Public License v3.0
106 stars · 22 forks

Training on another sleep database #9

Open PaulMuadDib opened 6 years ago

PaulMuadDib commented 6 years ago

Hi Simon, I am starting to train the CNN/RNN on other sleep databases, the first one being the SHHS database, but I have a doubt about the way to set the channels that are used (channels), and I did this:

    >>> sleep.available_channels
    ['SaO2', 'H.R.', 'EEG(sec)', 'ECG', 'EMG', 'EOG(L)', 'EOG(R)', 'EEG',
     'THOR RES', 'ABDO RES', 'POSITION', 'LIGHT', 'NEW AIR', 'OX stat']

    # set channels that are used: EEG(sec) = C3/A2 & EEG = C4/A1
    channels = {'EEG': ['EEG(sec)', 'EEG'], 'EMG': 'EMG', 'EOG': ['EOG(L)', 'EOG(R)']}

Is this the correct way to retain the two available EEG channels 'EEG(sec)' and 'EEG'? I ask because the script/training is running, so it has recognized/inferred the EEG channel(s), but it missed the EOG channels. Jeff

skjerns commented 6 years ago

I do not know the details of this dataset, but in principle you can use whichever channels and channel combinations you want. I would not be surprised if it would also work with ECG (however, be aware of the different sampling frequency, which needs to be interpolated).

PaulMuadDib commented 6 years ago

Thanks Simon. I was just wondering if the square brackets (a list) were syntactically the correct way to retain several (available) channels like 'EEG(sec)' and 'EEG' (which presumably are extracted from an EDF file header).

PaulMuadDib commented 6 years ago

I thought the scorer could be used with the two available EEG channels of the PSG (plus EOG & EMG, but the data is shaped with a last dim of 3 in the SleepLoader), hence my question (resampling to 100 Hz). Would it be possible to put the other available EEG channel in the 'EMG' field of the channel definition, for instance like this: channels = {'EEG': 'EEG(SEC)', 'EMG': 'EEG', 'EOG': 'EOG(L)'}? (In the SHHS, 'EEG(SEC)' and 'EEG' are C3/A2 and C4/A1.)
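For context, the 3-channel input the SleepLoader produces can be sketched like this (a minimal numpy illustration of the shapes only; the variable names are mine, and the 3000 samples assume 30-s epochs resampled to 100 Hz as mentioned above):

```python
import numpy as np

# Hypothetical illustration of the loader's output shape: 30-s epochs
# resampled to 100 Hz give 3000 samples per epoch, and the trained
# weights expect exactly 3 channels (EEG, EMG, EOG) in the last dim.
sfreq, epoch_sec, n_epochs = 100, 30, 4
eeg = np.random.randn(n_epochs, epoch_sec * sfreq)
emg = np.random.randn(n_epochs, epoch_sec * sfreq)
eog = np.random.randn(n_epochs, epoch_sec * sfreq)

data = np.stack([eeg, emg, eog], axis=-1)   # shape (4, 3000, 3)
print(data.shape)
```

Feeding a fourth channel would change that last dimension and therefore the input layer of the network.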

skjerns commented 6 years ago

I don't quite understand what you mean?

The network can be trained with any number of channels, and any configuration of channels, but you will need to adapt the script a bit to adjust the input layer of the network. Once a network is trained, the channels are fixed and you can't input any other channels or combinations.

The already trained weights provided by this repository do require the presence of EEG, EMG and EOG channels.

balandongiv commented 6 years ago

If you would allow me to rephrase Paul's question. Say we have a different dataset with the following setting:

EEG channels: C3_A2 and C4_A1 (referred to as 'EEG(sec)' and 'EEG', respectively, in the SHHS dataset)
EMG channel: 'EMG SUBMENTAL'
EOG channel: 'EOG HORIZONTAL'

How should the variable channels be set, as in run_sample?

My best guess would be:

channels = {'EEG': ('EEG(sec)', 'EEG'), 'EMG': 'EMG SUBMENTAL', 'EOG': 'EOG HORIZONTAL'}

Update! However, by doing so, the program fails at def check_channels(self, header).

PaulMuadDib commented 6 years ago

Sorry for bringing confusion here.

I am indeed reusing the script run_sample.py, where one EEG channel (among the two available), one EMG and one EOG channel are used. My first question was indeed whether it is possible to use both EEG channels by only modifying the channels definition, and I asked if the following (for 5 channels) was the correct way to do it:

channels = {'EEG':['EEG(sec)','EEG'], 'EMG':'EMG', 'EOG':['EOG(L)','EOG(R)']}

But doing this will not recruit/use both EEG channels (nor both EOG channels); it is only useful when the name of a channel changes from one EDF file to another. And in the SHHS database this happens to be the case: the channel 'EEG(SEC)' is also named 'EEG2', depending on the file, in the SHHS1 repository. In this case,

channels = {'EEG':['EEG(sec)','EEG2'], 'EMG':'EMG', 'EOG':'EOG(L)'}

...did the job: selecting either 'EEG(SEC)' or 'EEG2' depending on the EDF file. Recruiting more channels than the 3 used may require modifying various scripts (models.py, SleepLoader.py, etc.).
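The alias mechanism described here can be sketched as a small standalone function (the name resolve_channel is mine, not the package's):

```python
def resolve_channel(spec, available):
    """Return the first name from `spec` found in `available`.

    `spec` is either a single channel name or a list of aliases for
    the same physical channel (e.g. 'EEG(SEC)' vs 'EEG2' in SHHS1).
    """
    available = [c.upper() for c in available]
    names = spec if isinstance(spec, list) else [spec]
    for name in names:
        if name.upper() in available:
            return name.upper()
    raise ValueError('none of {} found in {}'.format(names, available))

# One file names the channel 'EEG(SEC)', another names it 'EEG2':
print(resolve_channel(['EEG(sec)', 'EEG2'], ['EEG(SEC)', 'EMG', 'EOG(L)']))  # EEG(SEC)
print(resolve_channel(['EEG(sec)', 'EEG2'], ['EEG2', 'EMG', 'EOG(L)']))      # EEG2
```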

balandongiv commented 6 years ago

The following modification should work and has been tested.

In run_sample, change channels to:

    channels = {'EEG': ('EEG(SEC)', 'EEG'), 'EMG': 'EMG', 'EOG': 'EOG(L)'}

Under def check_channels, change the following:

                if isinstance(ch_pick, tuple):
                    # a tuple means 'pick all of these': check they all exist
                    wanted = set(x.upper() for x in ch_pick)
                    if not wanted.issubset(set(channels)):
                        raise ValueError(
                            'ERROR: Channel {} for {} not found in {}\navailable channels: {}'.format(
                                self.channels[ch], ch, filename, channels))
                    for i in ch_pick:
                        picks.append(channels.index(i.upper()))
                else:                           # newly added: single channel name
                    picks.append(channels.index(ch_pick.upper()))
balandongiv commented 6 years ago

Simon, may I know what the following for-loop is for?

I'm unable to modify it, because I do not know under what circumstances it will trigger. This syntax is under def check_channels:

            if type(ch_pick) is list:
                found = False
                for c in ch_pick:
                    if c in channels:
                        found = True
                        ch_pick = c
                        break
                if not found:
                    raise ValueError(
                        'ERROR: Channel {} for {} not found in {}\navailable channels: {}'.format(
                            self.channels[ch], ch, filename, channels))
PaulMuadDib commented 6 years ago

I think this for-loop is precisely the one which picks up 'EEG(SEC)' or 'EEG2' (when the 'EEG' dict item is declared as a list of strings, as in channels = {'EEG': ['EEG(sec)', 'EEG2'], ...}) from the available channels (from the header), depending on the SHHS EDF file.

balandongiv commented 6 years ago

Yes, you are right. The for-loop handles the list case. But with my proposed workaround we can omit this for-loop, since we declare the multiple EEG channels as a tuple.

PaulMuadDib commented 6 years ago

Or, if you want to retain the possibility to select ('EEG(SEC)' OR 'EEG2') AND 'EEG', as in channels = {'EEG': (['EEG(SEC)', 'EEG2'], 'EEG'), 'EMG': 'EMG', 'EOG': 'EOG(L)'} (useful for the SHHS database, since you will find either 'EEG(SEC)' or 'EEG2' for the same C3-A2 channel in the EDF files), maybe something like the following? (Not tested; there might be other modifications to implement.)

    def check_channels_ext(self, header):
        channels = [c.upper() for c in header.ch_names]
        filename = os.path.basename(header.filenames[0])
        labels = []
        picks  = []
        for ch in self.channels:
            ch_pick = []
            if not self.channels[ch]: continue
            ch_to_pick = self.channels[ch]
            if type(ch_to_pick) is tuple:           # several channels to pick up
                for chs in ch_to_pick:
                    if type(chs) is list:           # pick the first available alias
                        for c in chs:
                            if c.upper() in channels:
                                ch_pick.append(c)
                                break
                    elif type(chs) is str:
                        if chs.upper() in channels:
                            ch_pick.append(chs)
                    else:
                        raise TypeError('ERROR: The channel {} type in the channels {} to select should be a list [ch1, ch2] to select ch1 or ch2, or a string'.format(ch, ch_to_pick))
                if not len(ch_pick) == len(ch_to_pick):
                    print('WARNING: Some channels of {} in {} are missing'.format(self.channels[ch], ch))
            elif type(ch_to_pick) is list:          # one channel to pick among several aliases
                for chs in ch_to_pick:
                    if chs.upper() in channels:
                        ch_pick = [chs]             # was 'ch_pick = c': undefined name
                        break
            else:                                   # only one channel to pick up
                if ch_to_pick.upper() in channels:
                    ch_pick = [ch_to_pick]

            if len(ch_pick) == 0:
                raise ValueError('ERROR: None of the channels {} for {} are found in {}\navailable channels: {}'.format(self.channels[ch], ch, filename, channels))

            for c in ch_pick:                       # a tuple may yield several picks
                picks.append(channels.index(c.upper()))
            labels.append(ch)
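A slimmed-down, standalone sketch of the nested-spec idea above may help to test it in isolation (the function and variable names are mine; it assumes the header channel list is already known):

```python
def pick_channels(spec, available):
    """Resolve a nested channel spec against a header channel list.

    A tuple means 'pick all of these'; a list inside it means
    'pick whichever of these aliases is present'.
    Returns the indices of the picked channels.
    """
    available = [c.upper() for c in available]
    if not isinstance(spec, tuple):
        spec = (spec,)
    picked = []
    for entry in spec:
        aliases = entry if isinstance(entry, list) else [entry]
        for name in aliases:
            if name.upper() in available:
                picked.append(available.index(name.upper()))
                break
    return picked

header = ['EEG2', 'EEG', 'EMG', 'EOG(L)']
# (['EEG(SEC)', 'EEG2'], 'EEG') = (C3-A2 under either alias) AND C4-A1
print(pick_channels((['EEG(SEC)', 'EEG2'], 'EEG'), header))  # [0, 1]
```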

PaulMuadDib commented 6 years ago

Btw, the best model for the 50 males between 38 and 58 years drawn from the 200 cases of SHHS1 only reaches: fold 6: val acc/f1: 0.80947/0.73136, test acc/f1: 0.76639/0.68004

    kernel_length = 100
    output_filters = 500
    ...
    model.add(Conv1D(kernel_size=(kernel_length), filters=output_filters, strides=10, ...
    # model.layers[0].set_weights([np.array(kernel).reshape(kernel_shape), bias])  # initialize input layer weights with a dict. of atoms (wavelet waveforms)
    model.add(Conv1D(kernel_size=(10), filters=300, strides=2, ...
    model.add(Conv1D(kernel_size=(5), filters=200, strides=1, ...
    model.add(Dense(1200, activation='relu', kernel_initializer='he_normal', name='fc1'))
    model.add(Dense(1200, activation='relu', kernel_initializer='he_normal', name='fc2'))
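As a sanity check on such an architecture, the output length of each Conv1D stage with 'valid' padding is floor((n_in - kernel) / stride) + 1; a quick sketch with the sizes quoted above (the 3000-sample input is my assumption for a 30-s epoch at 100 Hz):

```python
def conv1d_out_len(n_in, kernel, stride):
    # 'valid' padding: floor((n_in - kernel) / stride) + 1
    return (n_in - kernel) // stride + 1

n = 3000                                   # assumed 30 s * 100 Hz input
n = conv1d_out_len(n, 100, 10); print(n)   # 291
n = conv1d_out_len(n, 10, 2);  print(n)    # 141
n = conv1d_out_len(n, 5, 1);   print(n)    # 137
```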

balandongiv commented 6 years ago

Seems not the best performance you have there. Are you using the pre-trained parameters?

skjerns commented 6 years ago

Congratulations on getting it to work!

Regarding the accuracy: there is a multitude of things that can be held responsible here: using a different architecture, a different dataset, not using all channels. Additionally, I found that each dataset has its own hyperparameters that achieve optimal performance: changing the learning rate, stronger/weaker regularization, using a different initialization, etc. Sometimes it turns out that just the random seed was at fault for not reaching high performance. It's a gamble. If the scoring within the dataset is consistent (inter-rater reliability), it will certainly be possible to approximate this inter-rater score.

Unfortunately neural networks are still notoriously difficult to debug.

skjerns commented 6 years ago

Please be aware that, as far as I remember (correct me if I'm wrong), this dataset cuts off all EEG data at 120 uV, which clips a large part of slow oscillations, arousals, etc. For this reason I also did not use it for my study.

PaulMuadDib commented 6 years ago

I was not aware that the dynamic range of the EEG channels was limited to 120 µV. Last week, I used Spinky to see whether it could produce useful features for AutoSleepScorer (to input into its RNN part, along with the last CNN layer signals) and did not see clipping in the EEG channels. But clipping should be easily detectable when we load the data; I will check this.
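A simple clipping check at load time could look like the following sketch (the threshold and function name are mine; the actual SHHS amplitude limit should be verified against the official documentation):

```python
import numpy as np

def clipping_fraction(signal, limit=120.0, tol=1e-6):
    """Fraction of samples sitting at (or beyond) +/- limit microvolts."""
    return float(np.mean(np.abs(signal) >= limit - tol))

rng = np.random.default_rng(0)
raw = rng.normal(0, 60, 3000)            # synthetic 30-s EEG epoch
clipped = np.clip(raw, -120, 120)        # simulate a 120 uV recorder limit
print(clipping_fraction(clipped))
```

A non-zero fraction on many epochs would indicate that the recorder limit is actually being hit.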

PaulMuadDib commented 6 years ago

Using a balanced training, or not, made a difference... Would it make sense to you that the more similar the individuals I select (gender, age, BMI, %REM, etc.) in this database, the harder it is to train on the selected cohort and to produce good scoring stats? I was really expecting the opposite.

skjerns commented 6 years ago

The higher the variance in your training data, the better your generalization will be. If you select only very similar individuals both for testing and training, you should get high accuracies.

Be aware that each scorer has a large bias (read this recent study). With a bit of bad luck, the scorer of the dataset you're using has a more extreme bias and generalization will not work very well, as we're basically learning to copy their scoring style.

PaulMuadDib commented 6 years ago

Yes Simon, thank you for this reference. The quality of the human scoring on which the automatic scorer is trained is maybe a main driver for this... but difficult to assess across the various sleep databases (EDFx, SHHS, MASS, etc.) commonly used for machine learning? Not being an expert in sleep research, I confess that, as a first approximation, I would tend to look in publications or in simulation at how easily a given learning structure trains and reaches good training/testing accuracies on the various sleep databases... but maybe this is more complex than that.

skjerns commented 6 years ago

The biggest problem, in my opinion, is that there is no gold standard and that current sleep staging is just a sub-optimal practice kept alive since the 60s. Probability scores calculated from several raters are the closest we can get to a gold standard, but that's extremely expensive.

PaulMuadDib commented 6 years ago

Is the EDFx sleep database the one scored with the largest number of human experts (4 of them)?

skjerns commented 6 years ago

No, see here for details: https://physionet.org/pn4/sleep-edfx/

If you find a database with 4 scorers per record, let me know.

PaulMuadDib commented 6 years ago

I misunderstood table 1, p. 15 of your thesis. I just got a very strange result on the same SHHS data with a spiking neural network (based on bindsnet) trained with STDP on the spectrograms (rapidly reaching 100% accuracy and a 99% F1 value on 250x30s batches, and a very nice confusion matrix on the last 2500x30s)... I need to recheck these results and see how they transfer to the test signals.

skjerns commented 6 years ago

100% accuracy is certainly overfitting. With sleep data, if you get past 90% accuracy on your training set, you can be sure that you are overfitting. Also, numbers on the training set are meaningless: a neural network of sufficient size will always be able to reach 100% accuracy on the training set, simply by "memorizing" the input. This is why you need to report on the validation or even the test set.
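The memorization point is easy to demonstrate with a toy sketch: a 1-nearest-neighbour "model" reaches 100% accuracy on its own training set even with random labels, which says nothing about held-out performance:

```python
import numpy as np

rng = np.random.default_rng(42)
X_train = rng.normal(size=(50, 10))
y_train = rng.integers(0, 5, size=50)    # random "sleep stages"

def predict_1nn(x):
    # Memorize: return the label of the closest training example.
    return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]

train_acc = np.mean([predict_1nn(x) == y for x, y in zip(X_train, y_train)])
print(train_acc)   # 1.0 on the training set, by construction
```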

What's your reasoning for using bindsnet? Interesting approach. Nice to see you are working with this, keep on going!

PaulMuadDib commented 6 years ago

Even if the same accuracy / F1 value is obtained on the test data, the remaining ~30% (4,000 30-s frames) of the dataset (14,000 30-s frames in total)? Overfitting also popped into my mind, but these accuracies were reached very early (at 5-6,000 frames) in the training (10,000 30-s frames): the SNN hadn't learnt anything from nearly half of the training dataset and was basically already (during the training phase) in a predictive mode (not learning anything). The 50 individuals drawn from the SHHS are very similar (age, BMI, no to mild apnea, ...). With AutoSleepScorer (with 2D convs and a larger sequence = 12 for the RNN) on the two EEG & EOG channels, I have never exceeded 86% accuracy and a 76% F1 value, so reaching above 99% (rounded to 100) in less than 1 hour of CPU looked very suspicious to me, also knowing the nature of the dataset itself.

This SNN is very simple; it is based on the Diehl & Cook 2015 implementation in bindsnet and was originally used on the MNIST database. There are 3 neuron populations: an input/coding population which encodes each time-frequency bin of the spectrogram (of the sum of the EEG channels) into a Poissonian spike train, an excitatory population (1000 neurons) and an inhibitory population (1000 neurons) which interact (are mutually connected). Only the connections between the input/coding population (100x100) and the excitatory population (1000) are learnt through STDP. I am going to go through this spiking code to see what I may be missing here, to test it on a more general population, to slim it down to see how performance evolves, and to see if I can get interpretable information from its weights (how it discriminates the spectrogram parts).
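The Poissonian rate coding described here can be sketched generically as follows (this is an illustration, not the bindsnet implementation; the parameter names are mine):

```python
import numpy as np

def poisson_encode(spectrogram, n_steps=100, max_rate=0.5, seed=0):
    """Turn each time-frequency bin into a Bernoulli/Poisson spike train.

    Spike probability per step is proportional to the bin's power,
    normalized so that the largest bin fires at `max_rate` per step.
    """
    rng = np.random.default_rng(seed)
    p = spectrogram / spectrogram.max() * max_rate     # rates in [0, max_rate]
    # one spike train of length n_steps per bin (boolean array)
    return rng.random((n_steps,) + spectrogram.shape) < p

spec = np.abs(np.random.default_rng(1).normal(size=(100, 100)))  # fake 100x100 spectrogram
spikes = poisson_encode(spec)
print(spikes.shape, spikes.dtype)
```

Each of the 100x100 input neurons then feeds these spike trains into the excitatory population.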

skjerns commented 6 years ago

If this is on held-out data, I assume there is some error in your setup. Reaching 100%, especially with sleep data, is close to impossible, due to the limits of inter- and intra-rater reliability. I'd look at everything above 90% with high suspicion. You will not reach higher than that on AASM sleep-staged data (staged by humans).