skjerns / AutoSleepScorer

An open-source sleep stage classification Python package
GNU Affero General Public License v3.0
103 stars 22 forks

Precision #6

Closed RAllon closed 6 years ago

RAllon commented 6 years ago

I have found both your code and your master's thesis incredibly helpful in doing so. Very recently I tested your scorer and datasets with a precision metric and got 55% precision rather than 80%. It is quite possible that I made a mistake, so if you have a module or function that computes the accuracy, that would be great. Thank you so much.

skjerns commented 6 years ago

First of all, thanks!

That is indeed strange behaviour.

  1. Which dataset were you using? Which electrodes did you use as EEG/... and as reference?
  2. Which model were you using (the pre-trained one?)?

Please note that I'm not reporting precision but F1 and accuracy.
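To make the difference between these metrics concrete, here is a minimal NumPy sketch (the function name `epoch_scores` is hypothetical, not part of sleepscorer) that computes accuracy, macro-averaged F1 and macro-averaged precision from two per-epoch label sequences. With default settings it should agree with scikit-learn's `accuracy_score`, `f1_score(..., average='macro')` and `precision_score(..., average='macro')`. Macro averaging gives every stage equal weight, so a poorly recognised stage such as S1 can pull the macro scores well below the accuracy.

```python
import numpy as np


def epoch_scores(y_true, y_pred):
    """Return (accuracy, macro F1, macro precision) for two label sequences."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Accuracy: fraction of epochs where the predicted stage equals the true stage.
    accuracy = float(np.mean(y_true == y_pred))
    # Macro averages: one score per stage, then the unweighted mean over stages.
    labels = np.union1d(y_true, y_pred)
    precisions, f1s = [], []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # Undefined precision (stage never predicted) counts as 0, like sklearn's default.
        precisions.append(tp / (tp + fp) if tp + fp > 0 else 0.0)
        # F1 = 2TP / (2TP + FP + FN); the denominator is > 0 because c occurs somewhere.
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return accuracy, float(np.mean(f1s)), float(np.mean(precisions))
```

For example, `epoch_scores([0, 0, 2, 2, 4], [0, 2, 2, 2, 4])` gives an accuracy of 0.8, a macro F1 of about 0.822 and a macro precision of about 0.889.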

RAllon commented 6 years ago

To make the predictions from the EDF files, I used this code:

from sleepscorer import Scorer, SleepData

file = SleepData(filename,
                 channels={'EEG':'EEG Fpz-Cz', 'EMG':'EMG submental',
                           'EOG':'EOG horizontal'}, preload=False)
scorer = Scorer([file], hypnograms=True, demo=True)
scorer.run()

To compute the precision, I used:

import numpy as np
from sklearn.metrics import precision_recall_fscore_support

x = np.zeros((39, 4))
x[i,:] = precision_recall_fscore_support(real_short, pred_short, average='macro', warn_for=('precision',))

I downloaded the files using the script linked below: https://github.com/skjerns/AutoSleepScorerDev/blob/master/edfx_database.py

skjerns commented 6 years ago

I'll look into it. Please send me the full script you are using, including the definitions of real_short and pred_short.

It should be noted that the weights supplied at the moment come from training on CCSHS50, not on the EDFx. The values given in the README are from training and testing within one dataset (see also my thesis). I have adapted the README to make that clear. When I opened this repo I still planned to continue development, but I have not had the time resources so far. Still, 55% seems too low for the transfer.

RAllon commented 6 years ago

First run:

from sleepscorer import tools
import glob
import numpy as np
from sleepscorer import Scorer, SleepData
edffiles =  glob.glob("/Users/razallon/Dropbox/Sleep Lab/SKJ Data/Edf and pred//*.edf")
filesarr = np.column_stack((edffiles))
for i in range(len(edffiles)):
    print(edffiles[i])
    filename = edffiles[i]
    file = SleepData(filename,
                     channels={'EEG':'EEG Fpz-Cz', 'EMG':'EMG submental',
                               'EOG':'EOG horizontal'}, preload=False)
    scorer = Scorer([file], hypnograms=True, demo=True)
    scorer.run()

Next run:

from sleepscorer import tools
from sleepscorer import Scorer, SleepData
from sklearn.metrics import precision_recall_fscore_support
import csv
import glob
import numpy as np

x = np.zeros((39, 4))
csvfiles = glob.glob("/Users/razallon/Dropbox/Sleep Lab/SKJ Data/Hyp Groundtruth/*.csv")
edffiles = glob.glob("/Users/razallon/Dropbox/Sleep Lab/SKJ Data/Edf and pred//*.edf")
filesarr = np.column_stack((edffiles, csvfiles))
csv_delimiter = '\t'
# stage label -> integer code
h = {'0':0, '1':1, '2':2, '3':3, '4':4, '5':5, '6':6, '7':7, '8':8, '9':9,
     'W':0, 'S1':1, 'S2':2, 'S3':3, 'S4':4, 'SWS':3, 'REM':5, 'R':5,
     'A':6, 'M':8, '?':9}
for i in range(len(csvfiles)):
    filename = edffiles[i]
    # prediction file: the EDF path with '.csv' appended
    csvfilename = filename + '.csv'
    with open(csvfilename) as csvfile:
        csvreader = csv.reader(csvfile, delimiter=csv_delimiter)
        predicted = []
        for k in csvreader:
            if len(k) > 0:
                predicted.append(h[k[0]])
    predicted = np.array(predicted, dtype=np.int32).reshape(-1, 1)
    predicted[predicted==4] = 3
    predicted[predicted==5] = 4
    realcsv = csvfiles[i]
    with open(realcsv) as csvfile:
        csvreader2 = csv.reader(csvfile, delimiter=csv_delimiter)
        real = []
        for k in csvreader2:
            if len(k) > 0:
                real.append(h[k[0]])
    real = np.array(real, dtype=np.int32).reshape(-1, 1)
    real[real==4] = 3
    real[real==5] = 4
    # compare only the overlapping part of both sequences
    m = min(real.size, predicted.size)
    real_short = real[0:m]
    pred_short = predicted[0:m]
    x[i,:] = precision_recall_fscore_support(real_short, pred_short, average='macro', warn_for=('precision',))
print(x)
print(x[i,:4])

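One thing worth ruling out in a loop like this: `glob.glob` returns paths in arbitrary order, so `edffiles[i]` and `csvfiles[i]` are not guaranteed to belong to the same recording. Below is a minimal sketch that pairs the two lists by a name key instead of by position. The helper `pair_by_key` is hypothetical, and its default key (the file name up to its first '.') assumes that both files of a recording share that stem.

```python
import os


def _stem(path):
    # 'dir/rec01.edf' -> 'rec01'  (assumed naming convention)
    return os.path.basename(path).split('.')[0]


def pair_by_key(paths_a, paths_b, key=_stem):
    """Pair two lists of paths by a shared key instead of by list position.

    Returns (path_a, path_b) tuples sorted by path_a; paths without a partner
    are reported rather than silently paired with the wrong file.
    """
    index_b = {key(p): p for p in paths_b}
    pairs, unmatched = [], []
    for p in sorted(paths_a):
        partner = index_b.get(key(p))
        if partner is None:
            unmatched.append(p)
        else:
            pairs.append((p, partner))
    if unmatched:
        print('No partner found for:', unmatched)
    return pairs
```

If the ground-truth files are named differently from the recordings (the PhysioNet Sleep-EDF hypnogram files, for instance, do not share the PSG file's name), pass a custom `key` function that extracts the common recording ID.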
skjerns commented 6 years ago

I had a bit of trouble understanding what you did in your code, especially where you got the ground truth from.

I created a script that does what you want, you can see it here: transfer_performance_edfx.py

Mean transfer accuracy: 72.2%
Mean transfer F1 score: 59.1%

These scores are quite close to the ones reported in my thesis (Acc: 73%, F1: 61%). The models in the thesis and in the repo are slightly different, which explains the slight difference in performance. Please refer to Section 3.4, Transfer performance, for details.

A few notes:

  1. The model weights provided were trained on the CCSHS, so you need to compare against the transfer performance. The CCSHS features adolescents, while the EDFx features adults aged 20 to 80. I think reaching 72% accuracy in this case is already quite something.
  2. The model given in the repo is just a random model from one cross-validation fold (trained on 60% of the data), while the transfer experiment used a model trained on 80% of the data (afaik), which explains the slight difference.
  3. S4 and S3 are merged into SWS, while all non-sleep stages are changed to wake. This is also done during model training.
  4. The EDFx contains long pre- and post-sleep recording periods, which should be truncated. If this isn't done, it skews the performance.
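Notes 3 and 4 can be sketched as follows. This is a minimal illustration, not the repo's implementation: the helper name `prepare_hypnogram` is hypothetical, the label table assumes the five classes W=0, S1=1, S2=2, SWS=3, REM=4, and the margin of 60 epochs (30 minutes at 30-second epochs) kept around the sleep period is an assumption.

```python
import numpy as np

# Assumed 5-class scheme: W=0, S1=1, S2=2, SWS=3 (S3+S4), REM=4.
STAGE_MAP = {'W': 0, '0': 0, '1': 1, '2': 2, '3': 3, '4': 3, 'R': 4,
             'S1': 1, 'S2': 2, 'S3': 3, 'S4': 3, 'SWS': 3, 'REM': 4,
             # non-sleep / unscored epochs are counted as wake
             'A': 0, 'M': 0, '?': 0}


def prepare_hypnogram(labels, margin=60):
    """Map stage labels to 5 classes and trim long wake periods.

    Keeps `margin` epochs before the first and after the last non-wake epoch.
    Returns the trimmed array and the (start, stop) slice, so the same slice
    can be applied to the predictions of that recording.
    """
    y = np.array([STAGE_MAP[str(lab)] for lab in labels])
    sleep = np.flatnonzero(y != 0)
    if sleep.size == 0:
        return y, (0, len(y))
    start = max(int(sleep[0]) - margin, 0)            # clamp: never wrap around
    stop = min(int(sleep[-1]) + margin + 1, len(y))
    return y[start:stop], (start, stop)
```

Returning the slice bounds keeps truth and prediction aligned: apply `y_pred[start:stop]` with the bounds computed from the ground truth.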

I also see that the model link is down again. Unfortunately, I have no easy means of hosting such a large file. I hope to have more time to continue the project later.

Hope I could shed some light on your concerns!

RAllon commented 6 years ago

Thank you so much for the code. I recently ran it and made a few syntax changes such as indentations. I have copied it below.

import glob
import csv
import numpy as np
from sklearn.metrics import f1_score
from sleepscorer import Scorer, SleepData

edfx_dir = "/Users/razallon/Dropbox/Sleep Lab/SKJ Data/RealHypno/*.edf"

print('Converting hypnograms')
files = glob.glob(edfx_dir + '*.hyp')
for file in files:
    hypnogram = []
with open(file, mode='rb') as f:
    raw_hypno = [x for x in str(f.read()).split('Sleep_stage_')][1:]
for h in raw_hypno:
            stage  = h[0]
            repeat = int(h.split('\\')[0][12:])//30 # no idea if this also works on linux
            hypnogram.extend(stage*repeat)   
with open(file[:-4] + '.txt', "w") as f:
        writer = csv.writer(f, lineterminator='\r')
        writer.writerows(hypnogram)
edffiles =  glob.glob(edfx_dir)
filesarr = np.column_stack((edffiles))
files = []
for i in range(len(edffiles)):
    print(edffiles[i])
    filename = edffiles[i]
    file = SleepData(filename, channels={'EEG':'EEG Fpz-Cz', 'EMG':'EMG submental', 'EOG':'EOG horizontal'}, preload=False)
    files.append(file)

scorer = Scorer(files, hypnograms=False, demo=True)
scorer.run()

truth = glob.glob(edfx_dir + '*.txt')
pred =  glob.glob(edfx_dir + '*.csv')

conv_dict =  {'1':1, '2':2, '3':3, '4':3,
'W':0, 'S1':1, 'S2':2, 'S3':3, 'S4':3, 'SWS':3, 'REM':4, 'R':4,
'A':0, 'M':0, '?':0} 
accs = []
f1s  = []

for i in range(len(truth)):
    with open(truth[i],'r') as file:
        y_true = file.read().split('\n')[:-1]
        y_true = np.array([conv_dict[x] for x in y_true])

for i in range(len(pred)):
    with open(pred[i],'r') as file:
        y_pred = file.read().split('\n')[:-1]
        y_pred = np.array([int(x) for x in y_pred])
        zero  = np.where(y_true!=0)[0]
        idx_start = zero[0] -60
        idx_stop  = zero[-1] +60
        y_pred = y_pred[idx_start:idx_stop]
        y_true = y_true[idx_start:idx_stop]
        accs.append(np.mean(y_true==y_pred))
        f1s.append(f1_score(y_true, y_pred, average='macro'))
print('Prediction scores from a model trained on CCSHS50')
print('Mean transfer accuracy: {:.1f}%'.format(np.mean(accs)*100))
print('Mean transfer f1 score: {:.1f}%'.format(np.mean(f1s)*100))

I have also copied the results below.

[0.16496465043205027, 0.29850746268656714, 0.34956794972505889, 0.41162608012568735, 0.33150039277297721, 0.29772191673212883, 0.32757266300078552, 0.26787117046347209, 0.35113904163393561, 0.27336999214454044, 0.25216025137470544, 0.24194815396700706, 0.30007855459544386, 0.29536527886881381, 0.26315789473684209, 0.29850746268656714, 0.26551453260015712, 0.34564021995286726, 0.30557737627651216, 0.29143754909662217, 0.37941869599371564, 0.29379418695993714, 0.30871956009426549, 0.24666142969363708, 0.26158680282796543, 0.30164964650432052, 0.28515318146111546, 0.38491751767478399, 0.29300864100549884, 0.31578947368421051, 0.2584446190102121, 0.28279654359780049, 0.26865671641791045, 0.23880597014925373, 0.24194815396700706, 0.30322073841319719, 0.31421838177533384, 0.24744697564807541, 0.29222309505106048]

Is there something that I am doing wrong? Are we using different datasets?

I obtained the datasets using the code below.

#%%   ## download edfx database and prepare it
if __name__ == '__main__':
    datadir = 'edfx'

    # prepare dataset if it does not exist
    if not os.path.isfile(os.path.join(datadir, 'sleepdata.pkl')):
        edfx_database.download_edfx(datadir)
        edfx_database.convert_hypnograms(datadir)

        channels = {'EEG':'EEG FPZ-CZ', 'EMG':'EMG SUBMENTAL', 'EOG':'EOG HORIZONTAL'} # set channels that are used
        references = {'RefEEG':False, 'RefEMG':False, 'RefEOG':False} # we do not set a reference, because the data is already referenced
        sleep = sleeploader.SleepDataset(datadir)
        # use float16 if you have problems with memory or a small hard disk. Should be around 2.6 GB for float32.
        sleep.load( channels = channels, references = references, verbose=0, dtype=np.float32)
        edfx_database.truncate_eeg(sleep)

    # if the pickle file already exist, just load that one.
    else:
        sleep = sleeploader.SleepDataset(datadir)
        sleep.load_object() # load the prepared files. Should be around 2.6 GB for float32

    # load data
    data, target, groups = sleep.get_all_data(groups=True)
    data = zscore(data,1)
    data = tools.normalize(data)

    target[target==4] = 3  # Set S4 to S3
    target[target==5] = 4  # Set REM to now empty class 4
    target = keras.utils.to_categorical(target)

I was also wondering whether you have code that could convert XML files into CSV files. Thank you so much.
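A minimal sketch of such a conversion using Python's standard `xml.etree.ElementTree`. The layout it expects is an assumption (NSRR-style annotation files, where each `<ScoredEvent>` has an `<EventType>` starting with "Stages", an `<EventConcept>` such as `Stage 2 sleep|2`, and `<Start>`/`<Duration>` in seconds), so check the tag names against your own files. The helper names `stages_from_xml` and `xml_to_csv` are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET


def stages_from_xml(xml_data, epoch_len=30.0):
    """Expand sleep-stage events into one stage code per epoch.

    Assumed layout (verify on your files): <ScoredEvent> elements with an
    <EventType> starting with 'Stages', an <EventConcept> like 'Wake|0',
    and <Start>/<Duration> in seconds.
    """
    root = ET.fromstring(xml_data)
    events = []
    for ev in root.iter('ScoredEvent'):
        if not ev.findtext('EventType', default='').startswith('Stages'):
            continue  # skip arousals, respiratory events, etc.
        code = ev.findtext('EventConcept', default='').split('|')[-1]  # 'Wake|0' -> '0'
        start = float(ev.findtext('Start', default='0'))
        n_epochs = int(round(float(ev.findtext('Duration', default='0')) / epoch_len))
        events.append((start, code, n_epochs))
    stages = []
    # order by start time in case the file lists events out of order
    for _, code, n_epochs in sorted(events, key=lambda e: e[0]):
        stages.extend([code] * n_epochs)
    return stages


def xml_to_csv(xml_path, csv_path, epoch_len=30.0):
    """Write one stage code per line, one line per epoch."""
    with open(xml_path, 'rb') as f:
        stages = stages_from_xml(f.read(), epoch_len)
    with open(csv_path, 'w', newline='') as f:
        csv.writer(f).writerows([s] for s in stages)
```

This assumes the stage events are contiguous; gaps between events are not filled.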

skjerns commented 6 years ago

Please read how to format code on GitHub: Readme. I have done the formatting for you for now, but I'm not sure whether the missing indents are caused by GitHub or are in your code.

I recently ran it and made a few syntax changes such as indentations

Why did you make syntax changes? The code should be runnable as it is; only the EDFx path needs to be changed.

As the code is formatted right now there are some missing indents here:

for file in files:
    hypnogram = []
with open(file, mode='rb') as f:
    raw_hypno = [x for x in str(f.read()).split('Sleep_stage_')][1:]
for h in raw_hypno:
            stage  = h[0]
            repeat = int(h.split('\\')[0][12:])//30 # no idea if this also works on linux
            hypnogram.extend(stage*repeat)   
with open(file[:-4] + '.txt', "w") as f:
        writer = csv.writer(f, lineterminator='\r')
        writer.writerows(hypnogram)

With these lines you create only one hypnogram file. Could you check that?

Also: What Tensorflow/Keras version are you using?
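One way to answer this is to query the installed package metadata. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+, so newer than the Python 3.6 setups in this thread; on older versions `import tensorflow as tf; print(tf.__version__)` works). The distribution name `sleepscorer` is an assumption; a name that is not installed simply prints `None`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == '__main__':
    # 'sleepscorer' as the distribution name is an assumption
    for name in ('tensorflow', 'keras', 'numpy', 'scikit-learn', 'sleepscorer'):
        print(name, installed_version(name))
```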

RAllon commented 6 years ago

Thank you so much for the clarification. I ran the code again and got the following results:

Mean transfer accuracy: 29.4%
Mean transfer f1 score: 14.6%

I am starting to think that we are using different datasets. Can you clarify which dataset you are using? I am using your edfx_database.download_edfx code to download the data.

skjerns commented 6 years ago

That should not be possible: there is only one EDFx dataset.

The code I provided works with the dataset from PhysioNet (https://physionet.org/pn4/sleep-edfx/). It is highly unlikely that it gives different results on different machines. Either your model file is compromised (or wrong), or your database is.

I'm closing this issue as the error does not seem to lie on my side.

gadallon1 commented 6 years ago

I am trying to run the code above, and I am getting the following numbers:

Mean transfer accuracy: 31.7%
Mean transfer f1 score: 21.5%

I am also getting the following warning:

Using TensorFlow backend.
/Users/gadallon/anaconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)

Do you think it might be affecting the results? Sorry if this is something obvious.

skjerns commented 6 years ago

Could you tell me how big your model files are (./weights/cnn.hdf5)?

I see the link is down again, so is it possible that you downloaded an empty file?

gadallon1 commented 6 years ago

It's 395MB.

skjerns commented 6 years ago

Should be 377MB!

I reran the code on a different machine, including redownloading the EDFx files, and I still get the same results. I do not know what you are doing differently.

Please reinstall the package with:

pip uninstall sleepscorer
pip install git+https://github.com/skjerns/AutoSleepScorer

Redownload the weights. You can also download them manually from here:

cnn: https://www.dropbox.com/s/otm6t0u2tmbj7sd/cnn.hdf5?dl=1
rnn: https://www.dropbox.com/s/t6n9x9pvt5tlvj8/rnn.hdf5?dl=1

gadallon1 commented 6 years ago

Sorry for the delay. I did actually have the right file: it's 395 MB on disk, but 377 MB before the download. I did everything from scratch and got the same accuracy. I put everything on a new PC (I usually use a Mac) and got an even lower accuracy (29%). I suspect I am doing something wrong. Is there any chance we could Skype at some point to see what's going on?
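The two figures may simply be the same size in different units: some tools report decimal megabytes (10^6 bytes), others mebibytes (2^20 bytes), and 377 MiB is about 395.3 MB. A minimal sketch for checking a file in both units; `./weights/cnn.hdf5` is the path mentioned earlier in the thread, and the helper names are hypothetical.

```python
import os


def mib_to_mb(mib):
    """Convert mebibytes (2**20 bytes) to decimal megabytes (10**6 bytes)."""
    return mib * 2**20 / 10**6


def size_in_both_units(path):
    """Return the size of a file as (MB, MiB)."""
    n_bytes = os.path.getsize(path)
    return n_bytes / 10**6, n_bytes / 2**20


if __name__ == '__main__':
    print('377 MiB = {:.1f} MB'.format(mib_to_mb(377)))
    weights = './weights/cnn.hdf5'
    if os.path.exists(weights):
        print('cnn.hdf5: {:.1f} MB / {:.1f} MiB'.format(*size_in_both_units(weights)))
```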

skjerns commented 6 years ago

This absolutely puzzles me, especially that you get different results on a new machine!

We can Skype this weekend if you wish; my username is the same as here.

skjerns commented 6 years ago

Follow-up:

The problem seems to be in the file reading at the end of the script. The predictions themselves are created correctly.
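One visible issue in the evaluation part of the script posted earlier in this thread: the loop over the ground-truth files finishes before the loop over the predictions starts, so only the last `y_true` is compared with every prediction, and that same `y_true` is truncated again on every iteration. Below is a minimal sketch that scores each recording's truth and prediction together. The helper names are hypothetical, the file loaders are left to the caller, and the 60-epoch margin follows the posted script (with the start index clamped so it cannot go negative).

```python
import numpy as np


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-stage F1 = 2TP / (#true + #predicted)."""
    labels = np.union1d(y_true, y_pred)
    f1 = [2 * np.sum((y_true == c) & (y_pred == c)) /
          (np.sum(y_true == c) + np.sum(y_pred == c)) for c in labels]
    return float(np.mean(f1))


def score_recording(y_true, y_pred, margin=60):
    """Score ONE recording: truth and prediction of the same night."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = min(len(y_true), len(y_pred))            # guard against small length mismatches
    y_true, y_pred = y_true[:n], y_pred[:n]
    sleep = np.flatnonzero(y_true != 0)          # non-wake epochs in the ground truth
    if sleep.size:
        start = max(int(sleep[0]) - margin, 0)
        stop = min(int(sleep[-1]) + margin, n)
        y_true, y_pred = y_true[start:stop], y_pred[start:stop]
    return float(np.mean(y_true == y_pred)), macro_f1(y_true, y_pred)


def evaluate(pairs, load_truth, load_pred, margin=60):
    """Mean accuracy and macro F1 over (truth_path, pred_path) pairs."""
    accs, f1s = [], []
    for truth_path, pred_path in pairs:
        # both files of the same recording are loaded in the same iteration
        acc, f1 = score_recording(load_truth(truth_path), load_pred(pred_path), margin)
        accs.append(acc)
        f1s.append(f1)
    return float(np.mean(accs)), float(np.mean(f1s))
```

The pairs could be built with `list(zip(sorted(truth), sorted(pred)))` from the script's two glob lists, which is only correct if the truth and prediction file names sort in the same order.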