meeg-ml-benchmarks / brain-age-benchmark-paper

M/EEG brain age benchmark paper
https://meeg-ml-benchmarks.github.io/brain-age-benchmark-paper/
BSD 3-Clause "New" or "Revised" License

complete deep learning (shallow, deep) benchmarks #16

Closed dengemann closed 2 years ago

dengemann commented 2 years ago

Now that the benchmark script seems battle-tested, we still need to compute the results. In the figure below, a few deep boxes are missing :)

[Screenshot, 2021-10-27: current benchmark results figure]

I will take care of the missing handcrafted box.

The idea would be that @gemeinl and @hubertjb share a screen and fight / debug together with our Inria server.

I'm only one call / message away.

gemeinl commented 2 years ago

Ok, so code is not running yet? Let me know when you are free @hubertjb !

dengemann commented 2 years ago

It’s running. Hubert found some issues with the TUAB data, and I had to push a missing script. See my last commit to main.


dengemann commented 2 years ago

With that, we can re-extract the TUAB data. We forgot to take care of resampling to a common frequency.
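For reference, the resampling itself is a one-liner in MNE; a minimal sketch (the file name is hypothetical, and the 200 Hz target is taken from the later comments):

import mne

# Hypothetical input; in practice this runs inside the extraction script.
raw = mne.io.read_raw_edf("tuab_recording.edf", preload=True)
raw.resample(sfreq=200)  # bring every recording to a common sampling rate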


hubertjb commented 2 years ago

@dengemann @gemeinl Quick update: I re-converted TUAB to BIDS to fix the number-of-channels issue (I had to save the data in the BrainVision format, but now we have all 21 channels), re-preprocessed with resampling at 200 Hz, and finally applied autoreject. I did a quick test with ShallowNet and the model does train; I got performance values on a quick run with only 10 recordings. :)
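For context, the re-conversion step looks roughly like this (a sketch assuming mne-bids; the input file and BIDS entities are hypothetical):

import mne
from mne_bids import BIDSPath, write_raw_bids

raw = mne.io.read_raw_edf("tuab_recording.edf")  # hypothetical source file
bids_path = BIDSPath(subject="001", task="rest", root="TUAB_BIDS")  # hypothetical entities
# Writing BrainVision requires the pybv package.
write_raw_bids(raw, bids_path, format="BrainVision", overwrite=True)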

Now running with the whole dataset, it's taking about 100 s per epoch, most of which is actually not using the GPU. Not sure why, but I'll wait for the training to finish so we have some results before digging deeper.

Of note, I had to set preload=True when loading the epochs because I got an OSError about too many open files. It looks like we might be able to increase the limit (https://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files), but for now, since the machine I'm using has a lot of RAM, I'm sticking with preload=True.
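Both workarounds look roughly like this (a sketch; the file name is hypothetical, and the rlimit approach is the one from the linked Stack Overflow thread):

import resource
import mne

# Option 1: load the epochs fully into RAM so file handles get released.
epochs = mne.read_epochs("sub-001_proc-autoreject_epo.fif", preload=True)

# Option 2: raise the soft limit on open files up to the hard limit (Unix only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))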

hubertjb commented 2 years ago

First results in (using 2-fold CV for now):

MAE(shallow, tuab) = 8.213297017716052
r2(shallow, tuab) = 0.5700137267570398

From looking at the figure posted above, this looks pretty similar to the filterbank-riemann model!
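For reference, the evaluation boils down to something like this sketch (assuming a scikit-learn-compatible regressor model and data arrays X, y):

from sklearn.model_selection import cross_validate

# model, X and y are assumed: an age regressor and the epoched data.
scores = cross_validate(model, X, y, cv=2,
                        scoring=("neg_mean_absolute_error", "r2"))
mae = -scores["test_neg_mean_absolute_error"].mean()
r2 = scores["test_r2"].mean()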

hubertjb commented 2 years ago

On the same two folds, for Deep4Net:

MAE(deep, tuab) = 9.48504621480895
r2(deep, tuab) = 0.4640109871399135

hubertjb commented 2 years ago

I think the issue was caused by setting num_workers too high, which likely created an IO bottleneck. Capped at n_gpus * 5, it's now much faster: ~35 s/epoch with ShallowNet on a 2-GPU setup.
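The fix amounts to something like this (a sketch; the dataset object and batch size are placeholders):

import torch
from torch.utils.data import DataLoader

n_gpus = torch.cuda.device_count()
loader = DataLoader(dataset, batch_size=64,
                    num_workers=n_gpus * 5)  # cap workers to avoid IO thrashing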

hubertjb commented 2 years ago

See #22 for the changes. I launched the benchmark for shallow and deep on TUAB, will report the results tomorrow!

gemeinl commented 2 years ago

Thanks for the update @hubertjb. Great work! It is interesting that we run into too-many-open-files issues when we "just" use ~1400 recordings. Cool that you figured out a way to circumvent this!

It is a relief to know that the models seem to lie in the expected performance range. Maybe it would still be worth checking the learning curves. The number of training epochs was basically just a guess, and the deep net probably requires some more time to fit. I'll figure out a way to do it.

dengemann commented 2 years ago

Btw, any unfiltered impressions on adding learning-curve benchmarks to the main results?


hubertjb commented 2 years ago

It just finished training! Here are the mean results:

benchmark   MAE       r2        fit_time (s)  score_time (s)
deep        8.191346  0.562772  1626.901164   39.094545
shallow     7.901493  0.600152  1954.250558   43.815481

@dengemann I just pushed the two CSVs to #22. (I tried to make the plots, but it looks like there's a utils.r script that's missing.)

hubertjb commented 2 years ago

It is a relief to know that the models seem to lie in the expected performance range. Maybe it would still be worth checking the learning curves. The number of training epochs was basically just a guess, and the deep net probably requires some more time to fit. I'll figure out a way to do it.

I agree, that's a good idea. I guess we would expect Deep4Net to perform better than ShallowNet for instance, which is not currently the case. Plotting the learning curves might help elucidate that.
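Since the models are skorch-based, the curves can be read off the fitted estimator's history; a minimal sketch (net is the fitted braindecode/skorch regressor, assumed):

import matplotlib.pyplot as plt

# net is assumed to be fitted; valid_loss requires a validation split.
plt.plot(net.history[:, "train_loss"], label="train")
plt.plot(net.history[:, "valid_loss"], label="valid")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()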

dengemann commented 2 years ago

@hubertjb pushed missing utils.r

dengemann commented 2 years ago

@hubertjb does anything speak against running it on the other datasets?

hubertjb commented 2 years ago

Thanks @dengemann! However I think I'm having some trouble with the font :P

[Figure: fig_performance]

hubertjb commented 2 years ago

@hubertjb does anything speak against running it on the other datasets?

No, unless we would want to do some hyperparameter tuning (e.g. by looking at the training curves) before launching more compute. Which dataset should I start with?

dengemann commented 2 years ago

I'd say lemon is next, then Cam-CAN, then chbp

dengemann commented 2 years ago

However I think I'm having some trouble with the font :P

That's surprising, it should just be using a default font. Can you push your script edits to master [I guess you added 1 extra color]? I can run it on my side.

hubertjb commented 2 years ago

However I think I'm having some trouble with the font :P

That's surprising, it should just be using a default font. Can you push your script edits to master [I guess you added 1 extra color]? I can run it on my side.

I just pushed the changes to main @dengemann

hubertjb commented 2 years ago

I'd say lemon is next, then Cam-CAN, then chbp

I had to leave for 30 minutes so I started the benchmark on Cam-CAN before seeing your reply. It looks like the model stops learning after only 12 epochs, i.e. the training loss starts increasing (on the first fold at least). This makes me think we'll really need to do some dataset-specific hyperparameter tuning.

I'll launch it on LEMON just to see if that's the case too.
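One cheap guard against that kind of divergence would be an early-stopping callback; a sketch assuming braindecode's skorch-based regressor (the module and other training settings are placeholders):

from braindecode import EEGRegressor
from skorch.callbacks import EarlyStopping

# module is e.g. a ShallowNet/Deep4Net instance (assumed to exist).
net = EEGRegressor(
    module,
    callbacks=[EarlyStopping(monitor="valid_loss", patience=10)],
)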

hubertjb commented 2 years ago

I couldn't get results on LEMON, I got the following error while trying to load the data:

FileNotFoundError: File does not exist: /storage/store3/derivatives/LEMON_EEG_BIDS/sub-010285/eeg/sub-010285_task-RSEEG_proc-autoreject_epo
hubertjb commented 2 years ago

Using 2-fold CV on Cam-CAN, I get:

fold  MAE        r2         fit_time (s)  score_time (s)  dataset  benchmark
0     21.771525  -1.022423  1163.038959   87.999460       camcan   deep
1     11.378344   0.397487  1195.394894   87.036303       camcan   deep

I think this makes it clear that we need to do some hyperparameter tuning. Also, I haven't looked into why yet, but ShallowNet returned NaNs here.

hubertjb commented 2 years ago

Same problem on CHBP as with LEMON:

FileNotFoundError: File does not exist: /storage/store3/derivatives/CHBMP_EEG_and_MRI/sub-CBM00179/eeg/sub-CBM00179_task-protmap_proc-autoreject_epo
dengemann commented 2 years ago

@hubertjb can you do an ls -lrth /storage/store3/derivatives/CHBMP_EEG_and_MRI/sub-CBM00179/eeg/ to see what's going on there?

The same goes for the other cases: I think things failed for 1-2 subjects, and you need to handle that by catching exceptions. For the other subjects, everything should be there.
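Something like this sketch in the loading loop would do (the file-list name is hypothetical):

import mne

epochs_list, skipped = [], []
for fname in epo_fnames:  # assumed list of *_proc-autoreject_epo file paths
    try:
        epochs_list.append(mne.read_epochs(fname, preload=True))
    except FileNotFoundError:
        skipped.append(fname)
print(f"Skipped {len(skipped)} subjects with missing derivatives.")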

dengemann commented 2 years ago

[Figure: fig_performance_r2]

Latest plots after reconsidering priorities (r2 as the main figure, MAE as supplement), with updated color codes, better sizes, and a finer grid. I'll start a write-up based on this. Hopefully we can soon fill in the remaining blank boxes :) Please keep me closely in the loop so that I can help. @hubertjb @gemeinl

hubertjb commented 2 years ago

@dengemann I'll try what you suggested above later today and will keep you updated.

About doing some hyperparameter tuning: do you have a preferred way of doing this? @dengemann @gemeinl I was thinking we could further divide the training set of each fold into training and validation sets (setting the test set aside), run a small grid search over maybe {learning rate, batch size, dropout rate}, and finally pick the best configuration for each dataset.
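As a sketch of what I have in mind (all names here are placeholders; in practice this would run inside each outer fold):

from itertools import product
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# X_train/y_train come from the outer fold; make_model is a hypothetical factory.
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.2)

best_mae, best_cfg = float("inf"), None
for lr, batch_size, dropout in product([1e-4, 1e-3], [32, 64], [0.25, 0.5]):
    model = make_model(lr=lr, batch_size=batch_size, dropout=dropout)
    model.fit(X_tr, y_tr)
    mae = mean_absolute_error(y_val, model.predict(X_val))
    if mae < best_mae:
        best_mae, best_cfg = mae, (lr, batch_size, dropout)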

hubertjb commented 2 years ago

After skipping failed files I got the models to run on both LEMON and CHBP @dengemann.

I started the work towards hyperparameter tuning in #24 if you want to take a look.

gemeinl commented 2 years ago

There are a couple of things I find noteworthy/unexpected:

Since all the preprocessing scripts seem to work robustly right now, I will also do some investigation on my end. Any thoughts @robintibor?

dengemann commented 2 years ago

Latest results, updated with the LEMON benchmark (we're now done with the planned set of non-deep benchmarks):

[Figure: fig_performance_r2]

gemeinl commented 2 years ago

I have looked at the different runtimes for deep and shallow. The slower runtime for shallow likely arises from a very large final convolutional layer, caused by setting final_conv_length to 'auto' combined with trialwise decoding on 10 s trials at 200 Hz. I will implement cropped decoding, as suggested in https://github.com/dengemann/meeg-brain-age-benchmark-paper/issues/25. It should decrease the runtime for shallow and at the same time improve the performance of deep. I will also add a flag so that cropped vs. trialwise decoding can be another choice in the hyperparameter optimization (https://github.com/dengemann/meeg-brain-age-benchmark-paper/pull/24).
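Roughly, the change looks like this (a sketch assuming braindecode's API; exact argument names vary a bit across versions):

from braindecode.models import ShallowFBCSPNet
from braindecode.models.util import to_dense_prediction_model

model = ShallowFBCSPNet(
    in_chans=21,                # the 21 TUAB channels
    n_classes=1,                # single regression output (age)
    input_window_samples=2000,  # 10 s at 200 Hz
    final_conv_length=35,       # fixed length instead of 'auto'
)
to_dense_prediction_model(model)  # convert for cropped (dense) decoding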

hubertjb commented 2 years ago

See #35 for the first complete set of results.