Open JoseAlanis opened 5 years ago
cc @agramfort @larsoner
hi @JoseAlanis, thanks for all the hard work during your GSoC. It is this kind of work that pushes the frontiers of open and reproducible science forward. This is an impressive amount of work, and I have only one small piece of feedback.
It would be great if you could find some free time to improve the documentation (both on the MNE side and in the sandbox repository) now that you are familiar with the tools. With regard to the examples, even before diving into any stats or regression, it would be nice to show what the data is all about, because many of us don't know what the LIMO dataset contains and what metadata is in there. Wrapping some things in convenience functions and exposing an API to make the examples shorter, etc., would be a priority for me.
> However, we believe that using a machine learning package for linear regression might irritate users in the long run.
I don't think this is a real problem. Installing Python packages nowadays is a lot easier than it used to be, so relying on sklearn seems fine.
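For context, the mass-univariate case is already a one-liner with scikit-learn, since `LinearRegression` supports multi-output targets. A minimal sketch with simulated data (the shapes and variable names here are illustrative assumptions, not the sandbox API):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical trial-wise EEG data: 50 trials, 32 sensors, 100 time points,
# plus one continuous predictor per trial.
n_trials, n_sensors, n_times = 50, 32, 100
data = rng.normal(size=(n_trials, n_sensors, n_times))
predictor = rng.normal(size=(n_trials, 1))

# LinearRegression handles multi-output targets, so one fit covers every
# sensor/time point at once after flattening the spatiotemporal dimensions.
model = LinearRegression().fit(predictor, data.reshape(n_trials, -1))
betas = model.coef_.reshape(n_sensors, n_times)  # slope at each sensor/time
print(betas.shape)  # (32, 100)
```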
> The second major issue concerns the inference part.
There are private functions that do the clustering step. These should already be separate from the ones that choose and iterate over permutations, etc., but if they aren't, we can separate them better. Then we could have `*_bootstrap_*` public functions in place of the `*_permutation_*` ones.
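For what it's worth, a rough sketch of such a split (hypothetical names and a deliberately simple 1-D clustering helper, not MNE's actual internals): one private function that only forms clusters from a statistic map, reused by a bootstrap driver in place of the permutation loop.

```python
import numpy as np

def _find_clusters(stat_map, threshold):
    """Return (start, stop, mass) for runs of supra-threshold samples (1-D)."""
    above = np.abs(stat_map) > threshold
    # Edges of the boolean mask mark cluster starts (even) and stops (odd).
    edges = np.flatnonzero(np.diff(np.concatenate(([0], above.view(np.int8), [0]))))
    starts, stops = edges[::2], edges[1::2]
    return [(s, e, np.abs(stat_map[s:e]).sum()) for s, e in zip(starts, stops)]

def bootstrap_cluster_test(data, threshold, n_boot=200, seed=0):
    """Bootstrap driver reusing the clustering helper instead of permutations."""
    rng = np.random.default_rng(seed)

    def t_map(x):  # one-sample t-statistic along the subject axis
        return x.mean(0) / (x.std(0, ddof=1) / np.sqrt(len(x)))

    clusters = _find_clusters(t_map(data), threshold)

    # Center the data so H0 holds, resample subjects with replacement,
    # and build a null distribution of the maximum cluster mass.
    centered = data - data.mean(0)
    null_max = np.empty(n_boot)
    for b in range(n_boot):
        sample = centered[rng.integers(0, len(data), len(data))]
        masses = [m for *_, m in _find_clusters(t_map(sample), threshold)]
        null_max[b] = max(masses) if masses else 0.0
    p_vals = [float((null_max >= m).mean()) for *_, m in clusters]
    return clusters, p_vals

# Toy example: 12 "subjects", 200 time points, an effect between samples 50-80.
data = np.random.default_rng(3).normal(size=(12, 200))
data[:, 50:80] += 1.0
clusters, p_vals = bootstrap_cluster_test(data, threshold=2.0)
```

The design point is simply that `_find_clusters` knows nothing about resampling, so permutation and bootstrap drivers can share it.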
Hey guys, thanks a lot for your feedback. I opened a PR for improving the documentation on the LIMO dataset, and also added a proposal for what a subject-level regression function could look like. We could then take the output of that function and use it for group-level inference. Looking forward to your comments.
Dear MNE community, during the last couple of months I've been working together with my mentors (@dengemann and @jona-sassenhagen) on the GSoC project for enhancing statistical inference using linear regression in MNE-Python. As the GSoC period comes to an end, we would like to present some of the major achievements and open the discussion on remaining issues, considerations, and possible strategies for future work.
This is a 2-3 minute read, sorry for the long post.
Quick recap:
The primary goal of the GSoC project was to broaden MNE's capabilities for fitting linear regression models, with a particular focus on statistical inference measures and on supporting more complex statistical models that might be of common interest to the MNE community.
Summary of major achievements:
We thought the best way to address these issues would be to set up a "gallery of examples" that allows users to browse through common research questions, providing auxiliary code for setting up and fitting linear models, as well as for inspecting and visualizing results with tools currently available in NumPy, SciPy, and MNE.
For this purpose we have put up a sandbox repository, which contains all the work carried out during the GSoC period. The code replicates and extends some of the main analyses and tools integrated in LIMO MEEG, a MATLAB toolbox originally designed to interface with EEGLAB. The corresponding website contains examples of typical single-subject and group-level analysis pipelines.
In the following I provide a quick overview of such an analysis pipeline and the corresponding features developed during GSoC.
During the project, we've adopted a multi-level (or hierarchical) modeling approach, allowing the combination of predictors at different levels of the experimental design (trials, subjects, etc.) and testing effects in a mass-univariate fashion, i.e., not only focusing on average data for a few sensors, but taking the full data space into account (all electrodes/sensors at all time points of an analysis time window; see here).
Importantly, the analysis pipelines allow users to deal with within-subject variance (i.e., first-level analysis) as well as between-subject variance (i.e., second-level analysis) by modeling the co-variation of subject-level parameter estimates and inter-subject variability in a possible moderator variable (see here).
This hierarchical approach consists of estimating linear model parameters for each subject in a data set (done independently at each time point and sensor). At the second level, the beta coefficients obtained from each subject are combined across subjects to test for statistical significance.
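Sketched concretely, the first-level step amounts to solving a least-squares problem at every sensor/time point. A minimal NumPy illustration on simulated data (shapes and names are assumptions, not the sandbox API):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated setup: 10 subjects, 50 trials each, 32 sensors, 100 time points.
n_subjects, n_trials, n_sensors, n_times = 10, 50, 32, 100

betas = np.empty((n_subjects, n_sensors, n_times))
for subj in range(n_subjects):
    # Trial-wise data and one continuous predictor (plus an intercept column).
    data = rng.normal(size=(n_trials, n_sensors, n_times))
    predictor = rng.normal(size=n_trials)
    design = np.column_stack([np.ones(n_trials), predictor])

    # First level: flatten the spatiotemporal dimensions and solve the
    # least-squares problem for all sensor/time points at once.
    y = data.reshape(n_trials, -1)
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    betas[subj] = coefs[1].reshape(n_sensors, n_times)  # slope estimates

# Second level: one beta map per subject, ready for group-level inference.
print(betas.shape)  # (10, 32, 100)
```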
The implemented methods correspond to tests performed using bootstrap under H1 to derive confidence intervals (i.e., providing a measure of the consistency of the observed effects at the group level), and the "studentized bootstrap" (or bootstrap-t) to approximate H0 and control for multiple testing (e.g., via spatiotemporal clustering techniques).
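To illustrate the H0 side of this, here is a minimal bootstrap-t sketch on simulated subject-level betas. A maximum-statistic correction stands in for the clustering step, and all names and shapes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input: first-level slope estimates, one map per subject.
n_subjects, n_sensors, n_times = 10, 32, 100
betas = rng.normal(loc=0.2, size=(n_subjects, n_sensors, n_times))

def t_map(x):
    """One-sample t-statistic at every sensor/time point."""
    return x.mean(0) / (x.std(0, ddof=1) / np.sqrt(len(x)))

t_obs = t_map(betas)

# Bootstrap under H0: center the data so the null is true, resample
# subjects with replacement, and collect the maximum absolute t-value.
centered = betas - betas.mean(0)
n_boot = 500
max_t = np.empty(n_boot)
for b in range(n_boot):
    sample = centered[rng.integers(0, n_subjects, n_subjects)]
    max_t[b] = np.abs(t_map(sample)).max()

# Family-wise-error-corrected threshold at alpha = 0.05.
threshold = np.quantile(max_t, 0.95)
significant = np.abs(t_obs) > threshold
```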
Open questions:
One of the main issues concerns the integration of the fitting tools into MNE's API.
The second major issue concerns the inference part.
There is code in `cluster_level` to run spatiotemporal clustering, which in principle mimics the behavior of `mne.stats.cluster_level._permutation_cluster_test`, but uses bootstrap to threshold the results. One option would be to adapt `mne.stats.cluster_level._permutation_cluster_test` itself; another would be to extract the cluster stats from it without permutation and submit these to bootstrap in a second function.

There are a couple of other issues, but since this post is already too long, it might be best to discuss them later (or in the issue section of our GSoC repository); a PR for more in-depth code discussion will follow shortly.
I really enjoyed working on this project during the summer and would be glad to continue working on these tools after GSoC.
Thanks for reading and looking forward to your feedback.