mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

GSOC inquiry #5849

Closed JoseAlanis closed 5 years ago

JoseAlanis commented 5 years ago

Dear MNE-Community, I heard from @jona-sassenhagen that you might be accepting applications for Google Summer of Code this year. I would be very interested in submitting a project. I was thinking of some functions or pipeline to carry out linear (and generalised linear) mixed effects regression analyses for EEG-data. Please let me know if you are accepting applications. Best, José

jona-sassenhagen commented 5 years ago

@dengemann

larsoner commented 5 years ago

> I heard from @jona-sassenhagen that you might be accepting applications for Google Summer of Code this year

Yes, we plan to accept applications!

> I was thinking of some functions or pipeline to carry out linear (and generalised linear) mixed effects regression analyses for EEG-data.

The first thing to do is probably look at what we already have. @jona-sassenhagen can help guide you.

The next thing to do is see if there are Python packages for doing mixed effects regression analysis. @drammock might also know. It would be good if we didn't have to code the actual math from scratch, and could focus instead on how to make nice interfaces for using them on M/EEG sensor and source data.

I can't gauge the scope too well, but it sounds potentially plausible to me.

drammock commented 5 years ago

statsmodels has linear and generalized linear mixed effects modeling capabilities. Last time I checked (probably 9 months ago) there were still some significant limitations on the kinds of random effects structures it could handle (IIRC, it handled nested but not crossed designs, or something like that; also I think their code for linear mixed models handles more designs than their code for generalized linear mixed models).

JoseAlanis commented 5 years ago

Alright, thanks a lot for the info. I'll have a look at the resources and links and see what I can do to contribute.

jona-sassenhagen commented 5 years ago

@JoseAlanis any ideas yet?

JoseAlanis commented 5 years ago

Hey @jona-sassenhagen, yes, sorry for the delay. I've been looking at the linear regression analysis discussed in the MNE examples gallery (here).

If I understand correctly, the linear regression algorithm uses single-trial data (each sample in the epoch) to estimate the regression coefficients. I think it would be neat to be able to model individual differences in the response variable's variance by adding random effects, e.g., nesting individual epochs within participants or conditions. This would be convenient if we want to model interdependencies among measurements or handle unbalanced data structures more efficiently.
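To make the idea of a per-sample regression concrete, here is a minimal NumPy sketch of a mass-univariate ordinary least squares fit. It is only a conceptual illustration under stated assumptions, not MNE's actual implementation: the data are simulated, and names such as `predictor` and `slope_map` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated single-trial data (assumption): 200 epochs, 4 channels, 50 samples.
n_epochs, n_channels, n_times = 200, 4, 50
predictor = rng.normal(size=n_epochs)  # one continuous covariate per epoch

# Ground truth: the covariate only affects channel 1, samples 20-29.
true_slope = np.zeros((n_channels, n_times))
true_slope[1, 20:30] = 2.0
noise = rng.normal(size=(n_epochs, n_channels, n_times))
data = true_slope * predictor[:, None, None] + noise

# Design matrix with an intercept column and the covariate.
design = np.column_stack([np.ones(n_epochs), predictor])

# Flatten channels x samples so every sensor/time point is one regression
# target, then solve all least-squares problems in a single call.
targets = data.reshape(n_epochs, -1)
betas, *_ = np.linalg.lstsq(design, targets, rcond=None)

# Reshape back: one coefficient map per column of the design matrix.
betas = betas.reshape(design.shape[1], n_channels, n_times)
slope_map = betas[1]  # estimated effect of the covariate at each point
```

Each entry of `slope_map` is estimated independently, with all epochs treated as exchangeable. That independence assumption is exactly what random effects would relax.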

I had a look at statsmodels, which in my opinion provides a nice framework for building the nested models. It also allows for the specification of random slopes, which would enable us to specify (at least to some degree) cross-level interactions (see here).
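As a rough sketch of what such a nested model could look like, the following fits a random intercept and a random slope per participant with `statsmodels.formula.api.mixedlm`. The data are simulated, the column names (`amplitude`, `x`, `subject`) are hypothetical, and the response stands in for the amplitude at a single channel and time point; nothing here is an existing MNE interface.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Simulated design (assumption): 20 participants, 40 trials each.
n_subjects, n_trials = 20, 40
subject = np.repeat(np.arange(n_subjects), n_trials)
x = rng.normal(size=n_subjects * n_trials)  # trial-level covariate

# Each participant deviates from the population intercept (1.0) and slope (2.0).
subj_intercept = rng.normal(scale=1.0, size=n_subjects)
subj_slope = rng.normal(scale=0.5, size=n_subjects)
amplitude = (
    (1.0 + subj_intercept[subject])
    + (2.0 + subj_slope[subject]) * x
    + rng.normal(scale=1.0, size=x.size)
)

df = pd.DataFrame({"amplitude": amplitude, "x": x, "subject": subject})

# Trials nested within participants: random intercept plus random slope for x.
model = smf.mixedlm("amplitude ~ x", df, groups=df["subject"], re_formula="~x")
result = model.fit()

fixed_effects = result.fe_params  # population-level intercept and slope
random_cov = result.cov_re        # covariance of the random effects
```

Fitting one such model per sensor and time point would be far slower than the least-squares approach, which is one reason the scope of an interface matters.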

An alternative approach would be to use a convenience package that wraps the R package lme4 in Python (e.g., pymer4). It allows a wider range of model specifications, but people would need to have R and lme4 installed on their computers.

So, yeah, these are the general ideas. Any comments are most welcome!

jona-sassenhagen commented 5 years ago

While a lot of people would highly appreciate mixed-effects regression, both of these options have serious drawbacks. The problem with statsmodels is, as we've discussed on here before, that its mixed-effects regression is not fully featured. The problem with other packages is that they're new, untested, etc ...

However, for now, it would be important for you to demonstrate that you can work on MNE-Python in general. I.e., pick an easy issue, and have it merged in time for the application. As far as I understand it, that is essential to signalling to Google that we pick applicants who have a high likelihood of actually completing their projects.

Just start with an easy issue, lay a claim on it, start with a PR, and we will support you.

jona-sassenhagen commented 5 years ago

... or pick something else to bugfix, something no issue exists for yet.

Did I understand that correctly @agramfort @larsoner ?

larsoner commented 5 years ago

> Did I understand that correctly @agramfort @larsoner ?

Yep, it's important for us to see how people interact with git / GitHub / code-review. I'd look here:

https://github.com/mne-tools/mne-python/issues?q=is%3Aopen+is%3Aissue+label%3AEASY

JoseAlanis commented 5 years ago

Alright, sure. Thanks. I was looking more at the long-term project. But I get it, I'll have a look at those right away.

dengemann commented 5 years ago

@JoseAlanis is there any chance that we can hang out with @jona-sassenhagen and whoever else wants to join? I'd be very interested in this GSOC project but I also think there are a few adjustments to make.

My position on the proposed topic is briefly this: there are infinitely many interesting multi-level models to make and it is not clear which particular model structure we should support. Interfacing with statsmodels and R is not ideal and we should not do it. The best way to specify such models these days is using probabilistic programming software like https://pystan.readthedocs.io/en/latest/ or https://pymc3.readthedocs.io. However, I am not convinced we need an interface. You write down the maths for the model you think is good for your problem, then you write it like that in Stan / pymc3. So this could go in an example / tutorial, but it should not become a dependency. Moreover, I think making it easy to fit such models without understanding them is a bad idea.

I'd instead suggest we focus on our existing GLM and improve inference options (residual bootstrap, informal Bayesian simulation of parameters and predictions / parametric bootstrap, explore whether we can implement cluster-level inference via permutation tests, make many examples of common designs). Moreover, we should also see what we can recycle from https://nistats.github.io.
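To illustrate the first of these inference options, here is a minimal sketch of a residual bootstrap for an ordinary least squares fit, written in plain NumPy. The data are simulated, the number of resamples is arbitrary, and none of this reflects existing MNE code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (assumption): one response, intercept plus one covariate.
n_obs = 150
x = rng.normal(size=n_obs)
design = np.column_stack([np.ones(n_obs), x])
y = 0.5 + 1.5 * x + rng.normal(size=n_obs)

# Fit the model once and keep the fitted values and residuals.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ beta_hat
residuals = y - fitted

# Residual bootstrap: resample residuals with replacement, add them back to
# the fitted values, and refit on each synthetic response.
n_boot = 2000
boot_betas = np.empty((n_boot, design.shape[1]))
for i in range(n_boot):
    resampled = rng.choice(residuals, size=n_obs, replace=True)
    y_star = fitted + resampled
    boot_betas[i] = np.linalg.lstsq(design, y_star, rcond=None)[0]

# Percentile confidence interval for the slope.
ci_low, ci_high = np.percentile(boot_betas[:, 1], [2.5, 97.5])
```

Because the design matrix stays fixed across resamples, this variant assumes the residuals are exchangeable, which is worth checking before applying it to single-trial M/EEG data.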

In short: before delving into complex-to-fit multi-level models, let's get the maximum out of our GLM.

larsoner commented 5 years ago

Closing this, feel free to open targeted/specific issue(s) about how to improve our linear regression code.