lina-usc / pylossless

🧠 EEG Processing pipeline that annotates continuous data
https://pylossless.readthedocs.io/en/latest/
MIT License

Enable Parallel ICA Computation in By-core Workloads #98

Closed Andesha closed 1 year ago

Andesha commented 1 year ago

AMICA was able to run in parallel. We want the new pipeline and FastICA to do the same.

Judging by some online results (ChatGPT), the ica.fit function of FastICA from sklearn should take an n_jobs parameter.

I was able to pass this parameter to the fit function without error, but did not see the process spread across multiple cores.
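One quick way to check a claim like this is to inspect the callable's signature rather than rely on the call not raising. The sketch below uses only the standard library; `declares_parameter` and the two stand-in functions are hypothetical names for illustration, not part of sklearn or mne. The point is that a function taking `**kwargs` will accept `n_jobs` without error even if nothing ever reads it.

```python
import inspect


def declares_parameter(func, name):
    """Return True only if `func` explicitly names `name` in its signature."""
    param = inspect.signature(func).parameters.get(name)
    # Catch-all *args / **kwargs entries do not count as declaring `name`.
    return param is not None and param.kind not in (
        inspect.Parameter.VAR_KEYWORD,
        inspect.Parameter.VAR_POSITIONAL,
    )


# Hypothetical stand-ins, not real library functions.
def fit_explicit(data, n_jobs=1):
    return n_jobs


def fit_swallowing(data, **kwargs):
    return None  # n_jobs is accepted here but never used


print(declares_parameter(fit_explicit, "n_jobs"))    # True
print(declares_parameter(fit_swallowing, "n_jobs"))  # False
```

The same helper can be pointed at whichever class or method is actually being called, if the library is installed, to see which arguments it really declares.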

It could be a whole bunch of things:

This is a future problem and not for a 1.0 style release.

Andesha commented 1 year ago

Turns out this is a bit more complicated.

To make a long story short, when you have a whole node allocated in SLURM, everything plays nice and parallel behaviour works as expected.

I'm going to rename the issue to be more about non-whole-node computation, as in the long run that should be the goal. Allocating a whole node is too wasteful and not something I can condone in a large scale production environment.
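For partial-node jobs, one thing worth checking is how many cores the process is actually allowed to use, since `os.cpu_count()` reports the whole node. A minimal sketch, assuming a Linux node and that SLURM exports `SLURM_CPUS_PER_TASK` when `--cpus-per-task` is requested; `allocated_cpus` is a hypothetical helper name, and nothing here is confirmed as the cause of the behaviour described above.

```python
import os


def allocated_cpus(environ=None):
    """Best-effort count of the cores this job may use."""
    environ = os.environ if environ is None else environ
    value = environ.get("SLURM_CPUS_PER_TASK", "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    # Linux only: the set of CPUs this process is allowed to run on.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


n = allocated_cpus()
print(f"node reports {os.cpu_count()} cores, job may use {n}")

# Thread-count variables commonly read by OpenMP / BLAS backends.
# They need to be set before numpy (or anything linking BLAS) is imported.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ.setdefault(var, str(n))
```

If the two numbers printed differ, any library that sizes its thread pool from the node's core count may oversubscribe the allocation.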

scott-huberty commented 1 year ago

Thanks for working on this - I agree that figuring this out will really pay off later - since running the pipeline on the cluster is definitely the goal!

Andesha commented 1 year ago

There's been a lot of progress on this so I'll just list some things below:

image

Andesha commented 1 year ago

I've just now done a PR that provides an example of my working environment on Narval.

To be honest, the way that ICA is implemented at this time (or how we have configured it) means that increasing the core count does not significantly reduce computation time.

tl;dr - a single core and 12G of RAM worked for an entire study for me that had 64 channels, 40 minutes of recording at 512 Hz, and it all finished in under an hour.
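For a sense of scale, the raw size of a recording like that is easy to work out. This is back-of-the-envelope arithmetic that assumes samples are held as 64-bit floats (8 bytes each); it covers a single in-memory copy only, not the working copies made during filtering or ICA.

```python
channels = 64
minutes = 40
sfreq_hz = 512
bytes_per_sample = 8  # assumption: float64 storage

samples_per_channel = minutes * 60 * sfreq_hz
total_values = samples_per_channel * channels
size_mib = total_values * bytes_per_sample / 2**20

print(samples_per_channel)  # 1228800
print(total_values)         # 78643200
print(size_mib)             # 600.0
```

Under that assumption, one copy of one such recording is about 600 MiB, well inside the 12G allocation.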

scott-huberty commented 1 year ago

Is the issue with the way we call ICA? We expose the call to mne.preprocessing.ICA in the config, so if it's just a matter of changing a parameter in mne that would be great.

Andesha commented 1 year ago

I was digging down to the level where mne itself calls ICA.

tbh, it's not currently worth rabbit-holing on this problem, as the current performance is excellent.