Closed Andesha closed 1 year ago
Turns out this is a bit more complicated.
To make a long story short: when you have a whole node allocated in SLURM, everything plays nice and parallel behaviour works as expected.
I'm going to rename the issue to be about non-whole-node computation, as in the long run that should be the goal. Whole-node allocation is too wasteful and not something I can condone in a large-scale production environment.
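As a sketch of what non-whole-node-friendly behaviour could look like, the worker count could be derived from what SLURM actually granted rather than the node's full core count. `SLURM_CPUS_PER_TASK` is the standard SLURM environment variable; the fallback of 1 is an assumption for illustration:

```python
import os

# Size parallelism from the SLURM allocation instead of the whole node.
# SLURM_CPUS_PER_TASK is set inside a SLURM job; outside one (or if the
# variable is absent) we conservatively fall back to a single worker.
n_jobs = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
print(f"using {n_jobs} worker(s)")
```

This way a job requesting, say, `--cpus-per-task=4` uses exactly four workers instead of grabbing every core on the node.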
Thanks for working on this - I agree that figuring this out will really pay off later, since running the pipeline on the cluster is definitely the goal!
There's been a lot of progress on this so I'll just list some things below:
I've just opened a PR that provides an example of my working environment on Narval.
To be honest, the way that ICA is implemented at this time (or how we have configured it) means that increasing the core count does not significantly reduce computation time.
tl;dr - a single core and 12G of RAM handled an entire study for me (64 channels, 40 minutes of recording at 512 Hz) and finished everything in under an hour.
Is the issue with the way we call ICA? We expose the call to `mne.preprocessing.ica` in the config, so if it's just a matter of changing a parameter in `mne`, that would be great.
I was digging down into the level at which `mne` itself calls ICA.
tbh, it's not currently worth rabbit-holing on this problem, as the current performance is excellent.
AMICA was able to run in parallel. We want the new pipeline and `FastICA` to do the same. Judging by some online results (ChatGPT), the `ica.fit` function of `FastICA` from `sklearn` should take an `n_jobs` parameter. I was able to pass this parameter to the fit function, but did not see the process spread to multiple cores.
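One way to check the `n_jobs` claim directly, assuming scikit-learn is installed, is to inspect `FastICA`'s constructor signature; if the parameter isn't there, passing it can't have any effect:

```python
import inspect

from sklearn.decomposition import FastICA

# List the constructor parameters sklearn's FastICA actually accepts,
# then check specifically for n_jobs.
params = inspect.signature(FastICA.__init__).parameters
print(sorted(params))
print("n_jobs" in params)
```

This would distinguish "the parameter exists but isn't spawning workers" from "the parameter is silently ignored or rejected".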
It could be a whole bunch of things:
`mne` may wrap the underlying libs and have hard-coded it. This is a future problem and not for a 1.0-style release.