theislab / scib

Benchmarking analysis of data integration tools
MIT License
294 stars 63 forks source link

Improve LISI multiprocessing specification #301

Closed mumichae closed 2 years ago

mumichae commented 2 years ago

LISI supports multiprocessing, however specifying the number of cores and whether to use multiprocessing is not well defined. Here I address

  1. remove multiprocessing key from lisi functions
  2. rename nodes to n_cores
  3. use n_cores=1 for running without parallelisation
  4. use n_cores=1 as default for metrics wrapper functions

Bug fix included:

  1. changed expected input of n_chunks for knn_graph.o to actually reflect the number of chunks (#chunks) created, NOT the index of the largest chunk (chunks-1)
  2. compile knn_graph.cpp file on install with setup.py
mumichae commented 2 years ago

From what I understood from the code, the number of splits is basically the same as the number of cpu cores - 1, so I simplified it as such. To be exact, n_chunks == n_processes and n_splits == n_chunks - 1

LuckyMD commented 2 years ago

To be exact, n_chunks == n_processes and n_splits == n_chunks - 1

True, just got this as well. And you don't want to use the whole node if not specified otherwise?

mumichae commented 2 years ago

I realised that we actually had a bug in the C code. When lisi_graph_py is called with 2 cores, the script will set the max chunk index to 0, although it should be 1. I fixed this and had to recompile the code.

Unfortunately, the last step failed on github actions likely because the g++ version I was using was not compatible with the one on github actions. For that, I also included the lisi code compilation into the setup.py. Might be a bit hacky, but that's the only thing that I managed to get to work.

@LuckyMD Would be great if I could have another review on this decision.

LuckyMD commented 2 years ago

I actually like the compilation in setup.py. No testing whether g++ is installed though, so it might create some strange errors.

mumichae commented 2 years ago

Thanks @mbuttner for the review!

I updated the compilation call so that any error message will be printed, but the installation will continue. The output is only visible with pip install scib -v, but that's a pip thing.

Once everything runs through I'll merge and update scib to a new version.