pinellolab / dictys

Context specific and dynamic gene regulatory network reconstruction and analysis
GNU Affero General Public License v3.0

Threads vs. Jobs #20

Closed JBreunig closed 1 year ago

JBreunig commented 1 year ago

I've run both your tutorial and some other multiome datasets and this is a brilliant package, congrats! It is clearly pulling out more biologically relevant GRNs when compared with some other tools I've tried.

I have a few questions for clarification:

1) I'm unclear about n_jobs versus n_threads. I understand that -j sets n_jobs (i.e., '-j 32' is 32 jobs), but where is n_threads set? Also, do you have any recommended starting values for n_jobs/n_threads when using the CUDA/torch GPU path? (I have a 32-core/64-thread Threadripper, 256 GB RAM, and a high-end Nvidia GPU.)

2) Multiome is obviously ideal, but are there other paths? (E.g., can one use separate scRNA-seq and scATAC-seq datasets?)

Thanks in advance!

lingfeiwang commented 1 year ago

Hi JBreunig,

Thank you for your interest and great to hear it works well for you!

For question 1, we have not comprehensively benchmarked computing speed. In our experience n_threads=4 was satisfactory, so we set it as the default. You can reduce n_threads and increase n_jobs if your memory allows, but increasing n_threads and reducing n_jobs could make it slower. To use all 64 hardware threads, you can set n_jobs = 64 / n_threads. Note that parameter -j only affects non-GPU steps.
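As a rough illustration of the n_jobs = 64 / n_threads rule above, here is a minimal Python sketch. The function name suggest_n_jobs is hypothetical and not part of dictys; it only does the arithmetic, and the result would still need to be passed to -j by hand and checked against available memory.

```python
import os
from typing import Optional


def suggest_n_jobs(n_threads: int, total_threads: Optional[int] = None) -> int:
    """Return how many jobs fit if each job uses n_threads hardware threads.

    Hypothetical helper for illustration only; it is not a dictys function.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    if total_threads is None:
        # os.cpu_count() reports logical CPUs and may return None.
        total_threads = os.cpu_count() or 1
    # Integer division, but never suggest fewer than one job.
    return max(1, total_threads // n_threads)


if __name__ == "__main__":
    # 64 matches the 32-core/64-thread machine described in the question.
    for nth in (1, 2, 4, 8):
        print(f"NTH={nth}: -j {suggest_n_jobs(nth, total_threads=64)}")
```

With the default of 4 threads per job on a 64-thread machine this suggests -j 16. Memory use per job is not modeled here, so a smaller value may be needed in practice.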

For PyTorch on GPU, use -J to set the number of parallel jobs. In our experience, a single job already gets high GPU utilization. But a high-end Nvidia GPU should have enough memory for -J 2 or -J 3, which could make it a bit faster.

You can tune n_threads with the variable NTH in makefiles/config.mk. This is done in the tutorial line dictys_helper makefile_update.py ../makefiles/config.mk ... where you just need to insert "NTH": "2", for example.
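If you would rather see what such an update amounts to, the sketch below sets a variable in makefile-style text. It assumes config.mk holds plain NAME=value (or NAME := value) assignments, which is an assumption about the file layout; set_make_variable is a hypothetical function and not the dictys helper. The tutorial's makefile_update.py remains the supported route.

```python
import re


def set_make_variable(text: str, name: str, value: str) -> str:
    """Return makefile text with `name` assigned `value`.

    Hypothetical illustration, not part of dictys. Assumes simple one-line
    assignments using =, := or ?=. The first matching assignment is
    rewritten; if none exists, a new NAME=value line is appended.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(name)}[ \t]*[:?]?=[ \t]*).*$", re.MULTILINE
    )
    if pattern.search(text):
        # A function replacement keeps backslashes in `value` literal.
        return pattern.sub(lambda m: m.group(1) + value, text, count=1)
    # Append on a new line, adding a newline first if the text lacks one.
    sep = "" if (not text or text.endswith("\n")) else "\n"
    return f"{text}{sep}{name}={value}\n"


if __name__ == "__main__":
    # OTHER is a placeholder variable name used only for this demo.
    example = "NTH=4\nOTHER=1\n"
    print(set_make_variable(example, "NTH", "2"))
```

The function works on a string rather than a file, so you can inspect the result before writing anything back to config.mk.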

For thread/job count questions in general, you can try different values and keep the ones that bring your CPU/GPU close to maximum utilization.

For question 2, definitely! The blood study in our preprint and the analysis-blood example are based on separate scRNA-seq and scATAC-seq. For cell-type specific networks, you just need to set JOINT=0 in the same config.mk above. For dynamic networks, you additionally need to provide cell coordinates in the same low-dimensional co-embedding.

Let us know if you have any follow-up or specific questions.

Lingfei

JBreunig commented 1 year ago

Wonderful...thanks for the clarification!