pacific-2020 / pacific

PACIFIC: A lightweight alignment-free deep-learning classifier of SARS-CoV-2 and co-infecting viral sequences
MIT License

Support for multi-threading/using multiple cores #9

Open VGalata opened 3 years ago

VGalata commented 3 years ago

Hello!

Does the tool support the use of multiple cores? I could not find any information about it in the manual. Some of my samples seem to need more than 4 hours and I was wondering if I could speed up the analysis.

Thank you!

Best, Valentina

hp2048 commented 3 years ago

Hi Valentina,

TensorFlow uses multiple cores under the hood. We have not implemented any control over how many cores TensorFlow uses, so it ends up using whatever is available. If you have access to multiple machines, you could run the analysis in an embarrassingly parallel fashion: split your input data into multiple files and run them all independently. Speed is currently one of the bottlenecks. Depending on the user base, we may put more hours into speeding it up.

PS: I would love to see your results, so that we can understand how the tool performs on new data. If possible, could you please paste the distribution of reads across the classes from your run? This is generally displayed in the terminal at the end of the run.

Hardip
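The splitting approach suggested above could be sketched roughly as follows. This is a minimal illustration, not part of PACIFIC: it assumes the reads are in FASTA format, and the function name `split_fasta` and the chunk file naming are hypothetical. Records are distributed round-robin so that each chunk ends up with a similar number of reads.

```python
from pathlib import Path


def split_fasta(path, n_chunks, out_dir):
    """Distribute FASTA records round-robin over n_chunks files.

    Hypothetical helper (not part of PACIFIC). Returns the list of
    chunk paths that were written.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    chunk_paths = [out_dir / f"{stem}.chunk{i:03d}.fasta" for i in range(n_chunks)]

    handles = [p.open("w") for p in chunk_paths]
    try:
        record_index = -1  # index of the record currently being copied
        with open(path) as src:
            for line in src:
                if line.startswith(">"):
                    record_index += 1  # a header line starts a new record
                if record_index < 0:
                    continue  # ignore anything before the first header
                handles[record_index % n_chunks].write(line)
    finally:
        for handle in handles:
            handle.close()
    return chunk_paths
```

Each chunk can then be submitted as its own job (for example, one cluster job per chunk) and the per-class read counts summed afterwards. FASTQ input would need a different record boundary rule, since `>` is not the record marker there.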

VGalata commented 3 years ago

Hi Hardip,

Thanks for the pointer! I forgot to check how many cores are actually used when reserving multiple cores for the jobs. I have a Snakemake pipeline, and the jobs are submitted to our HPC cluster via Slurm. After reserving 10 cores for a job, I can see that more than 10 are being actively used. Having more direct control over the number of cores would be great.
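For reference, one way such control might look, as a sketch only: PACIFIC does not expose an option like this, so the code below would have to be added to a local copy of the script. The helper names are hypothetical. `SLURM_CPUS_PER_TASK` is set by Slurm when `--cpus-per-task` is requested, and the `tf.config.threading` setters are part of TensorFlow 2.x but must be called before TensorFlow runs any operation.

```python
import os


def thread_limit_from_slurm(default=1, environ=None):
    """Return the core count reserved via Slurm, or `default` if unknown.

    Hypothetical helper: reads SLURM_CPUS_PER_TASK, which Slurm sets
    when the job was submitted with --cpus-per-task.
    """
    env = os.environ if environ is None else environ
    try:
        n = int(env.get("SLURM_CPUS_PER_TASK", ""))
    except ValueError:
        return default
    return n if n > 0 else default


def apply_thread_limit(n):
    """Cap TensorFlow's thread pools at n threads each.

    Must run before TensorFlow executes any op; otherwise TensorFlow
    raises a RuntimeError. Requires TensorFlow 2.x to be installed.
    """
    import tensorflow as tf  # imported lazily so the helper above works without it

    tf.config.threading.set_intra_op_parallelism_threads(n)
    tf.config.threading.set_inter_op_parallelism_threads(n)
```

In a modified script, calling `apply_thread_limit(thread_limit_from_slurm())` at the very top, before any model is loaded, should keep TensorFlow within the reserved cores. Other libraries in the process may still spawn their own threads.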

Regarding the results on our data: I am not sure how useful they will be, because so far we have not seen many SARS-CoV-2 reads in our samples, but we are also still producing and preprocessing the sequencing data. Currently, the plan is to use PACIFIC and fastv as a quality control step to identify samples that contain some reads from the virus. I would be happy to share our results with you. I will get back to you when we have finished the analysis.