Closed: tiantianlili closed this issue 11 months ago
Don't do 30 if you're running locally. 1 or 2 was a good number on my laptop: it had four cores and the fan was kind of weak, so the laptop was getting too hot when I had four bowtie2 jobs running on it.
If you're running locally, it's counterproductive to set maxForks higher than the number of cores on your CPU. The pipeline might start lots of processes, but they won't advance any faster, since the kernel will split CPU time between them.
Here's what I would do:
- Go into one of the work directories, like work/ed/29741d, and find the bowtie2 logs to check that it's running.
- Run top - it shows you the CPU load - so you can see how well bowtie2 is using the resources.
- Get a sense of how long each bowtie2 job needs to complete, and whether it's working on a big, medium, or relatively small input file.

The other steps are not computationally demanding, it's just bowtie2, but 400 files might be a lot of work without a server. How big are the files?
Thank you very much for your reply! The workstation I am using is configured as follows, and there seem to be a lot of CPUs available to me:
- Processor: Dual Intel Xeon Gold 6230R (2.1 GHz, 4.0 GHz Turbo, 26C, 10.4 GT/s 2UPI, 35.75 MB cache, HT, 150 W)
- Memory: 512 GB (8x64 GB) DDR4-2933 RDIMM ECC
- Graphics: 2x Dell RTX 3090 24 GB
My total data is about 4.3 TB.
I ran the pipeline last night with maxForks = 5 and it seems to be going pretty well. I wonder if I can speed up the pipeline further. I checked the directory from the screenshot, but didn't find any bowtie2 logs.
I uploaded a file starting with trace: trace-20231210-55261399.txt
Nice, you have 26 cores! I think bowtie2 has the capacity to use multiple cores per job, but I am not sure how many cores it uses when run by Nextflow. You have three pieces there:
If you override the bowtie2 command to use a single core and set maxForks = 15, it could be quite a good config. Or just bump maxForks to 10 and don't worry about the details; that should be pretty good as well.
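As a rough sketch, that could look something like this in nextflow.config. The `withName` pattern is a guess, not the pipeline's actual process name, and pinning `cpus` only helps if the pipeline passes `task.cpus` to bowtie2's `-p`/`--threads` option:

```groovy
// nextflow.config - a sketch of the suggested local config, not the pipeline's shipped defaults
process {
    maxForks = 15                  // run at most 15 tasks of each process at the same time

    // 'BOWTIE2.*' is a hypothetical process-name pattern; replace it with the real process name
    withName: 'BOWTIE2.*' {
        cpus = 1                   // reserve a single core per bowtie2 task
    }
}
```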
To see the logs, do ls -a instead of ls: the stdout, the stderr, and a .sh file reproducing what was actually run all start with a dot in Nextflow (.command.out, .command.err, .command.sh).
Finally, I see some of your input files are much bigger than others. This is fine, but what you'll see in the pipeline is that some jobs will be fast and some will take a while to run. If you sort the input files by size so the biggest ones are processed first, that order of computation will smooth things out at the end of the run, but it's a small tweak.
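A minimal sketch of that idea with a Nextflow channel, assuming a made-up input glob and channel name (the pipeline's real input handling will differ):

```groovy
// Sketch only: emit input files largest-first so the longest bowtie2 jobs start early.
// 'reads/*.fastq.gz' and 'reads_ch' are hypothetical, not taken from the pipeline.
Channel
    .fromPath('reads/*.fastq.gz')
    .toSortedList { a, b -> b.size() <=> a.size() }   // sort by file size, biggest first
    .flatten()                                        // emit the files one by one again
    .set { reads_ch }
```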
Thank you very much for your answer and help! The pipeline is running smoothly, and I hope there will be good results. Thank you again for developing this wonderful pipeline!
How do I set the parameters to run faster locally? Is increasing maxForks effective? Is maxForks the number of threads used? For example: `process { maxForks = 30 }`. I was running over 400 metagenome samples locally, and I noticed there didn't seem to be a difference between maxForks = 30 and maxForks = 3; according to the terminal output, not a single sample was completed in one day.