nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

why use only 8 threads in diamond #704

Open zsdxgl opened 2 years ago

zsdxgl commented 2 years ago

Are you using the latest release? yes

Describe the bug: I found a CPU limit in funannotate-p2g.py:

    if int(cpus) > 8:
        cpus = 8

Why is it there? Can I change it to 100?

nextgenusfs commented 2 years ago

It was put there to avoid large RAM usage when not running a newer version of diamond (versions >2.0.5 run in frameshift mode, which does not have as large an impact on RAM usage). I think the original method needs to load the entire genome into memory for each thread/CPU, so this limit was put in to avoid malloc/memory errors. You can certainly modify it on your own if you have the resources to run it, although I'm not sure it will significantly improve speed.
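For anyone who wants to raise the limit without editing the hard-coded value each time, here is a minimal sketch of the same clamp with an adjustable ceiling. The function name `cap_cpus` and the `max_cpus` parameter are hypothetical and are not part of funannotate; only the two-line check quoted in the issue comes from funannotate-p2g.py.

```python
def cap_cpus(requested, max_cpus=8):
    """Clamp a requested thread count to a ceiling.

    Hypothetical helper, not funannotate code. The default of 8 mirrors
    the hard-coded limit quoted in the issue.
    """
    requested = int(requested)  # the CLI value may arrive as a string
    # Never return fewer than 1 thread or more than the ceiling.
    return max(1, min(requested, max_cpus))


if __name__ == "__main__":
    print(cap_cpus("100"))               # clamped to the default ceiling
    print(cap_cpus("100", max_cpus=30))  # clamped to a raised ceiling
    print(cap_cpus(4))                   # below the ceiling, unchanged
```

Whether a higher ceiling is safe depends on the installed diamond version and the available RAM, as described above.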

zsdxgl commented 2 years ago

Dear nextgenusfs,

It seems that you have updated the CPU limit to 30. Is it possible that some of the diamond processes were killed without any warning? That would result in fewer predicted proteins than expected.

Best
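One general way to check whether an external process was killed silently is to inspect its exit status. The sketch below is not how funannotate invokes diamond; `run_checked` is a hypothetical wrapper, and it relies only on the documented behaviour of Python's `subprocess` module, where a negative `returncode` on POSIX means the child was terminated by that signal number (for example, 9 for SIGKILL from the out-of-memory killer).

```python
import subprocess
import sys


def run_checked(cmd):
    """Run a command and raise if it failed or was killed by a signal.

    Hypothetical wrapper for illustration; not part of funannotate.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode < 0:
        # POSIX: a negative return code is the negated signal number.
        raise RuntimeError(
            f"{cmd[0]} was killed by signal {-result.returncode}"
        )
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command; replace with the real external tool invocation.
    print(run_checked([sys.executable, "-c", "print('ok')"]))
```

If the wrapper raises for some runs at a high thread count but not at a lower one, that would point to resource limits rather than a problem with the input data.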