lmrodriguezr / nonpareil

Estimate metagenomic coverage and sequence diversity
http://enve-omics.ce.gatech.edu/nonpareil/

How to set -X and -n #44

Closed: Thomieh73 closed this issue 3 years ago

Thomieh73 commented 3 years ago

Hi, I was wondering whether there is any need to deviate from the default values for -X and -n. I saw that -X should be 10x smaller than the complete dataset. All my datasets have at least 10 million reads, so would it be right to set -X to 1 million reads? What would be the advantage for the calculations? I noticed that it makes the job take quite a bit longer to run.

The same goes for -n, which is 1024 by default. Can you think of a reason why I would want to increase that number, say to 2048?

lmrodriguezr commented 3 years ago

Hello @Thomieh73

In general, there is no need to change them.

Increasing -X would increase precision and the reproducibility between runs on the same sample, but it would also increase the running time significantly. In our experience, the default values (1k for alignment, 10k for kmer) generally perform very well with real data of any sequence diversity. In the case of very large metagenomes from communities with extremely high diversity (e.g., marine or soil communities) it might be worth evaluating an increase in -X, but even in such samples we've seen the default values perform well.
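
As a rough illustration of that trade-off (a generic sampling sketch, not Nonpareil's actual algorithm): if an estimate is treated as a proportion measured over a number of independent query reads, its standard error shrinks only with the square root of that number. The function name, the assumed redundancy of 0.3, and the query counts below are all hypothetical and chosen only to show the scaling.

```python
import math


def proportion_standard_error(p: float, n_queries: int) -> float:
    """Standard error of a proportion estimated from n_queries independent draws."""
    return math.sqrt(p * (1.0 - p) / n_queries)


# Hypothetical true fraction of redundant reads; not taken from any real dataset.
assumed_redundancy = 0.3

# Compare a few query counts, from the alignment default up to 1 million.
for n_queries in (1_000, 10_000, 1_000_000):
    se = proportion_standard_error(assumed_redundancy, n_queries)
    print(f"queries={n_queries:>9,}  standard error ~ {se:.5f}")
```

Under these assumptions, going from 1,000 to 1,000,000 query reads reduces the standard error by a factor of about 31.6 (the square root of 1,000) while using 1,000 times as many queries, which is consistent with a large increase in running time for a comparatively modest gain in precision.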

Regarding -n, there is no real need to increase it, regardless of sequence diversity or dataset size. This parameter exists mainly to allow fine-tuning of internal parameters.

Thomieh73 commented 3 years ago

Thanks for the explanation. Then I will stick with the default parameters.