henrivkgt closed this issue 2 years ago
Hi @henrivkgt,
those two -n options are not the same parameter. In the first example (https://pggb.readthedocs.io/en/latest/rst/tutorials/sequence_partitioning.html), -n refers to a parameter of wfmash (the sequence aligner we use in pggb). Instead, in the second example (https://pggb.readthedocs.io/en/latest/rst/quick_start.html), -n refers to a parameter of pggb.
Since they have the same name, I wonder if we should make the handling of these two -n options the same from the outside (hiding the -1 thing) to avoid other confusion in the future.
Thank you, that makes sense.
This is a bad documentation bug. The tutorial isn't in sync with the code. pggb's help text also doesn't explain that this should be set equal to the number of expected homologous haplotypes within the pangenome.
Set -n to the number of haplotypes that you expect in your input. For instance, if you had N=10 diploid genomes as input, you'd (typically) expect to see 2N=20 homologous copies of each locus. In this case, we should run pggb -n 20. If we just have 10 sequences, or 10 haploid genomes, we'd run pggb -n 10.
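The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rule described in this thread, not part of pggb or wfmash: the function names `expected_haplotypes` and `mappings_per_locus` are hypothetical, and the sketch assumes every genome contributes all of its haplotypes as separate input sequences. The "minus one" helper mirrors the tutorial's count of mappings per locus (7 yeast genomes, 6 mappings).

```python
def expected_haplotypes(num_genomes: int, ploidy: int = 1) -> int:
    """Haplotype count as described in this thread for pggb -n.

    Hypothetical helper: assumes each genome contributes `ploidy`
    homologous copies of every locus to the input.
    """
    if num_genomes < 1 or ploidy < 1:
        raise ValueError("num_genomes and ploidy must be positive integers")
    return num_genomes * ploidy


def mappings_per_locus(haplotypes: int) -> int:
    """Mappings per locus as counted in the partitioning tutorial.

    Hypothetical helper: a sequence maps to every other haplotype
    but not to itself, hence haplotypes minus one.
    """
    if haplotypes < 1:
        raise ValueError("haplotypes must be a positive integer")
    return haplotypes - 1


if __name__ == "__main__":
    # 10 diploid genomes: 2N = 20 homologous copies of each locus
    print("pggb -n", expected_haplotypes(10, ploidy=2))
    # 10 haploid genomes (or 10 plain sequences)
    print("pggb -n", expected_haplotypes(10))
    # Tutorial case: 7 yeast genomes, counted as mappings per locus
    print("mappings per locus:", mappings_per_locus(7))
```

Whether a given assembly should count as one haplotype or several depends on how it was assembled, so the `ploidy` argument is a judgment call for each dataset.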
Hello,
I am trying to run the pggb tool on a set of six cucumber genomes. One thing that is not completely clear to me is how to set the -n parameter. In one doc page (https://pggb.readthedocs.io/en/latest/rst/tutorials/sequence_partitioning.html) a set of 7 yeast genomes is used, with -n representing the number of mappings per locus (which is 6, or the number of genomes minus one). In others, such as the quick start (https://pggb.readthedocs.io/en/latest/rst/quick_start.html) it seems the -n is set to the number of genomes (so not minus one).
Would you be able to clear this up for me?
Thanks in advance, Henri