Open lfdelzam opened 1 month ago
Hello,
The msa command begins by selecting gene families. When you use the --partition core argument, it selects only families that are present in all genomes. For each selected family, the command then considers only the genes that are single-copy in their respective genomes. So, if a family contains multiple genes within a genome, those genes are excluded from the MSA.
I hope this explanation clarifies the behavior of the command.
So, are only the non-single-copy genes within the family removed, or is the entire family of genes removed?
It is the non-single-copy genes within the family that are removed.
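To make the selection logic described above concrete, here is a minimal sketch in plain Python. It is not PPanGGOLiN's actual code and does not use its API: the function name `select_msa_genes`, the dictionary layout, and the toy gene identifiers are all assumptions made for illustration. It only mirrors the two steps stated in this thread: keep families present in all genomes, then, within each kept family, drop the genes from any genome that has more than one copy.

```python
def select_msa_genes(families, genomes):
    """Hypothetical illustration of the selection described in this thread.

    families: dict mapping family name -> {genome name: [gene ids]}
    genomes:  list of all genome names in the pangenome
    Returns:  dict mapping family name -> {genome name: gene id}
    """
    selected = {}
    for family, genes_by_genome in families.items():
        # Step 1 (--partition core): keep only families present in all genomes.
        if not all(genes_by_genome.get(g) for g in genomes):
            continue
        # Step 2: within a kept family, keep a genome's gene only if it is
        # single-copy there; genomes with several copies contribute nothing.
        selected[family] = {
            g: genes_by_genome[g][0]
            for g in genomes
            if len(genes_by_genome[g]) == 1
        }
    return selected


# Toy data (made up): fam2 is duplicated in genome B, fam3 is absent from C.
genomes = ["A", "B", "C"]
families = {
    "fam1": {"A": ["a1"], "B": ["b1"], "C": ["c1"]},
    "fam2": {"A": ["a2"], "B": ["b2", "b3"], "C": ["c2"]},
    "fam3": {"A": ["a4"], "B": ["b4"]},
}
result = select_msa_genes(families, genomes)
print(result)
```

In this toy example, fam1 is kept with all three genes, fam2 is kept but without the two copies from genome B, and fam3 is skipped because it is missing from genome C. How the real command represents the genome whose copies were dropped in the resulting alignment is not covered by this sketch.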
Thank you for developing such a great program! I've been using ppanggolin msa and it's been helpful. Here is the command I've been working with:
ppanggolin msa -p Pangenome_graph/pangenome.h5 --partition core --source dna -o Pangenome_graph/MSA --phylo -c 20 -f
The documentation mentions: "By default it will write the strict 'core' (genes that are present in absolutely all genomes) and remove any duplicated genes."
I'm curious to learn more about how this workflow operates, especially regarding the removal of duplicated genes. Could you please provide more details on what exactly this entails? Specifically, does the program select one copy of the duplicated genes, or does it only use single-copy genes?
Thanks in advance