Any Tips on Making a Pangenome on Genetically Disparate Species?

Hi,

I've had this error quite a few times in the past. It is possible to overcome it, but how depends on why it is happening.

How many genomes do you have exactly? And how many clusters do you get at the end of the clustering step? If you do not have this last piece of information, you can get it by running:

ppanggolin info -p pangenome.h5 --content

on the pangenome.h5 file that should have been generated.

For very distantly related genomes (i.e. genomes that only share a genus or a family), I would recommend lowering the identity threshold for the clustering step, as the default is set for relatively closely related genomes. With the default, it is likely that your clusters are very sparse, with most gene families found in only a few genomes, and your presence/absence table might not be very useful in that case.

For that, you will need to run the tool 'step by step' rather than using the 'workflow' or 'panrgp' commands. For the clustering step, you can check this page: https://github.com/labgem/PPanGGOLiN/wiki/PPanGGOLiN---step-by-step-pangenome-analysis#clustering You'll need to use a command like this on your current pangenome:

ppanggolin cluster -p pangenome.h5 --identity 0.5

That is the command to use if, for example, you wish to build clusters with an identity threshold of 50%. For genomes belonging to the same family (as in, the taxonomic rank) it might still be a bit too high, though.

If all you need is the presence/absence table, without any partitions, graph, or genomic island predictions, you should be able to write it directly after the clustering step, using one of the following commands:

ppanggolin write -p pangenome.h5 --csv

ppanggolin write -p pangenome.h5 --Rtab

depending on whether you want the csv file or the Rtab file. If you do not know, you can check their formats on this page: https://github.com/labgem/PPanGGOLiN/wiki/Outputs#gene-presence-absence
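
To judge whether your clusters are still too sparse after lowering the threshold, you could summarise the Rtab file with something like the sketch below. It is only a rough check and it assumes the usual Rtab layout: tab-separated, one header row, gene family names in the first column, then one 0/1 column per genome. Please check that against the format page above. The file name and the toy matrix are made up for illustration.

```sh
#!/bin/sh
# Toy presence/absence matrix (hypothetical data), written to a temporary file.
# Replace "$rtab" with the path to your own .Rtab file to use this for real.
rtab=$(mktemp)
printf 'Gene\tgenomeA\tgenomeB\tgenomeC\n' >  "$rtab"
printf 'fam1\t1\t1\t1\n'                   >> "$rtab"
printf 'fam2\t1\t0\t0\n'                   >> "$rtab"
printf 'fam3\t0\t1\t0\n'                   >> "$rtab"
printf 'fam4\t1\t1\t0\n'                   >> "$rtab"

# For each family, count the genomes it is present in, then report how many
# families are found in every genome and how many are found in a single one.
summary=$(awk -F '\t' '
NR == 1 { n_genomes = NF - 1; next }      # header row: one column per genome
{
    count = 0
    for (i = 2; i <= NF; i++) if ($i > 0) count++
    families++
    if (count == n_genomes) in_all++
    if (count == 1) in_one++
}
END {
    printf "genomes=%d families=%d in_all=%d in_one=%d\n", n_genomes, families, in_all, in_one
}' "$rtab")

echo "$summary"
rm -f "$rtab"
```

If nearly all families end up in the in_one count and almost none in in_all, the threshold is probably still too strict for your genomes.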

Don't hesitate to tell me if you need any other information!

Adelme
