gtonkinhill / panaroo

An updated pipeline for pangenome investigation
MIT License

Pangenome graph has single unconnected COGs #217

Closed: nicolettacommins closed this issue 1 year ago

nicolettacommins commented 1 year ago

I ran Panaroo on my dataset of bacterial genome assemblies. When I visualize the graph in Cytoscape I see many COGs that appear as individual nodes unconnected to anything else. How can I interpret these COGs and why are they not filtered by Panaroo as contaminants?

gtonkinhill commented 1 year ago

Hi,

This is a bit unusual but can happen occasionally. Panaroo only deletes individual COGs, or short runs of COGs, if they are relatively rare within the population. This is controlled by the --edge_support_threshold parameter, which is set to the larger of 2 and 0.01 × the number of samples in the 'strict' and 'moderate' modes and is turned off in the 'sensitive' mode.
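To make the threshold rule concrete, here is a minimal Python sketch of the arithmetic described above. It is not Panaroo's own code: the function name `edge_support_threshold` and the use of `None` to stand for "turned off" are assumptions made for illustration only.

```python
def edge_support_threshold(n_samples, mode="strict"):
    """Illustrative only: the rule as described in this thread,
    not Panaroo's actual implementation."""
    if mode == "sensitive":
        # Assumption: represent "turned off" as None.
        return None
    if mode in ("strict", "moderate"):
        # The larger of 2 and 1% of the number of samples.
        return max(2, 0.01 * n_samples)
    raise ValueError(f"unknown mode: {mode!r}")


# Print the threshold for a few hypothetical dataset sizes.
for n in (50, 200, 1000):
    print(n, edge_support_threshold(n))
```

Under this rule the floor of 2 applies to any dataset of up to 200 samples, so the 1% term only starts to matter for larger collections.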

I am guessing this is the cause, but I would need to look at the data to be sure. To double-check, you could look at where these COGs appear in the assemblies and at what frequency they occur within the population.
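One way to check the frequency part is sketched below in Python. It assumes a tab-separated presence/absence table with the gene name in the first column and one 0/1 column per sample (the layout assumed here for a file such as gene_presence_absence.Rtab). The function name `cog_frequencies`, the sample names, and the gene names are all hypothetical.

```python
import csv
import io


def cog_frequencies(table_text):
    """Return {gene: (n_present, fraction_present)} from a tab-separated
    0/1 presence/absence table (assumed layout, see the text above)."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    n_samples = len(header) - 1  # first column holds the gene name
    freqs = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        present = sum(1 for value in row[1:] if value.strip() == "1")
        freqs[row[0]] = (present, present / n_samples)
    return freqs


# Hypothetical toy table: 4 samples, one core gene and one rare gene.
example = (
    "Gene\ts1\ts2\ts3\ts4\n"
    "geneA\t1\t1\t1\t1\n"
    "group_12\t0\t1\t0\t0\n"
)
freqs = cog_frequencies(example)
for gene, (count, fraction) in freqs.items():
    print(gene, count, fraction)
```

With a real table you would pass in the file contents, for example `cog_frequencies(open(path).read())`, and compare the counts for the unconnected COGs against the edge support threshold.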