sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Number of accessory genes is not consistent between different result files #335

Closed alantsangmb closed 7 years ago

alantsangmb commented 7 years ago

Hi, I am analyzing 28 bacterial genomes using Roary.

The summary statistics are as follows:

| Ortholog class | Definition | Count |
| --- | --- | --- |
| Core genes | (99% <= strains <= 100%) | 4930 |
| Soft core genes | (95% <= strains < 99%) | 135 |
| Shell genes | (15% <= strains < 95%) | 140 |
| Cloud genes | (0% <= strains < 15%) | 163 |
| Total genes | (0% <= strains <= 100%) | 5368 |

So there should be 438 accessory genes (135 + 140 + 163), but each sequence in accessory_binary_genes.fa is only 164 characters long, and the accessory.tab file lists 383 sets of "variation".
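For reference, the expected figure can be checked directly from the summary statistics. The snippet below is only a worked version of that arithmetic; the dictionary keys are labels chosen here, not names used by Roary.

```python
# Counts copied from the summary statistics quoted in this comment.
summary = {
    "core": 4930,
    "soft_core": 135,
    "shell": 140,
    "cloud": 163,
}
total = 5368

# "Accessory" here means every gene outside the core class.
accessory = summary["soft_core"] + summary["shell"] + summary["cloud"]

print(accessory)                 # 438
print(total - summary["core"])   # 438
```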

andrewjpage commented 7 years ago

You're best off ignoring the accessory_binary_genes.fa file. It is only there for creating a quick and dirty tree with FastTree. The file is filtered to remove very common and very rare variation to speed up tree generation, hence the difference in numbers.

alantsangmb commented 7 years ago

Thank you Andrew! I think the tree is still informative to me even though it has been filtered.

May I ask what the thresholds are for "very common" and "not common"?

TreeT2 commented 7 years ago

Sorry for posting on a closed thread, but is there a way to generate an accurate accessory gene binary dataset?

andrewjpage commented 7 years ago

The top 5% and bottom 5% are excluded. It is truncated at 4000 genes. The code is here: https://github.com/sanger-pathogens/Roary/blob/master/lib/Bio/Roary/AccessoryBinaryFasta.pm
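The filtering described above can be sketched roughly as follows. This is not the code from AccessoryBinaryFasta.pm: the function name `filter_accessory`, its parameters, and the exact boundary handling (whether the 5% cut-offs are inclusive, and how they are rounded) are assumptions made for illustration, so the linked module remains the authority.

```python
import math


def filter_accessory(presence, n_genomes, lower_pct=5, upper_pct=5, max_columns=4000):
    """Return gene names kept after trimming rare and near-universal genes.

    `presence` maps a gene name to the set of genomes that carry it.
    The cut-off rounding and inclusiveness below are assumptions.
    """
    lower = math.ceil(n_genomes * lower_pct / 100)
    upper = n_genomes - math.ceil(n_genomes * upper_pct / 100)

    kept = []
    for gene, genomes in presence.items():
        count = len(genomes)
        # Skip genes found in too few or too many genomes.
        if count <= lower or count > upper:
            continue
        kept.append(gene)
        # Stop once the column limit is reached.
        if len(kept) >= max_columns:
            break
    return kept


# Toy data with 28 genomes, matching the size of the dataset in this thread.
genomes = [f"g{i}" for i in range(28)]
presence = {
    "rare": set(genomes[:1]),    # present in 1 genome
    "mid": set(genomes[:14]),    # present in 14 genomes
    "common": set(genomes),      # present in all 28 genomes
}
kept = filter_accessory(presence, n_genomes=28)
print(kept)  # ['mid']
```

Under these assumed cut-offs, a gene has to be present in an intermediate number of genomes to appear as a column, which is why the sequences in the filtered file are shorter than the accessory gene count.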

There are no parameters to modify this, since it's a rough and ready tree to give you an idea of how things cluster. Even if you were to use all the accessory genes, it would still be quite inaccurate. If it really is something you want to do, you can filter the gene presence and absence Rtab file and build a tree from that (since it will have all the data).
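A minimal sketch of that suggestion, assuming the Rtab file is tab-separated with a header row (`Gene` followed by sample names) and one 0/1 row per gene. The function name `rtab_to_binary_fasta`, the `core_frac` parameter, and the A/C encoding of presence/absence are choices made here for illustration, not part of Roary.

```python
import csv
import io


def rtab_to_binary_fasta(rtab_handle, core_frac=0.99):
    """Build a per-sample binary FASTA from a presence/absence Rtab table.

    Keeps every gene present in fewer than `core_frac` of the samples,
    i.e. all accessory genes, with no trimming of rare or common genes.
    Presence is written as 'A' and absence as 'C' (an assumed encoding).
    """
    reader = csv.reader(rtab_handle, delimiter="\t")
    samples = next(reader)[1:]          # header: Gene, sample1, sample2, ...
    sequences = {sample: [] for sample in samples}
    genes = []

    for row in reader:
        if not row:
            continue
        gene, values = row[0], row[1:]
        if values.count("1") / len(samples) >= core_frac:
            continue                    # core gene, not accessory
        genes.append(gene)
        for sample, value in zip(samples, values):
            sequences[sample].append("A" if value == "1" else "C")

    fasta = "".join(f">{s}\n{''.join(seq)}\n" for s, seq in sequences.items())
    return genes, fasta


# Tiny in-memory table standing in for the real Rtab file.
demo = (
    "Gene\tg1\tg2\tg3\n"
    "geneA\t1\t1\t1\n"
    "geneB\t1\t0\t0\n"
    "geneC\t0\t1\t1\n"
)
genes, fasta = rtab_to_binary_fasta(io.StringIO(demo))
print(genes)  # ['geneB', 'geneC']
print(fasta)
```

On a real run, the file handle would come from opening the Rtab file that Roary writes, and the resulting FASTA could be passed to a tree builder such as FastTree. As noted above, a tree built this way is still only a rough guide to how the genomes cluster.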

TreeT2 commented 7 years ago

Great, thanks Andrew.

alantsangmb commented 7 years ago

Thanks!