sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

roary and cd-hit #562

Open zhichusun opened 2 years ago

zhichusun commented 2 years ago

I want to know how to remove redundant genes from a pan-genome. I analyzed 100 genomes with Roary and got a pan_genome_reference.fa file, which contains all core and accessory genes. I then used cd-hit-est to remove the redundant genes from this file. How can I afterwards remove those same redundant genes from gene_presence_absence.csv? Any help is greatly appreciated.
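One possible way to approach this, as a minimal sketch rather than an official Roary or CD-HIT workflow: read the `.clstr` file that cd-hit-est writes next to its output, collect every sequence that is not a cluster representative (representatives are the lines ending in `*`), map those sequence IDs back to Roary group names, and drop the matching rows from gene_presence_absence.csv. The sketch assumes that each header in pan_genome_reference.fa has the form `>sequence_id group_name` and that the first column of the CSV is named `Gene`; please check both against your own files. The function names are hypothetical, chosen only for illustration.

```python
import csv
import io
import re


def redundant_ids_from_clstr(clstr_lines):
    """Collect IDs of sequences that CD-HIT did NOT keep as cluster representatives."""
    redundant = set()
    for line in clstr_lines:
        line = line.strip()
        if not line or line.startswith(">Cluster"):
            continue
        # Member lines look like: "1\t880nt, >abc_00002... at +/98.00%"
        match = re.search(r">(.+?)\.\.\.", line)
        # The representative of each cluster is marked with a trailing "*".
        if match and not line.endswith("*"):
            # Keep only the first token in case the description was included.
            redundant.add(match.group(1).split()[0])
    return redundant


def id_to_group_from_fasta(fasta_lines):
    """Map sequence ID -> group name, ASSUMING headers are '>id group_name'."""
    mapping = {}
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if len(parts) >= 2:
                mapping[parts[0]] = parts[1]
    return mapping


def filter_presence_absence(csv_text, groups_to_drop):
    """Return the CSV text without the rows whose 'Gene' value is in groups_to_drop."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    header = next(reader)
    writer.writerow(header)
    gene_col = header.index("Gene")  # assumed name of the first column
    for row in reader:
        if row[gene_col] not in groups_to_drop:
            writer.writerow(row)
    return out.getvalue()
```

To use it, pass the lines of the `.clstr` file and of pan_genome_reference.fa to the first two functions, build the set of group names for the redundant IDs, and hand that set plus the CSV text to `filter_presence_absence`. Two caveats: as far as I understand, cd-hit-est shortens sequence names in the `.clstr` file unless it is run with `-d 0`, so long IDs may not match otherwise; and simply dropping a row discards that gene's presence/absence information, so merging it into the representative's row may be more appropriate depending on the analysis.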

gapabh commented 2 years ago

I need to understand more about CD-HIT and Roary. My question is whether CD-HIT generates the annotation of the protein clusters and, if so, how it does that. What database is used to annotate the protein clusters? It simply creates a lot of hypothetical proteins, yet if I BLAST them against the NCBI RefSeq database, it shows me what each protein is.