sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Extracting Shell/Cloud gene sequences (DNA sequences) #477

Closed noahaus closed 4 years ago

noahaus commented 4 years ago

Very quick question about how to get the actual DNA sequences from the inferred accessory genome. I would like to extract these sequences to do some blastn analysis and also for metagenomic purposes. How should I structure the roary command to do this? If it is not that simple, what output files should I use in order to find this information? Any help is appreciated!

cwbcm commented 4 years ago

I am not sure whether Roary can do that directly, but you can extract the DNA sequences from the Prokka outputs. Roary only needs the GFF files from Prokka as input, but the Prokka output folder for each genome also contains the DNA sequences (.ffn). Locate the shell/cloud genes you are interested in in the gene_presence_absence.csv file, then use the strain name and locus ID to grab the DNA sequence from that strain's .ffn file.
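The lookup described above could be scripted roughly as follows. This is only a sketch under several assumptions that are not confirmed in this thread: that gene_presence_absence.csv has 14 descriptive columns before the per-isolate columns, that each isolate cell holds one or more whitespace-separated locus tags, that the first word of each .ffn header is the locus tag, and that "shell/cloud" means genes present in fewer than 95% of isolates. The function names and the `ffn_by_isolate` mapping are hypothetical helpers, not part of Roary or Prokka.

```python
import csv

# Assumption: Roary writes 14 descriptive columns before the per-isolate columns.
N_META_COLS = 14


def read_fasta(path):
    """Return {sequence_id: sequence}; the id is the first word of each header."""
    chunks = {}
    seq_id = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                seq_id = line[1:].split()[0]
                chunks[seq_id] = []
            elif seq_id is not None:
                chunks[seq_id].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def accessory_loci(csv_path, max_fraction=0.95):
    """Yield (gene, isolate, locus_tag) for genes found in < max_fraction of isolates."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        isolates = header[N_META_COLS:]
        for row in reader:
            cells = row[N_META_COLS:]
            present = [(iso, cell) for iso, cell in zip(isolates, cells) if cell.strip()]
            if len(present) / len(isolates) >= max_fraction:
                continue  # core or soft-core gene under the assumed threshold
            for iso, cell in present:
                # Assumption: paralogs in one cell are whitespace-separated.
                for tag in cell.split():
                    yield row[0], iso, tag


def write_accessory_fasta(csv_path, ffn_by_isolate, out_path, max_fraction=0.95):
    """Write accessory gene sequences to out_path; return how many were written."""
    cache = {}
    written = 0
    with open(out_path, "w") as out:
        for gene, iso, tag in accessory_loci(csv_path, max_fraction):
            if iso not in cache:
                cache[iso] = read_fasta(ffn_by_isolate[iso])
            seq = cache[iso].get(tag)
            if seq is None:
                continue  # locus tag missing from the .ffn file
            out.write(f">{gene}|{iso}|{tag}\n{seq}\n")
            written += 1
    return written
```

To use it, you would pass a dictionary that maps each isolate column name in the CSV to the path of that isolate's Prokka .ffn file, for example `write_accessory_fasta("gene_presence_absence.csv", {"strainA": "prokka_strainA/strainA.ffn"}, "accessory.fasta")`; those paths are placeholders. The resulting FASTA file can then be used as a blastn query.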

noahaus commented 4 years ago

Thank you!