sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

Creating gene presence/absence of novel query against existing dataset #560

Open annabelledamerum opened 2 years ago

annabelledamerum commented 2 years ago

Hello Roary developer team,

I have a dataset of ~3,000 isolates. I have completed a Roary analysis on this dataset and now have a gene presence/absence matrix.

I now have a new isolate, and my goal is to retrieve its gene presence/absence profile based on the genes identified in the first analysis of the 3,000 isolates, without having to re-run Roary on all samples combined.

Is there a way to do this currently in Roary? I am thinking I could take the pan_genome_reference.fa file generated in the first analysis of the 3,000 isolates, run blastn against the annotated genes of the new isolate (perhaps the nucleotide FASTA .ffn file generated by Prokka?), and somehow convert the BLAST results into gene presence/absence. As a novice bioinformatician, figuring this out will take me some time, so I wanted first to check that Roary doesn't already have this utility and, if it doesn't, to ask this expert community for advice in case there is a simpler way of achieving my goal.

Thank you in advance for your time,
Annabelle
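
For illustration, the "convert the BLAST results into gene presence/absence" step described above could be sketched roughly as follows. This is not a Roary feature, only a minimal sketch under several assumptions: the pan-genome reference is used as the BLAST query and the new isolate's genes as the subject, the hits are saved as tab-separated text with the columns qseqid, sseqid, pident, length, qlen, and a gene counts as present when one hit passes an identity and a coverage cutoff. The function names and the 95% / 80% cutoffs are hypothetical choices; nucleotide blastn identity is not equivalent to the protein-level clustering Roary performs, so the result is an approximation of what a full re-run would give.

```python
import csv

# Assumed blastn call (file names are placeholders):
#   blastn -query pan_genome_reference.fa -subject new_isolate.ffn \
#          -outfmt "6 qseqid sseqid pident length qlen" -out hits.tsv


def fasta_ids(lines):
    """Return sequence IDs (first token of each FASTA header) in file order."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def presence_absence(ref_ids, blast_lines, min_identity=95.0, min_coverage=0.8):
    """Map each reference gene ID to 1 (present) or 0 (absent).

    A gene is called present if at least one hit has percent identity
    >= min_identity and alignment length / query length >= min_coverage.
    Alignment length includes gaps, so coverage is only approximate.
    """
    present = set()
    for row in csv.reader(blast_lines, delimiter="\t"):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        qseqid, _sseqid, pident, length, qlen = row[:5]
        coverage = int(length) / int(qlen)
        if float(pident) >= min_identity and coverage >= min_coverage:
            present.add(qseqid)
    return {gene: int(gene in present) for gene in ref_ids}


# Small in-memory example; real input would come from the files named above.
example_fasta = [">geneA group_1", "ATG", ">geneB group_2", "ATGC", ">geneC"]
example_hits = [
    "geneA\tiso_001\t99.1\t900\t905",  # high identity, near-full coverage
    "geneB\tiso_002\t80.0\t300\t310",  # identity below cutoff
    "geneC\tiso_003\t97.0\t100\t400",  # coverage below cutoff
]
profile = presence_absence(fasta_ids(example_fasta), example_hits)
print(profile)
```

With real data, the two inputs would be opened file handles, e.g. `presence_absence(fasta_ids(open("pan_genome_reference.fa")), open("hits.tsv"))`. The IDs come from the FASTA headers, so mapping them back to the cluster names used in the existing presence/absence matrix is a separate step that depends on how those headers are laid out.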