Hello Roary developer team,
I have a dataset of ~3,000 isolates on which I have completed a Roary analysis, so I already have a gene presence/absence matrix.
I now have a new isolate, and my goal is to retrieve its gene presence/absence profile against the genes identified in that first analysis, without re-running Roary on all samples combined.
Is there currently a way to do this in Roary? I am thinking I could take the pan_genome_reference.fa file generated by the first analysis and run blastn between it and the annotated genome of the new isolate (perhaps the nucleotide FASTA .ffn file generated by Prokka?), then somehow convert the BLAST results into gene presence/absence. As a novice bioinformatician, figuring this out will take me some time, so I wanted first to check that Roary doesn't already have this utility and, if it doesn't, to ask this expert community whether there is a simpler way of achieving my goal.
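To make the idea concrete, the conversion step I have in mind might look roughly like the sketch below. It is not a Roary feature. It assumes blastn was run with the pan-genome reference as the query and the new isolate's Prokka nucleotide FASTA as the subject, writing a five-column tabular output. The file names new_isolate.ffn and hits.tsv, the function names, and the 95% identity / 80% coverage thresholds are all placeholders of mine and would need tuning.

```python
import csv
import io

# Assumed BLAST command (file names are hypothetical):
#   blastn -query pan_genome_reference.fa -subject new_isolate.ffn \
#          -outfmt "6 qseqid sseqid pident length qlen" -out hits.tsv


def read_fasta_ids(handle):
    """Return the first token of every FASTA header, in file order."""
    return [line[1:].split()[0] for line in handle if line.startswith(">")]


def read_blast_table(handle):
    """Yield rows (lists of strings) from tab-separated BLAST output."""
    return csv.reader(handle, delimiter="\t")


def presence_absence(blast_rows, gene_ids, min_identity=95.0, min_coverage=0.8):
    """Map each reference gene ID to 1 (present) or 0 (absent).

    A gene counts as present if at least one hit meets both thresholds.
    Coverage is alignment length / query length; the alignment length
    includes gaps, so it can slightly exceed 1.
    """
    present = set()
    for row in blast_rows:
        if len(row) < 5:  # skip blank or malformed lines
            continue
        qseqid, _sseqid, pident, length, qlen = row[:5]
        coverage = int(length) / int(qlen)
        if float(pident) >= min_identity and coverage >= min_coverage:
            present.add(qseqid)
    return {gene: int(gene in present) for gene in gene_ids}


# Tiny in-memory example standing in for the real files.
reference_fasta = io.StringIO(
    ">geneA group_1\nATG\n>geneB group_2\nATG\n>geneC group_3\nATG\n"
)
blast_output = io.StringIO(
    "geneA\tc1\t98.5\t900\t900\n"   # full-length, high-identity hit
    "geneB\tc2\t97.0\t300\t1000\n"  # only 30% of the query is covered
)

gene_ids = read_fasta_ids(reference_fasta)
profile = presence_absence(read_blast_table(blast_output), gene_ids)
print(profile)  # {'geneA': 1, 'geneB': 0, 'geneC': 0}
```

The IDs reported here are whatever BLAST takes as the query ID (the first token of each FASTA header), so they would still have to be mapped back to the gene names used in the Roary presence/absence matrix. Nucleotide thresholds are also not equivalent to Roary's protein-level clustering, so the resulting profile would only approximate what a full re-run would give.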
Thank you in advance for your time,
Annabelle