Request for Assistance in Identifying Core Genome from Result Files

2021JohnSheng commented 7 months ago

Hi,

I would like to express my sincere gratitude for maintaining the Panaroo tool. Your dedication to this project is greatly appreciated.

I am currently working with over 3000 genomes and aiming to identify the core genome. However, during the initial run, I failed to include the --alignment parameter. As a result, the output files generated from the software include:

combined_protein_CDS.fasta
combined_DNA_CDS.fasta
gene_presence_absence.Rtab
gene_presence_absence_roary.csv
gene_presence_absence.csv
...

I am seeking guidance on whether it is possible to extract genes that are present in all the genomes from these files. Due to the large number of genomes being analyzed, I am hoping to avoid rerunning the program.

Warm regards

nzmacalasdair commented 7 months ago

Hello,

It is absolutely possible to align the core genome after the initial panaroo run; the panaroo-msa command does this.

Unfortunately, we have had a persistent issue where this function is not installed properly by conda.

If you installed panaroo with conda (as recommended), you can run panaroo-msa manually: clone (download) this GitHub repo, activate the panaroo conda environment, and run the panaroo-msa-runner.py script in the root directory of the repository with python panaroo-msa-runner.py.

panaroo-msa takes the output directory of panaroo as input, and will give you the option to align just the core genome, or all pan-genome genes.

I'm guessing that your interest in extracting gene sequences is just about aligning them -- if you'd like to extract gene sequences for some other purpose, there are some other ways of doing so which may be more suited to your needs.
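
If the goal is only to list which genes are present in every genome, without aligning anything, the gene_presence_absence.Rtab file may already be enough. Below is a minimal sketch, not part of panaroo itself. It assumes the Rtab file is tab-separated, has a header row whose first column is the gene name, and has one column per genome holding 1 (present) or 0 (absent). The function name core_genes and its threshold parameter are made up for illustration.

```python
import csv
import io


def core_genes(rtab_handle, threshold=1.0):
    """Return names of genes present in at least `threshold` (a fraction) of genomes.

    Assumes a tab-separated presence/absence matrix: the first column is the
    gene name, every other column is one genome with values 1 or 0.
    """
    reader = csv.reader(rtab_handle, delimiter="\t")
    header = next(reader)
    n_genomes = len(header) - 1  # first column is the gene name
    core = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, calls = row[0], row[1:]
        present = sum(1 for call in calls if call.strip() == "1")
        if present >= threshold * n_genomes:
            core.append(gene)
    return core


if __name__ == "__main__":
    # Tiny made-up matrix: geneB is missing from genome g2.
    demo = (
        "Gene\tg1\tg2\tg3\n"
        "geneA\t1\t1\t1\n"
        "geneB\t1\t0\t1\n"
        "geneC\t1\t1\t1\n"
    )
    print(core_genes(io.StringIO(demo)))  # ['geneA', 'geneC']
```

On real output this could be called as core_genes(open("gene_presence_absence.Rtab")). With thousands of genomes, a gene present in strictly every genome can be rare, so a threshold slightly below 1.0 (for example 0.99) may be more useful than the strict default used here.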

Let me know if you run into any issues with this!

2021JohnSheng commented 7 months ago

Hello,

Thank you for the prompt and informative response. I appreciate the guidance on using panaroo-msa for core genome alignment and the workaround for the conda installation issue. I'll proceed as advised and reach out if I encounter any challenges.

gtonkinhill / panaroo

Request for Assistance in Identifying Core Genome from Result Files #289