nmquijada / tormes

Making whole bacterial genome sequencing data analysis easy
GNU General Public License v3.0

Pangenome analysis was not performed. #65

Closed by ariasamin 1 year ago

ariasamin commented 1 year ago

Hello, I ran a pan-genome analysis with TORMES before and got the output. Now I reran TORMES with an updated dataset; however, tormes_report.html says "Pangenome analysis was not performed.", although there is output in the pangenome directory:

l pangenome
total 662M
-rw-r--r-- 1 ** cuuser 60 Mar 9 01:48 blast_identity_frequency.Rtab
-rw-r--r-- 1 ** cuuser 69M Mar 9 01:57 clustered_proteins
-rw-r--r-- 1 ** cuuser 4.5M Mar 9 02:06 accessory_binary_genes.fa
-rw-r--r-- 1 ** cuuser 56K Mar 9 02:07 accessory_binary_genes.fa.newick
-rw-r--r-- 1 ** cuuser 3.0M Mar 9 02:15 accessory_graph.dot
-rw-r--r-- 1 ** cuuser 3.4M Mar 9 02:15 core_accessory_graph.dot
-rw-r--r-- 1 ** cuuser 149M Mar 9 02:16 gene_presence_absence.csv
-rw-r--r-- 1 ** cuuser 55M Mar 9 02:17 gene_presence_absence.Rtab
-rw-r--r-- 1 ** cuuser 205 Mar 9 02:20 summary_statistics.txt
-rw-r--r-- 1 ** cuuser 61K Mar 9 02:22 number_of_unique_genes.Rtab
-rw-r--r-- 1 ** cuuser 29K Mar 9 02:22 number_of_new_genes.Rtab
-rw-r--r-- 1 ** cuuser 72K Mar 9 02:22 number_of_genes_in_pan_genome.Rtab
-rw-r--r-- 1 ** cuuser 49K Mar 9 02:22 number_of_conserved_genes.Rtab
-rw-r--r-- 1 ** cuuser 52M Mar 9 02:22 core_accessory.tab
-rw-r--r-- 1 ** cuuser 3.1M Mar 9 02:22 core_accessory.header.embl
-rw-r--r-- 1 ** cuuser 37M Mar 9 02:23 accessory.tab
-rw-r--r-- 1 ** cuuser 3.0M Mar 9 02:23 accessory.header.embl
-rw-r--r-- 1 ** cuuser 17M Mar 9 03:05 pan_genome_reference.fa
-rw-r--r-- 1 ** cuuser 83K Mar 9 03:19 core_alignment_header.embl
-rw-r--r-- 1 ** cuuser 572M Mar 9 03:19 core_gene_alignment.aln
-rw-r--r-- 1 ** cuuser 55K Mar 9 07:11 core_gene_alignment.newick
drwxr-xr-x 2 ** cuuser 22 Mar 9 07:11 .
-rw-r--r-- 1 ** cuuser 351M Mar 9 07:14 pangenome.svg
drwxr-xr-x 10 ** cuuser 13 Mar 9 11:47 ..

Picture1

What I am looking for is the "Pangenome genes summary" and "Percent of pangenome genes" I got before in tormes_report.html. The question is where I can find the following statistics:

Picture2

Thank you!

biobrad commented 1 year ago

Hey there,

Can you please check if the 'summary_statistics' file is available in the report_files.tgz file that should be in your tormes analysis output folder?

You can open that file with the following command:

tar -xvzf tormes_report.tgz

If it is there, try running the following with the tormes environment activated:

./render_report.sh

This should create another tormes report in the same folder.

Let me know how you get on.

Cheers, Brad

nmquijada commented 1 year ago

Hi @ariasamin

Just to check that I understood you correctly: did you run TORMES twice with the same dataset, and one run gave you pangenome results while the other failed?

You can find the file you are looking for either in tormes_report.tgz, as Brad suggested, or in the pangenome/ directory. It is called summary_statistics.txt.
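To recover the "Percent of pangenome genes" figures directly from that file, something like the following Python sketch may help. It assumes summary_statistics.txt uses the tab-separated layout Roary normally writes (category name, strain range, gene count) with a final "Total genes" row. The function names and the sample counts are made up for illustration and are not taken from this dataset.

```python
def parse_summary_statistics(text):
    """Map each category name to its gene count.

    Assumes tab-separated rows: name, strain range, count.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[0]] = int(fields[-1])
    return counts


def percent_of_pangenome(counts, total_key="Total genes"):
    """Express every category as a percentage of the total gene count."""
    total = counts[total_key]
    return {
        name: 100.0 * n / total
        for name, n in counts.items()
        if name != total_key
    }


# Illustrative numbers only; replace SAMPLE with the contents of
# pangenome/summary_statistics.txt, e.g. open(path).read().
SAMPLE = (
    "Core genes\t(99% <= strains <= 100%)\t2000\n"
    "Soft core genes\t(95% <= strains < 99%)\t500\n"
    "Shell genes\t(15% <= strains < 95%)\t1500\n"
    "Cloud genes\t(0% <= strains < 15%)\t6000\n"
    "Total genes\t(0% <= strains <= 100%)\t10000\n"
)

counts = parse_summary_statistics(SAMPLE)
percents = percent_of_pangenome(counts)
for name, pct in percents.items():
    print(f"{name}: {counts[name]} genes ({pct:.1f}%)")
```

The resulting dictionaries can be handed to any plotting tool to rebuild the summary charts by hand.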

Best, Narciso

ariasamin commented 1 year ago

Dear Brad, dear Narciso,

Thank you for getting back to me. I found the "summary_statistics" file, and I also ran ./render_report.sh. Unfortunately, I got the same output, saying: "Pangenome analysis was not performed." However, since I had the statistics, I plotted them myself.

The second run had more genomes added to the first run input (1236 genomes).

All the Best,

biobrad commented 1 year ago

Woah... that is a lot of genomes.

I wonder if it hit Roary's cluster limit. From Roary's page: 'No core alignment is produced and theres an error about too many clusters? By default if there are more than 50,000 clusters, Roary will not create the core alignment.'
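One rough way to test that hypothesis is to count the clusters Roary actually produced. The Python sketch below assumes gene_presence_absence.Rtab has one header row followed by one row per gene cluster, and that this row count is comparable to the cluster count Roary checks against its limit; both are assumptions, not confirmed in this thread. The helper names and the three-row demo table are hypothetical.

```python
import io

# Default quoted from Roary's page; Roary's help lists a -g option to raise it.
ROARY_DEFAULT_CLUSTER_LIMIT = 50_000


def count_clusters(rtab_lines):
    """Count non-blank data rows in an Rtab table, skipping the header row."""
    rows = iter(rtab_lines)
    next(rows, None)  # discard the header line
    return sum(1 for line in rows if line.strip())


def exceeds_limit(n_clusters, limit=ROARY_DEFAULT_CLUSTER_LIMIT):
    """True when the cluster count is above the limit ("more than" the limit)."""
    return n_clusters > limit


# Tiny in-memory stand-in for pangenome/gene_presence_absence.Rtab.
demo = io.StringIO(
    "Gene\tgenomeA\tgenomeB\n"
    "group_1\t1\t1\n"
    "group_2\t1\t0\n"
    "group_3\t0\t1\n"
)
n_demo = count_clusters(demo)
print(n_demo, exceeds_limit(n_demo))
```

On real output, pass an open file handle instead of the demo table, for example `with open("pangenome/gene_presence_absence.Rtab") as fh: count_clusters(fh)`. Reading line by line keeps memory use low even for a table of tens of megabytes.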

Glad you were able to get it sorted out yourself.