ohlab / SMEG

Strain-level Metagenomic Estimation of Growth rate (SMEG) measures growth rates of microbial strains from complex metagenomic datasets

Question about Roary output #11

Open huanzhila opened 10 months ago

huanzhila commented 10 months ago

Hi,

Thank you for developing such a great method and a very useful tool! I was able to run SMEG through the Singularity installation. I changed line 4 of download_genomes.sh to "wget ftp://ftp.ncbi.nlm.nih.gov/genomes/.vol2/genbank/bacteria/Akkermansia_muciniphila/assembly_summary.txt" and downloaded 87 Akkermansia_muciniphila strains from NCBI. Below are the outputs of smeg build_species on this Akkermansia_muciniphila genome dataset, first without and then with keeping the Roary output.

1) smeg build_species without keeping the Roary output

$ singularity exec smeg.sif smeg build_species -g akk_new_genomes/ -o akk_new_database -a -p 8
$ ls akk_new_genomes/F.0.6
AM06.fna.core.geneCood.txt cluster13.Input.txt cluster16.Input.txt cluster6.Input.txt clusters.txt core_gene_alignment.aln Index AM06.fna.fna cluster14.Input.txt cluster17.Input.txt cluster7.Input.txt clusters_with_no_unique_SNP.txt core_gene_alignment.aln.fai misc.txt AM06.fna.fna.fai cluster15.Input.txt cluster18.Input.txt clusterOutput.txt core_alignment_header.embl geneCoordinates.txt

I then used the -k option to tell SMEG to keep the Roary output.

2) smeg build_species, keeping the Roary output

$ singularity exec smeg.sif smeg build_species -g akk_new_genomes/ -o akk_new_database_Roary -a -p 8 -k
$ ls akk_new_database_Roary/F.0.6
AM06.fna.core.geneCood.txt cluster11.Input.txt cluster15.Input.txt cluster1.Input.txt cluster6.Input.txt clusters_with_no_unique_SNP.txt Index AM06.fna.fna cluster12.Input.txt cluster16.Input.txt cluster2.Input.txt cluster9.Input.txt core_alignment_header.embl misc.txt AM06.fna.fna.fai cluster13.Input.txt cluster17.Input.txt cluster3.Input.txt clusterOutput.txt core_gene_alignment.aln cluster10.Input.txt cluster14.Input.txt cluster18.Input.txt cluster5.Input.txt clusters.txt geneCoordinates.txt

In theory the two results should be identical, but they differ: keeping the Roary output seems to yield more clusters (15 clusterN.Input.txt files in F.0.6 versus 8 without -k). The same discrepancy also occurs with the 64 Akkermansia_muciniphila strains that the paper downloaded from NCBI. With the test data, however, build_species gives the same results whether or not the Roary output is kept.
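To make the comparison between the two F.0.6 listings concrete, here is a minimal Python sketch that works on `ls` output pasted as plain text. It assumes the per-cluster files are always named like `cluster13.Input.txt`, as in the listings above; `cluster_ids` and `diff_listings` are hypothetical helpers for illustration, not part of SMEG.

```python
import re

# Per-cluster input files look like "cluster13.Input.txt" in the listings above.
CLUSTER_FILE = re.compile(r"cluster(\d+)\.Input\.txt")


def cluster_ids(listing):
    """Return the sorted cluster numbers that have a clusterN.Input.txt file."""
    ids = []
    for name in listing.split():
        m = CLUSTER_FILE.fullmatch(name)
        if m:
            ids.append(int(m.group(1)))
    return sorted(ids)


def diff_listings(listing_a, listing_b):
    """Return (names only in A, names only in B) for two whitespace-separated listings."""
    a, b = set(listing_a.split()), set(listing_b.split())
    return sorted(a - b), sorted(b - a)


# Usage with abbreviated listings; paste each run's full `ls .../F.0.6` output instead.
run1 = "cluster13.Input.txt cluster6.Input.txt clusters.txt core_gene_alignment.aln.fai"
run2 = "cluster1.Input.txt cluster6.Input.txt clusters.txt"
print(cluster_ids(run1), cluster_ids(run2))  # prints [6, 13] [1, 6]
print(diff_listings(run1, run2))
# prints (['cluster13.Input.txt', 'core_gene_alignment.aln.fai'], ['cluster1.Input.txt'])
```

Applied to the two full F.0.6 listings, the number of cluster IDs it returns for each run matches the "Total number of clusters" reported for F.0.6 in that run's log.txt.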

I wonder whether some error caused this difference. Thank you for your support. Regards,

La huanzhi

aemiol commented 10 months ago

Hi, both commands should produce the same number of files. What does the log file for the first run contain?

huanzhila commented 10 months ago

Hi,

Thank you for your support. Here is the content of the log file for the first run:

[hzla@io02 Akkermansia_muciniphila_new_db]$ cat log.txt 
Number of complete genomes = 87
Number of draft genomes = 0
Selected representative genome is AM06.fna
dnaA position relative to ori is 0.000 

### SNP assignment threshold 0.4 with iterative clustering output ######
Total number of clusters = 10
Median unique SNPs in clusters = 4.5
Could not generate unique SNPs for 9 clusters containing a total of 54 strains
See /data01/user/hzla/akk_new_database/F.0.4/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/F.0.4 
################################################################################## 

### SNP assignment threshold 0.5 with iterative clustering output ######
Total number of clusters = 10
Median unique SNPs in clusters = 4.5
Could not generate unique SNPs for 9 clusters containing a total of 54 strains
See /data01/user/hzla/akk_new_database/F.0.5/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/F.0.5 
################################################################################## 

### SNP assignment threshold 0.6 with iterative clustering output ######
Total number of clusters = 8
Median unique SNPs in clusters = 5
Could not generate unique SNPs for 10 clusters containing a total of 61 strains
See /data01/user/hzla/akk_new_database/F.0.6/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/F.0.6 
################################################################################## 

### SNP assignment threshold 0.7 with iterative clustering output ######
Total number of clusters = 7
Median unique SNPs in clusters = 5
Could not generate unique SNPs for 11 clusters containing a total of 64 strains
See /data01/user/hzla/akk_new_database/F.0.7/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/F.0.7 
################################################################################## 

### SNP assignment threshold 0.8 with iterative clustering output ######
Total number of clusters = 6
Median unique SNPs in clusters = 6.5
Could not generate unique SNPs for 10 clusters containing a total of 71 strains
See /data01/user/hzla/akk_new_database/F.0.8/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/F.0.8 
################################################################################## 

### SNP assignment threshold 0.9 with iterative clustering output ######
Total number of clusters = 4
Median unique SNPs in clusters = 35
Could not generate unique SNPs for 10 clusters containing a total of 77 strains
See /data01/user/hzla/akk_new_database/F.0.9/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/F.0.9 
################################################################################## 

### SNP assignment threshold 0.4 without iterative clustering output ######
Total number of clusters = 8
Median unique SNPs in clusters = 6
Could not generate unique SNPs for 4 clusters containing a total of 21 strains
See T.0.4/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/T.0.4 
################################################################################## 

### SNP assignment threshold 0.5 without iterative clustering output ######
Total number of clusters = 8
Median unique SNPs in clusters = 6
Could not generate unique SNPs for 4 clusters containing a total of 21 strains
See T.0.5/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/T.0.5 
################################################################################## 

### SNP assignment threshold 0.6 without iterative clustering output ######
Total number of clusters = 7
Median unique SNPs in clusters = 3
Could not generate unique SNPs for 5 clusters containing a total of 28 strains
See T.0.6/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/T.0.6 
################################################################################## 

### SNP assignment threshold 0.7 without iterative clustering output ######
Total number of clusters = 0
Median unique SNPs in clusters = 0
Could not generate unique SNPs for 12 clusters containing a total of 87 strains
See T.0.7/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/T.0.7 
################################################################################## 

### SNP assignment threshold 0.8 without iterative clustering output ######
Total number of clusters = 0
Median unique SNPs in clusters = 0
Could not generate unique SNPs for 12 clusters containing a total of 87 strains
See T.0.8/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/T.0.8 
################################################################################## 

### SNP assignment threshold 0.9 without iterative clustering output ######
Total number of clusters = 5
Median unique SNPs in clusters = 8
Could not generate unique SNPs for 7 clusters containing a total of 51 strains
See T.0.9/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/akk_new_database/T.0.9 
################################################################################## 

And here is the content of the log file for the second run:

[hzla@io02 Akkermansia_muciniphila_new_db_Roary]$ cat log.txt 
Number of complete genomes = 87
Number of draft genomes = 0
Selected representative genome is AM06.fna
dnaA position relative to ori is 0.000 

### SNP assignment threshold 0.4 with iterative clustering output ######
Total number of clusters = 12
Median unique SNPs in clusters = 9.5
Could not generate unique SNPs for 3 clusters containing a total of 11 strains
See /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.4/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.4 
################################################################################## 

### SNP assignment threshold 0.5 with iterative clustering output ######
Total number of clusters = 15
Median unique SNPs in clusters = 11
Could not generate unique SNPs for 3 clusters containing a total of 11 strains
See /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.5/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.5 
################################################################################## 

### SNP assignment threshold 0.6 with iterative clustering output ######
Total number of clusters = 15
Median unique SNPs in clusters = 7
Could not generate unique SNPs for 3 clusters containing a total of 11 strains
See /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.6/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.6 
################################################################################## 

### SNP assignment threshold 0.7 with iterative clustering output ######
Total number of clusters = 15
Median unique SNPs in clusters = 7
Could not generate unique SNPs for 3 clusters containing a total of 11 strains
See /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.7/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.7 
################################################################################## 

### SNP assignment threshold 0.8 with iterative clustering output ######
Total number of clusters = 15
Median unique SNPs in clusters = 7
Could not generate unique SNPs for 3 clusters containing a total of 11 strains
See /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.8/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.8 
################################################################################## 

### SNP assignment threshold 0.9 with iterative clustering output ######
Total number of clusters = 9
Median unique SNPs in clusters = 7
Could not generate unique SNPs for 5 clusters containing a total of 31 strains
See /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.9/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/F.0.9 
################################################################################## 

### SNP assignment threshold 0.4 without iterative clustering output ######
Total number of clusters = 7
Median unique SNPs in clusters = 55
Could not generate unique SNPs for 0 clusters containing a total of 0 strains
See T.0.4/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/T.0.4 
################################################################################## 

### SNP assignment threshold 0.5 without iterative clustering output ######
Total number of clusters = 7
Median unique SNPs in clusters = 33
Could not generate unique SNPs for 0 clusters containing a total of 0 strains
See T.0.5/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/T.0.5 
################################################################################## 

### SNP assignment threshold 0.6 without iterative clustering output ######
Total number of clusters = 7
Median unique SNPs in clusters = 19
Could not generate unique SNPs for 0 clusters containing a total of 0 strains
See T.0.6/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/T.0.6 
################################################################################## 

### SNP assignment threshold 0.7 without iterative clustering output ######
Total number of clusters = 7
Median unique SNPs in clusters = 19
Could not generate unique SNPs for 0 clusters containing a total of 0 strains
See T.0.7/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/T.0.7 
################################################################################## 

### SNP assignment threshold 0.8 without iterative clustering output ######
Total number of clusters = 7
Median unique SNPs in clusters = 12
Could not generate unique SNPs for 0 clusters containing a total of 0 strains
See T.0.8/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/T.0.8 
################################################################################## 

### SNP assignment threshold 0.9 without iterative clustering output ######
Total number of clusters = 6
Median unique SNPs in clusters = 35.5
Could not generate unique SNPs for 1 clusters containing a total of 12 strains
See T.0.9/clusters_with_no_unique_SNP.txt for more details
Database created with above parameters located in /data01/user/hzla/Akkermansia_muciniphila_new_db_Roary/T.0.9 
##################################################################################
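The two logs are easier to compare folder by folder than by reading them in sequence. Below is a minimal Python sketch that assumes the log.txt lines are worded exactly as in the two logs posted here (other SMEG versions may word them differently); `parse_log` and `compare_logs` are hypothetical helpers, not part of SMEG.

```python
import re
from pathlib import Path

# Line patterns copied from the build_species log.txt files posted in this thread.
CLUSTERS = re.compile(r"Total number of clusters = (\d+)")
MEDIAN = re.compile(r"Median unique SNPs in clusters = ([\d.]+)")
NO_SNP = re.compile(r"Could not generate unique SNPs for \d+ clusters containing a total of (\d+) strains")
LOCATION = re.compile(r"Database created with above parameters located in (\S+)")


def parse_log(text):
    """Map each database folder (e.g. 'F.0.6') to
    (total clusters, median unique SNPs, strains in clusters without unique SNPs)."""
    results = {}
    clusters = median = no_snp_strains = None
    for line in text.splitlines():
        m = CLUSTERS.search(line)
        if m:
            clusters = int(m.group(1))
        m = MEDIAN.search(line)
        if m:
            median = float(m.group(1))
        m = NO_SNP.search(line)
        if m:
            no_snp_strains = int(m.group(1))
        m = LOCATION.search(line)
        if m:
            # The folder name (F.0.6, T.0.9, ...) is the last path component.
            folder = m.group(1).rstrip("/").split("/")[-1]
            results[folder] = (clusters, median, no_snp_strains)
            clusters = median = no_snp_strains = None
    return results


def compare_logs(path_a, path_b):
    """Print both runs' summaries per folder and return the folders that differ."""
    a = parse_log(Path(path_a).read_text())
    b = parse_log(Path(path_b).read_text())
    differing = []
    for folder in sorted(a.keys() | b.keys()):
        same = a.get(folder) == b.get(folder)
        print(f"{folder}\t{a.get(folder)}\t{b.get(folder)}{'' if same else '  <- differs'}")
        if not same:
            differing.append(folder)
    return differing
```

For example, `compare_logs("akk_new_database/log.txt", "akk_new_database_Roary/log.txt")` prints one row per F.* and T.* folder; the two paths are placeholders for wherever each run wrote its log.txt.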

I wonder whether some error caused this difference in F.0.6. Thank you for your support. Regards,

La huanzhi

aemiol commented 10 months ago

I'm not sure exactly what causes the discrepancy. However, the second output looks more reasonable and is closer to what you'd expect.