ziyewang / MetaBinner

GNU General Public License v3.0

Binning finishes successfully but with errors #9

Closed Chrisjrt closed 2 years ago

Chrisjrt commented 2 years ago

Hi,

I'm running MetaBinner on a set of assemblies, and it works great on all but one. For that assembly it prints the "Binning Finished!" message, but there are some error messages above it. This is the end of the output:

2021-12-08 14:36:04,589 - The binning result file to be handled:        /data/san/data0/users/chris/binning/data/processed/bins/random/metabinner/spherical//metabinner_res/intermediate_result/partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_cov_logtrans_result.tsv_bins_post_process_mincomp_70_mincont_50_bins
2021-12-08 14:36:04,589 - The number of threads:        10                                                                                                     
2021-12-08 14:36:04,639 - The number of contigs:        2203                                                                                                   
partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_t_logtrans_result.tsv                                                                           
/data/san/data0/users/chris/binning/.snakemake/conda/11cc0ac87d0563d6b0f17d15c2dfb3b9/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.cluster.k_means_ module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)                                                                                                                        
2021-12-08 14:36:05,325 - Input arguments:                                                                                                                     
2021-12-08 14:36:05,325 - Contig_file:  /data/san/data0/users/chris/binning/data/processed/assemblies/random/spherical/assembly_1000.fa
2021-12-08 14:36:05,325 - Coverage_profiles:    /data/san/data0/users/chris/binning/data/processed/assemblies/random/spherical/depth1kb.tsv
2021-12-08 14:36:05,325 - Composition_profiles: /data/san/data0/users/chris/binning/data/processed/assemblies/random/spherical/kmer_4_f1000.csv
2021-12-08 14:36:05,325 - The binning result file to be handled:        /data/san/data0/users/chris/binning/data/processed/bins/random/metabinner/spherical//metabinner_res/intermediate_result/partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_t_logtrans_result.tsv_bins_post_process_mincomp_70_mincont_50_bins
2021-12-08 14:36:05,325 - The number of threads:        10                                                                                                     
2021-12-08 14:36:05,375 - The number of contigs:        2203                                                                                                   
Processing 9 genomes from kmeans_length_weight_X_t_logtrans_result.tsv with extension 'fa'.                                                                    
Processing 9 genomes from partial_seed_kmeans_bacar_marker_seed_length_weight_1quarter_X_t_logtrans_result.tsv with extension 'fa'.                            
Processing 9 genomes from partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_t_logtrans_result.tsv with extension 'fa'.
Processing 9 genomes from partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_t_logtrans_result.tsv with extension 'fa'.                            
Processing 24 genomes from kmeans_length_weight_X_cov_logtrans_result.tsv with extension 'fa'.                                                                 
Processing 19 genomes from partial_seed_kmeans_bacar_marker_seed_length_weight_1quarter_X_cov_logtrans_result.tsv with extension 'fa'.                         
Processing 19 genomes from partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_cov_logtrans_result.tsv with extension 'fa'.                         
Processing 19 genomes from partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_cov_logtrans_result.tsv with extension 'fa'.                         
Processing 9 genomes from kmeans_length_weight_X_com_logtrans_result.tsv with extension 'fa'.                                                                  
Processing 9 genomes from partial_seed_kmeans_bacar_marker_seed_length_weight_1quarter_X_com_logtrans_result.tsv with extension 'fa'.                          
Processing 9 genomes from partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_com_logtrans_result.tsv with extension 'fa'.                          
Processing 9 genomes from partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_com_logtrans_result.tsv with extension 'fa'.                          
bin_dir:        /data/san/data0/users/chris/binning/data/processed/bins/random/metabinner/spherical//metabinner_res/ensemble_res/X_t_logtrans_2postprocess/greedy_cont_weight_3_mincomp_50.0_maxcont_15.0_bins
Get initial quality of bins.
bin_dir:        /data/san/data0/users/chris/binning/data/processed/bins/random/metabinner/spherical//metabinner_res/ensemble_res/X_cov_logtrans_2postprocess/greedy_cont_weight_3_mincomp_50.0_maxcont_15.0_bins
Get initial quality of bins.
bin_dir:        /data/san/data0/users/chris/binning/data/processed/bins/random/metabinner/spherical//metabinner_res/ensemble_res/X_com_logtrans_2postprocess/greedy_cont_weight_3_mincomp_50.0_maxcont_15.0_bins
Get initial quality of bins.                                                                                                                                   
Selected 2065 from partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_t_logtrans_result.tsv with quality = 98.3 (comp. = 98.3%, cont. = 0.0%).     
Selected 2065 from partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_com_logtrans_result.tsv with quality = 98.3 (comp. = 98.3%, cont. = 0.0%).   
Selected 435 from partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_t_logtrans_result.tsv with quality = 98.3 (comp. = 98.3%, cont. = 0.0%).      
Selected 430 from kmeans_length_weight_X_com_logtrans_result.tsv with quality = 98.3 (comp. = 98.3%, cont. = 0.0%).                                            
Selected 589 from partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_t_logtrans_result.tsv with quality = 97.9 (comp. = 97.9%, cont. = 0.0%).      
Selected 589 from partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_com_logtrans_result.tsv with quality = 97.9 (comp. = 97.9%, cont. = 0.0%).    
Selected 1893 from partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_t_logtrans_result.tsv with quality = 97.4 (comp. = 100.0%, cont. = 0.9%).
Selected 1782 from partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_t_logtrans_result.tsv with quality = 97.3 (comp. = 97.3%, cont. = 0.0%).
Selected 1700 from kmeans_length_weight_X_com_logtrans_result.tsv with quality = 97.4 (comp. = 100.0%, cont. = 0.9%).                                          
Selected 1705 from partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_t_logtrans_result.tsv with quality = 96.6 (comp. = 96.6%, cont. = 0.0%).
Selected 1782 from partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_com_logtrans_result.tsv with quality = 97.3 (comp. = 97.3%, cont. = 0.0%).
Selected 455 from partial_seed_kmeans_bacar_marker_seed_length_weight_1quarter_X_t_logtrans_result.tsv with quality = 69.4 (comp. = 69.4%, cont. = 0.0%).
Selected 1705 from partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_com_logtrans_result.tsv with quality = 96.6 (comp. = 96.6%, cont. = 0.0%).
Selected 455 from partial_seed_kmeans_bacar_marker_seed_length_weight_1quarter_X_com_logtrans_result.tsv with quality = 69.4 (comp. = 69.4%, cont. = 0.0%).
mv: cannot stat 'Refined_ABC/Refined': No such file or directory                                                                                               
mv: cannot stat 'Refined_AB/Refined': No such file or directory                                                                                                
mv: cannot stat 'Refined_BC/Refined': No such file or directory                                                                                                
Processing 7 genomes from X_t_logtrans with extension 'fna'.                                                                                                   
No bins identified for X_cov_logtrans in /data/san/data0/users/chris/binning/data/processed/bins/random/metabinner/spherical//metabinner_res/ensemble_res/X_cov_logtrans_2postprocess/greedy_cont_weight_3_mincomp_50.0_maxcont_15.0_bins.
Processing 7 genomes from X_com_logtrans with extension 'fna'.
Input directory does not exists: /data/san/data0/users/chris/binning/data/processed/bins/random/metabinner/spherical//metabinner_res/ensemble_res/greedy_cont_weight_3_mincomp_50.0_maxcont_15.0_bins/ensemble_3logtrans/Refined_ABC/Refined_ABC

cp: cannot stat '/data/san/data0/users/chris/binning/data/processed/bins/random/metabinner/spherical//metabinner_res/ensemble_res/greedy_cont_weight_3_mincomp_50.0_maxcont_15.0_bins/ensemble_3logtrans/addrefined2and3comps/greedy_cont_weight_3_mincomp_50.0_maxcont_15.0_bins_res.tsv': No such file or directory
Binning Finished! 

I'm just wondering whether this is normal and simply MetaBinner's way of saying that it couldn't find any bins, or whether something else is going on.

Thanks,

Chris

ziyewang commented 2 years ago

Hi, Chris,

Could you please tell me how many samples are in your dataset? It seems that no quality bins were generated using the coverage information alone, which causes the second-stage ensemble process to fail. As stated in the manuscript (https://www.biorxiv.org/content/10.1101/2021.07.25.453671v1.full.pdf), some of the component binning results used for integration are generated with coverage information alone as features, so we recommend applying MetaBinner to multi-sample datasets. MetaWRAP is a good choice for small-scale datasets or datasets with few samples. Alternatively, the results from MetaBinner's first-stage ensemble process, which uses the combined features, may be fine. You can find them in "../metabinner_res/ensemble_res/X_t_logtrans_2postprocess/greedy_cont_weight_3_mincomp_50.0_maxcont_15.0_bins".

Thanks for your report; we will improve the error messages. Please feel free to let us know if you have any further questions.

Best wishes, Ziye
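
For anyone scripting around this behaviour, here is a minimal sketch of a check that does not rely on the "Binning Finished!" line alone. It scans the captured output for the failure messages seen in the log above. The function names and the pattern list are hypothetical and are not part of MetaBinner; the patterns are simply the strings that appear in this thread, so other failure messages may exist.

```python
import re

# Patterns copied from the messages in the log quoted in this thread.
# Illustrative only: this is not an exhaustive list of MetaBinner's messages.
ERROR_PATTERNS = [
    re.compile(r"cannot stat"),
    re.compile(r"No bins identified"),
    re.compile(r"Input directory does not exists?"),
]


def find_problem_lines(log_text):
    """Return (line_number, line) pairs that match a known error pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in ERROR_PATTERNS):
            hits.append((number, line.strip()))
    return hits


def finished_cleanly(log_text):
    """True only if completion was reported and no error pattern matched."""
    return "Binning Finished!" in log_text and not find_problem_lines(log_text)
```

If `finished_cleanly` returns False even though the run reported completion, the first-stage results in the `X_t_logtrans_2postprocess` directory mentioned above may still be usable.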

Chrisjrt commented 2 years ago

Hi Ziye,

Thanks for the quick reply! It was only one little mock community I was messing around with, so that will explain it then.

Thanks again for the help!

Chris

ThijsSt commented 2 years ago

Hey, I'm running into the same problem and was wondering: is there a minimum sample size you recommend? Or a lower limit below which you recommend switching to a different program/pipeline?

jsgounot commented 2 years ago

Hi. On my side, MetaBinner performs very poorly on low-diversity metagenomes. In addition to the previous comments, how do you create a multi-sample dataset? Are we supposed to merge BAM files? That is very painful to do.

ziyewang commented 2 years ago

> Hey, I'm running into the same problem and was wondering: is there a minimum sample size you recommend? Or a lower limit below which you recommend switching to a different program/pipeline?

Hi,

I recommend running MetaBinner on datasets with at least ten samples, but there is no clear-cut minimum sample size. Performance may be influenced by the sample size, the complexity of the dataset, the quality of the assembly (e.g. assembly contiguity), and so on. Sorry for the late reply.

Best, Ziye
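
As a rough illustration of the recommendation above, the sketch below counts the sample columns in a coverage table and flags small datasets. It assumes a tab-separated table whose first column is the contig name and whose remaining columns each hold one sample's depth; that layout is an assumption for illustration, not a confirmed description of MetaBinner's input. The function names are hypothetical, and the threshold of ten is a suggestion from the comment above rather than a hard limit.

```python
import csv
import io

# Suggested in the maintainer's comment; explicitly not a hard minimum.
RECOMMENDED_MIN_SAMPLES = 10


def count_samples(coverage_tsv_text):
    """Count sample columns in a coverage table.

    Assumes (hypothetically) a tab-separated table with a header row, where
    the first column is the contig name and every other column is one sample.
    """
    reader = csv.reader(io.StringIO(coverage_tsv_text), delimiter="\t")
    header = next(reader, None)
    if not header:
        raise ValueError("coverage table is empty")
    return len(header) - 1


def sample_count_warning(n_samples):
    """Return a warning string for small datasets, or None otherwise."""
    if n_samples < RECOMMENDED_MIN_SAMPLES:
        return (
            f"only {n_samples} sample(s); fewer than the suggested "
            f"{RECOMMENDED_MIN_SAMPLES}"
        )
    return None
```

A warning here does not mean binning will fail; as noted above, dataset complexity and assembly quality also matter.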

ziyewang commented 2 years ago

> Hi. On my side, MetaBinner performs very poorly on low-diversity metagenomes. In addition to the previous comments, how do you create a multi-sample dataset? Are we supposed to merge BAM files? That is very painful to do.

Hi,

MetaBinner was not developed with low-diversity metagenomes in mind, so it is understandable that it did not perform well on them. We would still like it to work well on such datasets. If the low-diversity metagenomes you used are publicly available, could you please send us a link so that we can investigate? Thanks very much.

We align the reads from each sequencing sample against the assembly file to generate one BAM file per sample, calculate the depth (coverage) for each BAM file, and merge the per-sample outputs. There is no need to merge the BAM files themselves. The process is similar to that of most binners.

Best, Ziye
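
The merge step described above can be sketched as follows. This covers only the final merge and assumes the per-sample depths have already been computed from each BAM file with a tool of your choice and loaded into dictionaries. The function names, the output layout (one row per contig, one column per sample), and the choice to fill missing contigs with 0.0 are all assumptions for illustration, not MetaBinner's actual scripts or file format.

```python
def merge_depths(per_sample_depths):
    """Merge {sample: {contig: depth}} into a header and one row per contig.

    Contigs missing from a sample get depth 0.0. That is an assumption made
    here for illustration (a contig with no aligned reads in that sample).
    """
    samples = sorted(per_sample_depths)
    contigs = sorted(
        {contig for depths in per_sample_depths.values() for contig in depths}
    )
    header = ["contig"] + samples
    rows = [
        [contig] + [per_sample_depths[s].get(contig, 0.0) for s in samples]
        for contig in contigs
    ]
    return header, rows


def to_tsv(header, rows):
    """Render the merged table as tab-separated text."""
    lines = ["\t".join(header)]
    lines += ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"
```

For example, `merge_depths({"s1": {"c1": 3.5, "c2": 1.0}, "s2": {"c1": 2.0}})` yields one row for `c1` with both depths and one row for `c2` with 0.0 in the `s2` column.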