ziyewang / MetaBinner


markerCMD failed #38

Closed (lxsteiner closed this issue 4 months ago)

lxsteiner commented 6 months ago

Hi, thank you for the tool!

I ran it on a MEGAHIT assembly of a metagenome that includes the eukaryotic host and many other unicellular eukaryotes, so I'm curious to see how it performed on the prokaryotic and eukaryotic fractions.

It seems the run finished successfully with some bins:

Get initial quality of bins.
Selected Refined_9 from Refined_AC with quality = 98.3 (comp. = 98.3%, cont. = 0.0%).
Selected Refined_7 from Refined_AC with quality = 98.3 (comp. = 98.3%, cont. = 0.0%).
Selected bin_2 from X_t_logtrans with quality = 98.3 (comp. = 98.3%, cont. = 0.0%).
Selected Refined_10 from Refined_AC with quality = 91.2 (comp. = 91.2%, cont. = 0.0%).
Selected Refined_11 from Refined_AC with quality = 86.2 (comp. = 96.6%, cont. = 3.4%).
Selected bin_4 from X_com_logtrans with quality = 82.9 (comp. = 82.9%, cont. = 0.0%).
Selected Refined_14 from Refined_AC with quality = 76.6 (comp. = 82.8%, cont. = 2.0%).
Selected bin_6 from X_t_logtrans with quality = 76.4 (comp. = 91.9%, cont. = 5.2%).
Selected Refined_13 from Refined_AC with quality = 71.0 (comp. = 82.3%, cont. = 3.8%).
Selected Refined_1 from Refined_AC with quality = 65.3 (comp. = 70.5%, cont. = 1.7%).
Selected Refined_15 from Refined_AC with quality = 63.8 (comp. = 69.0%, cont. = 1.7%).
Selected Refined_1 from Refined_AB with quality = 63.8 (comp. = 63.8%, cont. = 0.0%).
Selected bin_11 from X_com_logtrans with quality = 59.3 (comp. = 70.1%, cont. = 3.6%).
Selected bin_13 from X_t_logtrans with quality = 50.0 (comp. = 83.6%, cont. = 11.2%).
Selected bin_14 from X_com_logtrans with quality = 44.1 (comp. = 60.2%, cont. = 5.3%).
Selected bin_3 from X_cov_logtrans with quality = 41.3 (comp. = 53.5%, cont. = 4.1%).
Selected bin_15 from X_t_logtrans with quality = 33.0 (comp. = 57.3%, cont. = 8.1%).
Selected bin_17 from X_com_logtrans with quality = 26.3 (comp. = 68.2%, cont. = 14.0%).
Binning Finished!

but apart from some sklearn warnings:

partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_cov_logtrans_result.tsv
/.../envs/metabinner-1.4.4_env/lib/python3.7/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.cluster.k_means_ module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.cluster. Anything that cannot be imported from sklearn.cluster is now part of the private API.
  warnings.warn(message, FutureWarning)

there was an error with test_getmarker_3quarter.pl in several places:

2024-05-07 01:24:45,264 - markerCmd failed! Not exist: /.../envs/metabinner-1.4.4_env/bin/auxiliary/test_getmarker_3quarter.pl /.../metab_out/metabinner_res/intermediate_result/partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_com_logtrans_result.tsv_bins_post_process_mincomp_70_mincont_50_bins/37742_reclustered_0.fa.bacar_marker.hmmout /.../metab_out/metabinner_res/intermediate_result/partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_com_logtrans_result.tsv_bins_post_process_mincomp_70_mincont_50_bins/37742_reclustered_0.fa 1001 /.../metab_out/metabinner_res/intermediate_result/partial_seed_kmeans_bacar_marker_seed_length_weight_2quarter_X_com_logtrans_result.tsv_bins_post_process_mincomp_70_mincont_50_bins/37742_reclustered_0.fa.bacar_marker.3quarter.seed

2024-05-07 01:25:14,796 - markerCmd failed! Not exist: /.../envs/metabinner-1.4.4_env/bin/auxiliary/test_getmarker_3quarter.pl /.../metab_out/metabinner_res/intermediate_result/partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_com_logtrans_result.tsv_bins_post_process_mincomp_70_mincont_50_bins/35559_reclustered_0.fa.bacar_marker.hmmout /.../metab_out/metabinner_res/intermediate_result/partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_com_logtrans_result.tsv_bins_post_process_mincomp_70_mincont_50_bins/35559_reclustered_0.fa 1001 /.../metab_out/metabinner_res/intermediate_result/partial_seed_kmeans_bacar_marker_seed_length_weight_3quarter_X_com_logtrans_result.tsv_bins_post_process_mincomp_70_mincont_50_bins/35559_reclustered_0.fa.bacar_marker.3quarter.seed

There are a few more of these errors, but they all look similar. Could you make sense of them? Thanks.

Another question:

Thank you!

ziyewang commented 6 months ago

Hi,

MetaBinner can be used for recovering prokaryotic bins. The sklearn warnings don't affect the results. The test_getmarker_3quarter.pl error possibly occurs because no bacterial or archaeal single-copy marker genes could be identified in those sequences (I can confirm this if you can provide the log files).
For multi-sample cross-assemblies, it's recommended to provide each sample's data as separate files (that is, give every sample individually: sample1_R1/R2.fastq, sample2_R1/R2.fastq, etc.).
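
To see which bins triggered the markerCmd failures before sending the logs, something like the sketch below might help. It is not part of MetaBinner. It only assumes the log layout quoted in this thread, where the text after "Not exist:" is the Perl script, the .hmmout file, the bin FASTA, and then further arguments, all separated by spaces. The function name `failed_marker_bins` is made up for this example, and paths containing spaces would not be split correctly.

```python
import re

# Assumed layout, copied from the log lines quoted in this issue:
# "<timestamp> - markerCmd failed! Not exist: <script> <hmmout> <fasta> ..."
FAILED = re.compile(r"markerCmd failed! Not exist: (?P<args>.+)$")


def failed_marker_bins(log_text):
    """Return the bin FASTA path of every failed markerCmd call in log_text."""
    bins = []
    for line in log_text.splitlines():
        match = FAILED.search(line)
        if match is None:
            continue  # not a markerCmd failure line
        args = match.group("args").split()
        # By position in the quoted log: args[0] is the Perl script,
        # args[1] the .hmmout file, args[2] the bin FASTA.
        if len(args) >= 3:
            bins.append(args[2])
    return bins
```

Reading the log file into a string and passing it to `failed_marker_bins` gives one FASTA path per failure. Checking whether the matching .hmmout files are empty would then be one way to test the missing-marker-gene explanation.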

Best, Ziye