oschwengers / platon

Identification & characterization of bacterial plasmid-borne contigs from short-read draft assemblies.
https://doi.org/10.1099/mgen.0.000398
GNU General Public License v3.0

"Marker protein search failed!" error after execution #46

Open alex-trist opened 3 months ago

alex-trist commented 3 months ago

Describe the bug

Hi, I am using Platon version 1.7 installed through mamba. As input I am using draft assemblies obtained from Illumina whole-genome sequencing of bacteria (Enterobacteria, mainly Klebsiella). My issues are:

  1. What is the required length of the contigs? One of the assemblies has 170 contigs; however, only 56 are analyzed, based on their size.
  2. With any of the assemblies, the result is always "Marker protein search failed!" The error that appears in the log file is: ERROR - MAIN - diamond execution failed! diamond-error-code=-11

Any hint on how to fix it?

Best regards.


platon --db ../Data_Bases/db_platon/ --prefix --output platon/ --verbose --threads 8 contigs/KP882418.fasta

BioConda (mamba)

1.7

TranNhatTan14 commented 3 months ago

Hi @alex-trist, did you find a solution for this problem?

alex-trist commented 2 months ago

> Hi @alex-trist, did you find a solution for this problem?

Hi @TranNhatTan14, no, not yet.

I did some of the analysis with MOB, but I would like to try Platon.

Any suggestions?

oschwengers commented 2 months ago

Hi and thanks for reaching out. Let me start with the question about contig lengths: Platon detects and evaluates so-called marker proteins, from which a so-called replicon distribution score is computed. Because short contigs barely encode any CDS, and thus no marker proteins, they cannot be used by Platon's approach and are skipped upfront. However, they also make up only a tiny fraction of potential plasmids, so in most cases this should be negligible.
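To see ahead of time how many contigs of an assembly would survive such a length filter, something like the following sketch could be used. The 1 kbp and 500 kbp bounds are assumptions for illustration only: they are consistent with the log posted later in this thread (an 866 bp contig is excluded as too short, a 546,324 bp contig as too long), but the exact values should be checked against Platon's documentation. The function names are hypothetical and not part of Platon.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumed bounds, for illustration only; not taken from Platon's source code.
MIN_LEN = 1_000
MAX_LEN = 500_000


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (contig_id, sequence_length) for each record in FASTA-formatted lines."""
    name = None
    length = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def classify(length: int, min_len: int = MIN_LEN, max_len: int = MAX_LEN) -> str:
    """Label a contig length as 'too short', 'too long' or 'kept'."""
    if length < min_len:
        return "too short"
    if length > max_len:
        return "too long"
    return "kept"


def summarize(lines: Iterable[str], min_len: int = MIN_LEN, max_len: int = MAX_LEN) -> Dict[str, int]:
    """Count how many contigs fall into each length category."""
    counts = {"too short": 0, "too long": 0, "kept": 0}
    for _, length in read_fasta_lengths(lines):
        counts[classify(length, min_len, max_len)] += 1
    return counts
```

For a file on disk, `summarize(open("contigs/KP882418.fasta"))` would return the three counts for that assembly.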

Regarding the Diamond error: this seems to be a Diamond-related bug. Very often, it is caused by insufficient memory. Could you try running Platon on a machine with ~16 GB of memory?
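A note on reading the error code: on POSIX systems, Python's `subprocess` module reports a return code of -N when the child process was terminated by signal N. Assuming Platon logs that value as `diamond-error-code` (an assumption, not confirmed in this thread), -11 would correspond to signal 11, which is SIGSEGV on Linux, i.e. a crash inside Diamond rather than a regular error exit. A minimal sketch for decoding such a value, with a hypothetical helper name:

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess-style return code into a readable description."""
    if returncode == 0:
        return "success"
    if returncode > 0:
        return f"exited with status {returncode}"
    # Negative values mean "terminated by signal -returncode" (POSIX convention
    # used by Python's subprocess module).
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = f"signal {-returncode}"
    return f"terminated by {name}"
```

Calling `describe_returncode(-11)` on Linux reports termination by SIGSEGV.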

alex-trist commented 2 months ago

Thanks! @oschwengers

Indeed, I executed Platon on a 16 GB machine; however, I've experienced memory-related issues with DIAMOND in the past, so I'll try on a more powerful machine and get back to you.

Regards.

1073501616 commented 2 months ago

Hi @oschwengers! I have run into the same problem. Do you have any other solution?

Longyulin22 commented 2 months ago

Hi @oschwengers! I have met the same problem, do you have any other solution? mylog:

2024-05-08 15:48:34,121 - INFO - MAIN - version 1.7
2024-05-08 15:48:34,121 - INFO - MAIN - command line: /ifs1/User/longyulin/mambaforge-pypy3/envs/platon/bin/platon --db /ifs1/User/longyulin/mambaforge-pypy3/envs/platon/db --output 120 -v -t 24 /ifs1/User/longyulin/data/seqkit-m300-g/120.fasta
2024-05-08 15:48:34,121 - INFO - CONFIG - threads=24
2024-05-08 15:48:34,121 - INFO - CONFIG - verbose=True
2024-05-08 15:48:34,121 - DEBUG - CONFIG - test parameter db: db_tmp=/ifs1/User/longyulin/mambaforge-pypy3/envs/platon/db
2024-05-08 15:48:34,121 - INFO - CONFIG - database detected: type=parameter, path=/ifs1/User/longyulin/mambaforge-pypy3/envs/platon/db
2024-05-08 15:48:34,121 - INFO - CONFIG - genome-path=/ifs1/User/longyulin/data/seqkit-m300-g/120.fasta
2024-05-08 15:48:34,122 - INFO - CONFIG - tmp-path=/tmp/tmpzeir3tg3
2024-05-08 15:48:34,122 - INFO - CONFIG - output-path=/ifs1/User/longyulin/mambaforge-pypy3/envs/platon/120
2024-05-08 15:48:34,122 - INFO - CONFIG - mode=accuracy
2024-05-08 15:48:34,122 - INFO - CONFIG - characterize=False
2024-05-08 15:48:34,122 - INFO - CONFIG - metagenome=False
2024-05-08 15:48:34,125 - INFO - UTILS - dependency check: tool=prodigal, version=v2.6.3
2024-05-08 15:48:34,164 - INFO - UTILS - dependency check: tool=diamond, version=v2.1.9
2024-05-08 15:48:34,316 - INFO - UTILS - dependency check: tool=blastn, version=v2.15.0
2024-05-08 15:48:34,319 - INFO - UTILS - dependency check: tool=hmmsearch, version=v3.4.0
2024-05-08 15:48:34,323 - INFO - UTILS - dependency check: tool=nucmer, version=v4.0.0
2024-05-08 15:48:34,328 - INFO - UTILS - dependency check: tool=cmscan, version=v1.1.5
2024-05-08 15:48:34,345 - INFO - MAIN - exclude contig: too long: id=NODE_1_length_1550057_cov_277.070263, length=1550057
2024-05-08 15:48:34,349 - INFO - MAIN - exclude contig: too long: id=NODE_2_length_546324_cov_286.703279, length=546324
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_25_length_866_cov_1.737643, length=866
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_26_length_856_cov_1.206675, length=856
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_27_length_829_cov_1.444149, length=829
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_28_length_797_cov_914.684722, length=797
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_29_length_746_cov_2.539611, length=746
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_30_length_615_cov_0.912639, length=615
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_31_length_542_cov_1.208602, length=542
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_32_length_540_cov_1.114471, length=540
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_33_length_524_cov_2071.434004, length=524
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_34_length_516_cov_0.997722, length=516
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_35_length_491_cov_0.881643, length=491
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_36_length_488_cov_622.467153, length=488
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_37_length_477_cov_0.922500, length=477
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_38_length_475_cov_1.261307, length=475
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_39_length_472_cov_705.840506, length=472
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_40_length_467_cov_897.476923, length=467
2024-05-08 15:48:34,361 - INFO - MAIN - exclude contig: too short: id=NODE_41_length_431_cov_1.483051, length=431
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_42_length_412_cov_1.023881, length=412
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_43_length_410_cov_290.438438, length=410
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_44_length_402_cov_1934.320000, length=402
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_45_length_389_cov_302.602564, length=389
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_46_length_382_cov_0.865574, length=382
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_47_length_369_cov_0.660959, length=369
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_48_length_363_cov_0.583916, length=363
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_49_length_358_cov_0.779359, length=358
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_50_length_357_cov_1.042857, length=357
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_51_length_357_cov_1.042857, length=357
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_52_length_355_cov_1.435252, length=355
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_53_length_355_cov_1.255396, length=355
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_54_length_351_cov_1.065693, length=351
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_55_length_351_cov_0.791971, length=351
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_56_length_346_cov_1.085502, length=346
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_57_length_335_cov_0.751938, length=335
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_58_length_333_cov_0.968750, length=333
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_59_length_331_cov_0.763780, length=331
2024-05-08 15:48:34,362 - INFO - MAIN - exclude contig: too short: id=NODE_60_length_330_cov_0.837945, length=330
2024-05-08 15:48:34,363 - INFO - MAIN - exclude contig: too short: id=NODE_61_length_327_cov_0.812000, length=327
2024-05-08 15:48:34,363 - INFO - MAIN - exclude contig: too short: id=NODE_62_length_325_cov_14.423387, length=325
2024-05-08 15:48:34,363 - INFO - MAIN - exclude contig: too short: id=NODE_63_length_317_cov_1.300000, length=317
2024-05-08 15:48:34,363 - INFO - MAIN - exclude contig: too short: id=NODE_64_length_314_cov_1.232068, length=314
2024-05-08 15:48:34,363 - INFO - MAIN - exclude contig: too short: id=NODE_65_length_312_cov_1.076596, length=312
2024-05-08 15:48:34,363 - INFO - MAIN - exclude contig: too short: id=NODE_66_length_307_cov_1.386957, length=307
2024-05-08 15:48:34,363 - INFO - MAIN - exclude contig: too short: id=NODE_67_length_305_cov_204.105263, length=305
2024-05-08 15:48:34,363 - INFO - MAIN - exclude contig: too short: id=NODE_68_length_305_cov_1.811404, length=305
2024-05-08 15:48:34,363 - INFO - MAIN - exclude contig: too short: id=NODE_69_length_304_cov_1.092511, length=304
2024-05-08 15:48:34,363 - INFO - MAIN - length contig filter: # input=69, # discarded=47, # remaining=22
2024-05-08 15:48:41,802 - INFO - MAIN - ORF detection: # ORFs=2434
2024-05-08 15:48:41,802 - INFO - MAIN - ORF contig filter disabled! # passed contigs=22
2024-05-08 15:48:55,491 - ERROR - MAIN - diamond execution failed! diamond-error-code=-11
2024-05-08 15:48:55,491 - DEBUG - MAIN - diamond execution: cmd=['diamond', 'blastp', '--db', '/ifs1/User/longyulin/mambaforge-pypy3/envs/platon/db/mps.dmnd', '--query', '/tmp/tmpzeir3tg3/proteins.faa', '--out', '/tmp/tmpzeir3tg3/diamond.tsv', '--max-target-seqs', '1', '--id', '90', '--query-cover', '80', '--subject-cover', '80', '--threads', '24', '--tmpdir', '/tmp/tmpzeir3tg3'], stdout='', stderr='diamond v2.1.9.163 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 24
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /tmp/tmpzeir3tg3
Target sequences to report alignments for: 1
Opening the database... [0.266s]
Database: /ifs1/User/longyulin/mambaforge-pypy3/envs/platon/db/mps.dmnd (type: Diamond database, sequences: 4847438, letters: 1549533412)
Block size = 2000000000
Opening the input file... [0.001s]
Opening the output file... [0s]
Loading query sequences... [0.009s]
Length sorting queries... [0.002s]
Masking queries... [0.008s]
Building query seed set... [0.093s]
Algorithm: Query-indexed
Building query histograms... [0.008s]
Seeking in database... [0s]
Loading reference sequences... [5.432s]
Length sorting reference... [2.083s]
Initializing temporary storage... [0s]
Building reference histograms... [1.465s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/1, shape 1/2.
Building reference seed array... [0.977s]
Building query seed array... [0.009s]
Computing hash join... [0.093s]
Searching alignments... [0.659s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2.
Building reference seed array... [0.921s]
Building query seed array... [0.007s]
Computing hash join... [0.086s]
Searching alignments... [0.628s]
Deallocating memory... [0s]
Deallocating buffers... [0.32s]
Clearing query masking... [0s]
Computing alignments... Loading trace points... [0.186s]
Sorting trace points... [0.028s]
Computing alignments... '

jpaganini commented 2 months ago

Hi,

I was running into the same issue, even when requesting 20GB of memory to run it. For me, the fix was to install diamond 2.0.6 via conda, using the following command: conda install bioconda::diamond=2.0.6 --yes.

Hope it helps.

Cheers,

1073501616 commented 2 months ago

Thank you! I used that command and got this error: ERROR: Wrong diamond version installed. Please, install diamond version v2.0.14! Then I tried the following and it ran: (platon) 23:16:19 /mnt/ $ conda install -c bioconda diamond=2.0.14

oschwengers commented 1 month ago

Hi all and thanks for reporting!

There is a known bug in Diamond v2.1.9, which has been reported upstream: https://github.com/bbuchfink/diamond/issues/785

Currently, downgrading to v2.1.8 should do the trick until there is an official patch for Diamond.
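For anyone who wants to check which Diamond version is active in their environment before re-running Platon, here is a minimal sketch. It assumes that `diamond --version` prints a string containing the version number (an assumption about the CLI; adjust if your build behaves differently), and the helper names are hypothetical. Only v2.1.9 is flagged, since that is the version named above.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# The version reported as affected in this thread.
AFFECTED: Version = (2, 1, 9)


def parse_version(text: str) -> Optional[Version]:
    """Extract the first major.minor.patch triple from a version banner."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)), int(match.group(3)))


def installed_diamond_version(executable: str = "diamond") -> Optional[Version]:
    """Return the version of the diamond binary on PATH, or None if unavailable."""
    if shutil.which(executable) is None:
        return None
    # Assumption: the binary accepts --version and prints a version banner.
    proc = subprocess.run([executable, "--version"], capture_output=True, text=True)
    return parse_version(proc.stdout + proc.stderr)


def is_affected(version: Optional[Version]) -> bool:
    """True only for the exact version reported as buggy in this thread."""
    return version == AFFECTED
```

If `is_affected(installed_diamond_version())` returns True, pinning another version with a command analogous to the ones posted above (for example `conda install -c bioconda diamond=2.1.8`) would be the next step.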