mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

Flye does not generate any output ("No disjointigs were assembled" message) #128

Open StefanoLonardi opened 5 years ago

StefanoLonardi commented 5 years ago

I have been trying to assemble a 10 Mb genome from uncorrected nanopore data (3-4 chromosomes expected). We have a lot of data; is that the reason Flye fails at the end?

[2019-06-22 11:00:05] INFO: >>>STAGE: configure
[2019-06-22 11:00:05] INFO: Configuring run
[2019-06-22 11:00:27] INFO: Total read length: 10964270213
[2019-06-22 11:00:27] INFO: Input genome size: 10000000
[2019-06-22 11:00:27] INFO: Estimated coverage: 1096
[2019-06-22 11:00:27] WARNING: Expected read coverage is 1096, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2019-06-22 11:00:27] INFO: Reads N50/N90: 29675 / 9753
[2019-06-22 11:00:27] INFO: Minimum overlap set to 5000
[2019-06-22 11:00:27] INFO: Selected k-mer size: 15
[2019-06-22 11:00:27] INFO: >>>STAGE: assembly
[2019-06-22 11:00:27] INFO: Assembling disjointigs
[2019-06-22 11:00:27] INFO: Reading sequences
[2019-06-22 11:01:01] INFO: Generating solid k-mer index
[2019-06-22 11:01:17] INFO: Counting k-mers (1/2): 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-22 11:02:49] INFO: Counting k-mers (2/2): 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-22 11:08:39] INFO: Filling index table 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-22 11:13:50] INFO: Extending reads
[2019-06-22 12:54:29] INFO: Overlap-based coverage: 1177
[2019-06-22 12:54:29] INFO: Median overlap divergence: 0.119637 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-23 17:20:11] INFO: Assembled 0 disjointigs
[2019-06-23 17:20:23] INFO: Generating sequence
[2019-06-23 17:22:11] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

flye --nano-raw one.fastq --out-dir flye --genome-size 10m --threads 20
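The "Estimated coverage" value in the log above can be reproduced by dividing the total read length by the genome size passed on the command line. The sketch below only illustrates that arithmetic; the function names are hypothetical and this is not Flye's own code. The 40x target is an arbitrary example value.

```python
def estimated_coverage(total_read_length: int, genome_size: int) -> int:
    """Total read bases divided by genome size, truncated to an integer."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_length // genome_size


def bases_for_coverage(target_coverage: int, genome_size: int) -> int:
    """Number of read bases that corresponds to a target coverage."""
    return target_coverage * genome_size


# Values copied from the log above.
total_bases = 10_964_270_213
genome_size = 10_000_000

print(estimated_coverage(total_bases, genome_size))  # 1096
# Hypothetical target: how many bases 40x coverage would need.
print(bases_for_coverage(40, genome_size))  # 400000000
```

At roughly 1096x, the input holds far more bases than a 40x target would need, which is what the coverage warning in the log is pointing at.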

326reborn commented 3 years ago

@fenderglass Sorry, what I used is the latest code from GitHub, which was released on 11 Feb 2021. That is Flye 2.8.3-b1695, isn't it?

mikolmogorov commented 3 years ago

@326reborn you need the latest code from the repository, not the latest release. Here are the instructions: https://github.com/fenderglass/Flye/blob/flye/docs/INSTALL.md#local-building-without-installation

326reborn commented 3 years ago

@fenderglass It ran successfully! However, there may still be some problems. Following the approach I described above, I got a 1.2 Mb assembly.fasta with '--asm-coverage 40' and a 3.8 Mb assembly.fasta with '--meta'. These are much smaller than we expected. I then found that the 'Aligned read sequence' values in the logs are 0.00449972 and 0.010601, respectively (flye_meta.log, flye_asm.log). Does this mean that almost no reads were used for the assembly?

mikolmogorov commented 3 years ago

@326reborn yes, that means most reads were not assembled because there were no overlaps between them.

lucyintheskyzzz commented 2 years ago

Hi, I am having the same problem with my sequence files:

[2021-10-29 21:08:13] INFO: Starting Flye 2.9-b1768
[2021-10-29 21:08:13] INFO: >>>STAGE: configure
[2021-10-29 21:08:13] INFO: Configuring run
[2021-10-29 21:08:13] INFO: Total read length: 4263
[2021-10-29 21:08:13] INFO: Reads N50/N90: 1208 / 668
[2021-10-29 21:08:13] INFO: Minimum overlap set to 1000
[2021-10-29 21:08:13] INFO: >>>STAGE: assembly
[2021-10-29 21:08:13] INFO: Assembling disjointigs
[2021-10-29 21:08:13] INFO: Reading sequences
[2021-10-29 21:08:19] INFO: Counting k-mers: 0% 20% 50% 70% 100%
[2021-10-29 21:09:10] INFO: Filling index table (1/2) 0% 20% 50% 1007010070% % %
[2021-10-29 21:09:10] INFO: Filling index table (2/2) 0% 20% 50% 70% 100%
[2021-10-29 21:09:10] WARNING: No overlaps found - unable to estimate parameters
[2021-10-29 21:09:10] INFO: Extending reads
[2021-10-29 21:09:10] WARNING: No overlaps found!
[2021-10-29 21:09:10] INFO: Overlap-based coverage: 0
[2021-10-29 21:09:10] INFO: Median overlap divergence: 0 0% 100%
[2021-10-29 21:09:10] INFO: Assembled 0 disjointigs
[2021-10-29 21:09:10] INFO: Generating sequence
[2021-10-29 21:09:10] INFO: Filtering contained disjointigs
[2021-10-29 21:09:10] INFO: Contained seqs: 0
[2021-10-29 21:09:10] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

lucyintheskyzzz commented 2 years ago

I was able to run 12 of 24 samples; the rest reported "No disjointigs were assembled". What does this mean? Is it possible to change this command to make it run?

flye --nano-raw barcode12.fastq --out-dir barcode12.t20.flye --meta --threads 20

[2021-10-29 21:16:10] INFO: Starting Flye 2.9-b1768
[2021-10-29 21:16:10] INFO: >>>STAGE: configure
[2021-10-29 21:16:10] INFO: Configuring run
[2021-10-29 21:16:10] INFO: Total read length: 4263
[2021-10-29 21:16:10] INFO: Reads N50/N90: 1208 / 668
[2021-10-29 21:16:10] INFO: Minimum overlap set to 1000
[2021-10-29 21:16:10] INFO: >>>STAGE: assembly
[2021-10-29 21:16:10] INFO: Assembling disjointigs
[2021-10-29 21:16:10] INFO: Reading sequences
[2021-10-29 21:16:13] INFO: Counting k-mers: 0% 20% 50% 70% 100%
[2021-10-29 21:17:04] INFO: Filling index table (1/2) 0% 202050% % 70% 100%
[2021-10-29 21:17:04] INFO: Filling index table (2/2) 0% 20% 50% 70% 100%
[2021-10-29 21:17:04] WARNING: No overlaps found - unable to estimate parameters
[2021-10-29 21:17:04] INFO: Extending reads
[2021-10-29 21:17:04] WARNING: No overlaps found!
[2021-10-29 21:17:04] INFO: Overlap-based coverage: 0
[2021-10-29 21:17:04] INFO: Median overlap divergence: 0 0% 100%
[2021-10-29 21:17:04] INFO: Assembled 0 disjointigs
[2021-10-29 21:17:04] INFO: Generating sequence
[2021-10-29 21:17:04] INFO: Filtering contained disjointigs
[2021-10-29 21:17:04] INFO: Contained seqs: 0
[2021-10-29 21:17:04] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

mikolmogorov commented 2 years ago

@lucyintheskyzzz your total read length is 4kb - there is nothing to assemble.
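One way to catch an input this small before a run fails is to read the "Total read length" and "Reads N50/N90" lines from the Flye log. The following is a minimal sketch; the function name `read_stats` and the regular expressions are illustrative assumptions written against the log lines quoted in this thread, not part of Flye.

```python
import re

# Patterns written against the INFO lines quoted in this thread.
TOTAL_RE = re.compile(r"Total read length: (\d+)")
N50_RE = re.compile(r"Reads N50/N90: (\d+) / (\d+)")


def read_stats(log_text: str) -> dict:
    """Extract total read length and read N50/N90 from Flye log text."""
    stats = {}
    total = TOTAL_RE.search(log_text)
    if total:
        stats["total_read_length"] = int(total.group(1))
    n50 = N50_RE.search(log_text)
    if n50:
        stats["n50"] = int(n50.group(1))
        stats["n90"] = int(n50.group(2))
    return stats


# Two lines copied from the failing barcode12 log.
sample = (
    "[2021-10-29 21:16:10] INFO: Total read length: 4263\n"
    "[2021-10-29 21:16:10] INFO: Reads N50/N90: 1208 / 668\n"
)
print(read_stats(sample))
```

With a total of 4263 bases and an N50 of 1208, the file can only contain a few reads, which matches the "No overlaps found" warnings in the log.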

jdakota1305 commented 2 years ago

Hi @fenderglass, I am having an issue when running Flye on a rather large dataset of PacBio raw reads. The total data size of the reads file is 7.5GB. Do you think this large file size is the likely reason the pipeline is failing to assemble? Thank you!

Command error:
[2021-11-16 00:29:24] WARNING: --plasmids mode is no longer available. Command line option will be removed in the future versions
[2021-11-16 00:29:24] INFO: Starting Flye 2.9-b1768
[2021-11-16 00:29:24] INFO: >>>STAGE: configure
[2021-11-16 00:29:24] INFO: Configuring run
[2021-11-16 00:29:42] INFO: Total read length: 3797266962
[2021-11-16 00:29:42] INFO: Reads N50/N90: 3620 / 1695
[2021-11-16 00:29:42] INFO: Minimum overlap set to 2000
[2021-11-16 00:29:42] INFO: >>>STAGE: assembly
[2021-11-16 00:29:42] INFO: Assembling disjointigs
[2021-11-16 00:29:42] INFO: Reading sequences
[2021-11-16 00:30:03] INFO: Counting k-mers: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2021-11-16 00:34:25] INFO: Filling index table (1/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2021-11-16 00:36:33] INFO: Filling index table (2/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2021-11-16 00:38:58] INFO: Extending reads
[2021-11-16 00:39:40] INFO: Overlap-based coverage: 644
[2021-11-16 00:39:40] INFO: Median overlap divergence: 0.179239 0% 100%
[2021-11-16 01:04:43] INFO: Assembled 0 disjointigs
[2021-11-16 01:04:46] INFO: Generating sequence
[2021-11-16 01:04:46] INFO: Filtering contained disjointigs
[2021-11-16 01:04:46] INFO: Contained seqs: 0
[2021-11-16 01:04:47] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2021-11-16 01:04:47] ERROR: Pipeline aborted

mikolmogorov commented 2 years ago

@jdakota1305 please check if the advice above about pbclip applies.

lucyintheskyzzz commented 2 years ago

Now I am getting this error with a bigger read length...

[2021-11-22 11:45:53] INFO: Starting Flye 2.9-b1768
[2021-11-22 11:45:53] INFO: >>>STAGE: configure
[2021-11-22 11:45:53] INFO: Configuring run
[2021-11-22 11:45:54] INFO: Total read length: 33007018
[2021-11-22 11:45:54] INFO: Reads N50/N90: 548 / 393
[2021-11-22 11:45:54] INFO: Minimum overlap set to 1000
[2021-11-22 11:45:54] INFO: >>>STAGE: assembly
[2021-11-22 11:45:54] INFO: Assembling disjointigs
[2021-11-22 11:45:54] INFO: Reading sequences
[2021-11-22 11:45:57] INFO: Counting k-mers: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2021-11-22 11:46:47] INFO: Filling index table (1/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2021-11-22 11:46:47] INFO: Filling index table (2/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2021-11-22 11:46:48] INFO: Extending reads
[2021-11-22 11:46:48] INFO: Overlap-based coverage: 6
[2021-11-22 11:46:48] INFO: Median overlap divergence: 0.223043 0% 100%
[2021-11-22 11:46:48] INFO: Assembled 0 disjointigs
[2021-11-22 11:46:48] INFO: Generating sequence
[2021-11-22 11:46:48] INFO: Filtering contained disjointigs
[2021-11-22 11:46:48] INFO: Contained seqs: 0
[2021-11-22 11:46:48] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2021-11-22 11:46:48] ERROR: Pipeline aborted

mikolmogorov commented 2 years ago

@lucyintheskyzzz now, as you can see, your reads became super-short (N50 < 1kb), which is not a good sign, and Flye can't assemble that either. Something is likely wrong with your PacBio library.
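To make the read-length point concrete: the log above reports a read N50 of 548 against a minimum overlap of 1000, and a read shorter than the minimum overlap cannot take part in an overlap of that length. The sketch below computes N50 from a list of read lengths and the fraction of bases in reads that reach a given minimum overlap. The function names and the toy read lengths are illustrative assumptions, not Flye code or real data.

```python
def n50(read_lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not read_lengths:
        raise ValueError("no reads given")
    half = sum(read_lengths) / 2
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= half:
            return length


def fraction_of_bases_at_least(read_lengths, min_overlap):
    """Fraction of all bases that sit in reads at least min_overlap long."""
    total = sum(read_lengths)
    long_enough = sum(length for length in read_lengths if length >= min_overlap)
    return long_enough / total


# Toy read lengths, invented for illustration only.
lengths = [300, 400, 500, 600, 1200]
print(n50(lengths))
print(fraction_of_bases_at_least(lengths, 1000))
```

When N50 is below the minimum overlap, as in the log above, less than half of the sequenced bases are in reads long enough to be usable at that setting.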

hypothalamus01 commented 2 years ago

I used nanopore to sequence a purified tandem-repeat DNA (a PCR product). I am trying to get the exact DNA sequence by combining a lot of reads into a consensus sequence. I got the same error message during assembly. (All my reads should be highly similar.) I need the assembly result as input for Medaka to generate the consensus sequence. Can anyone help me with this?

Also, is there a better way to generate a consensus sequence?

[2022-01-13 16:59:42] root: INFO: Starting Flye 2.9-b1768 [2022-01-13 16:59:42] root: DEBUG: Cmd: /afs/crc.nd.edu/user/c/cwang16/.conda/envs/medaka/bin/flye --nano-hq 6K_combined.fastq --out-dir 123 --threads 8 -m 3000 [2022-01-13 16:59:42] root: DEBUG: Python version: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:59:51) [GCC 9.4.0] [2022-01-13 16:59:42] root: INFO: >>>STAGE: configure [2022-01-13 16:59:42] root: INFO: Configuring run [2022-01-13 16:59:43] root: INFO: Total read length: 5365504 [2022-01-13 16:59:43] root: INFO: Reads N50/N90: 7279 / 6643 [2022-01-13 16:59:43] root: INFO: Selected minimum overlap: 3000 [2022-01-13 16:59:43] root: INFO: >>>STAGE: assembly [2022-01-13 16:59:43] root: INFO: Assembling disjointigs [2022-01-13 16:59:43] root: DEBUG: -----Begin assembly log------ [2022-01-13 16:59:43] root: DEBUG: Running: flye-modules assemble --reads /afs/crc.nd.edu/user/c/cwang16/nanopore/Super_accuracy/6K_filtered_files/6K_combined.fastq --out-asm /afs/crc.nd.edu/user/c/cwang16/nanopore/Super_accuracy/6K_filtered_files/123/00-assembly/draft_assembly.fasta --config /afs/crc.nd.edu/user/c/cwang16/.conda/envs/medaka/lib/python3.8/site-packages/flye/config/bin_cfg/asm_nano_hq.cfg --log /afs/crc.nd.edu/user/c/cwang16/nanopore/Super_accuracy/6K_filtered_files/123/flye.log --threads 8 --min-ovlp 3000 [2022-01-13 16:59:43] DEBUG: Build date: Aug 22 2021 05:04:54 [2022-01-13 16:59:43] DEBUG: Total RAM: 251 Gb [2022-01-13 16:59:43] DEBUG: Available RAM: 237 Gb [2022-01-13 16:59:43] DEBUG: Total CPUs: 24 [2022-01-13 16:59:43] DEBUG: Loading /afs/crc.nd.edu/user/c/cwang16/.conda/envs/medaka/lib/python3.8/site-packages/flye/config/bin_cfg/asm_nano_hq.cfg [2022-01-13 16:59:43] DEBUG: Loading /afs/crc.nd.edu/user/c/cwang16/.conda/envs/medaka/lib/python3.8/site-packages/flye/config/bin_cfg/asm_defaults.cfg [2022-01-13 16:59:43] DEBUG: big_genome_threshold=29000000 [2022-01-13 16:59:43] DEBUG: meta_read_filter_kmer_freq=100 [2022-01-13 16:59:43] DEBUG: 
chain_large_gap_penalty=2 [2022-01-13 16:59:43] DEBUG: chain_small_gap_penalty=0.5 [2022-01-13 16:59:43] DEBUG: chain_gap_jump_threshold=100 [2022-01-13 16:59:43] DEBUG: max_coverage_drop_rate=5 [2022-01-13 16:59:43] DEBUG: max_extensions_drop_rate=5 [2022-01-13 16:59:43] DEBUG: chimera_window=100 [2022-01-13 16:59:43] DEBUG: chimera_overhang=1000 [2022-01-13 16:59:43] DEBUG: min_reads_in_disjointig=4 [2022-01-13 16:59:43] DEBUG: max_inner_reads=10 [2022-01-13 16:59:43] DEBUG: max_inner_fraction=0.25 [2022-01-13 16:59:43] DEBUG: max_separation=500 [2022-01-13 16:59:43] DEBUG: unique_edge_length=50000 [2022-01-13 16:59:43] DEBUG: min_repeat_res_support=0.51 [2022-01-13 16:59:43] DEBUG: out_paths_ratio=5 [2022-01-13 16:59:43] DEBUG: graph_cov_drop_rate=5 [2022-01-13 16:59:43] DEBUG: coverage_estimate_window=100 [2022-01-13 16:59:43] DEBUG: max_bubble_length=50000 [2022-01-13 16:59:43] DEBUG: loop_coverage_rate=1.5 [2022-01-13 16:59:43] DEBUG: repeat_edge_cov_mult=1.75 [2022-01-13 16:59:43] DEBUG: weak_detach_rate=5 [2022-01-13 16:59:43] DEBUG: tip_coverage_rate=2 [2022-01-13 16:59:43] DEBUG: tip_length_rate=2 [2022-01-13 16:59:43] DEBUG: output_gfa_before_rr=0 [2022-01-13 16:59:43] DEBUG: low_cutoff_warning=0 [2022-01-13 16:59:43] DEBUG: kmer_size=17 [2022-01-13 16:59:43] DEBUG: use_minimizers=1 [2022-01-13 16:59:43] DEBUG: minimizer_window=5 [2022-01-13 16:59:43] DEBUG: reads_base_alignment=1 [2022-01-13 16:59:43] DEBUG: meta_read_top_kmer_rate=0.75 [2022-01-13 16:59:43] DEBUG: maximum_jump=1500 [2022-01-13 16:59:43] DEBUG: maximum_overhang=1500 [2022-01-13 16:59:43] DEBUG: repeat_kmer_rate=100 [2022-01-13 16:59:43] DEBUG: assemble_ovlp_divergence=0.05 [2022-01-13 16:59:43] DEBUG: assemble_divergence_relative=1 [2022-01-13 16:59:43] DEBUG: repeat_graph_ovlp_divergence=0.05 [2022-01-13 16:59:43] DEBUG: read_align_ovlp_divergence=0.10 [2022-01-13 16:59:43] DEBUG: hpc_scoring_on=1 [2022-01-13 16:59:43] DEBUG: add_unassembled_reads=0 [2022-01-13 16:59:43] DEBUG: 
extend_contigs_with_repeats=0 [2022-01-13 16:59:43] DEBUG: min_read_cov_cutoff=3 [2022-01-13 16:59:43] DEBUG: short_tip_length=20000 [2022-01-13 16:59:43] DEBUG: long_tip_length=100000 [2022-01-13 16:59:43] DEBUG: Running with k-mer size: 17 [2022-01-13 16:59:43] DEBUG: Running with minimum overlap 3000 [2022-01-13 16:59:43] DEBUG: Metagenome mode: N [2022-01-13 16:59:43] DEBUG: Short mode: N [2022-01-13 16:59:43] INFO: Reading sequences [2022-01-13 16:59:43] DEBUG: Building positional index [2022-01-13 16:59:43] DEBUG: Total sequence: 5365504 bp [2022-01-13 16:59:43] INFO: Building minimizer index [2022-01-13 16:59:43] INFO: Pre-calculating index storage [2022-01-13 16:59:43] DEBUG: Mean k-mer frequency: 5.65713 [2022-01-13 16:59:43] DEBUG: Repetitive k-mer frequency: 565 [2022-01-13 16:59:43] DEBUG: Filtered 495696 repetitive k-mers (0.27953) [2022-01-13 16:59:43] INFO: Filling index [2022-01-13 16:59:43] DEBUG: Sorting k-mer index [2022-01-13 16:59:43] DEBUG: Selected k-mers: 313133 [2022-01-13 16:59:43] DEBUG: K-mer index size: 1277622 [2022-01-13 16:59:43] DEBUG: Mean k-mer frequency: 4.08013 [2022-01-13 16:59:43] DEBUG: Minimizer rate: 4.1996 [2022-01-13 16:59:43] DEBUG: Peak RAM usage: 0 Gb [2022-01-13 16:59:43] DEBUG: Estimating k-mer identity bias [2022-01-13 17:01:01] DEBUG: Initial divergence estimate : 0.0815804 [2022-01-13 17:01:01] DEBUG: Relative threshold: Y [2022-01-13 17:01:01] DEBUG: Max divergence threshold set to 0.13158 [2022-01-13 17:01:01] INFO: Extending reads [2022-01-13 17:01:01] DEBUG: Estimating overlap coverage [2022-01-13 17:08:46] INFO: Overlap-based coverage: 723 [2022-01-13 17:08:46] INFO: Median overlap divergence: 0.0815804 [2022-01-13 17:08:46] DEBUG: Sequence divergence distribution:

|           * * *  *       |                                                                         
|           ***** **       |                                                                         
|           ********       |                                                                         
|           *********      |                                                                         
|           *********      |                                                                         
|           *********      |                                                                         
|          **********      |                                                                         
|          ***********     |                                                                         
|          ***********     |                                                                         
|         ************     |                                                                         
|         ***************  |                                                                         
|         ***************  |                                                                         
|         ***************  |                                                                         
|        ***************** |                                                                         
|        ******************|                                                                         
|        ******************|                                                                         
|       ********************                                                                         
|       ***********************                                                                      
|       ***********************                                                                      
|  *   ************************** *                                                                  
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.063, Q50 = 0.082, Q75 = 0.1

[2022-01-13 17:08:59] INFO: Assembled 0 disjointigs [2022-01-13 17:09:00] INFO: Generating sequence [2022-01-13 17:09:00] DEBUG: Building positional index [2022-01-13 17:09:00] DEBUG: Mean k-mer frequency: 0 [2022-01-13 17:09:00] DEBUG: Repetitive k-mer frequency: 0 [2022-01-13 17:09:00] DEBUG: Filtered 0 repetitive k-mers (-nan) [2022-01-13 17:09:00] DEBUG: Sorting k-mer index [2022-01-13 17:09:00] DEBUG: Selected k-mers: 0 [2022-01-13 17:09:00] DEBUG: K-mer index size: 0 [2022-01-13 17:09:00] DEBUG: Mean k-mer frequency: -nan [2022-01-13 17:09:00] DEBUG: Minimizer rate: -nan [2022-01-13 17:09:00] INFO: Filtering contained disjointigs [2022-01-13 17:09:00] DEBUG: Computing transitive closure for overlaps [2022-01-13 17:09:00] DEBUG: Found 0 overlaps [2022-01-13 17:09:00] DEBUG: Left 0 overlaps after filtering [2022-01-13 17:09:00] INFO: Contained seqs: 0 [2022-01-13 17:09:00] DEBUG: Writing FASTA [2022-01-13 17:09:00] DEBUG: Peak RAM usage: 0 Gb -----------End assembly log------------ [2022-01-13 17:09:00] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2022-01-13 17:09:00] root: ERROR: Pipeline aborted

mikolmogorov commented 2 years ago

@hypothalamus01 could you try running using --meta switch?

hypothalamus01 commented 2 years ago

@fenderglass It works!! Thank you so much for your advice.
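The exact command used for the successful rerun is not shown in the thread. As a rough sketch, a script could assemble the argument list like this, using only flags that appear in the commands quoted above; the helper `build_flye_command` is hypothetical and only builds the command, it does not run Flye.

```python
import shlex

# Read-type flags that appear in commands quoted in this thread.
KNOWN_READ_TYPES = {"--nano-raw", "--nano-hq"}


def build_flye_command(reads, out_dir, read_type="--nano-hq", threads=8, meta=False):
    """Return a Flye argument list, optionally with the --meta switch."""
    if read_type not in KNOWN_READ_TYPES:
        raise ValueError(f"unexpected read type flag: {read_type}")
    cmd = ["flye", read_type, reads, "--out-dir", out_dir, "--threads", str(threads)]
    if meta:
        cmd.append("--meta")
    return cmd


# File and directory names taken from the log above.
cmd = build_flye_command("6K_combined.fastq", "123", meta=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where Flye is installed.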

Gahyeon-K commented 2 years ago

I have tried to assemble a genome of about 8-10 Mb from nanopore data. This is not metagenomic data; I sequenced a single sample as a test. I used this command for the assembly:

~$ flye --nano-hq Gahyeon/NIBR_87/01_Porechop/NIBR_87_trimmed.fastq.gz --out-dir Gahyeon/NIBR_87/02_Flye/ --threads 32

I previously ran a successful Flye assembly with this command. This time, however, although both samples are from the same genus, the assembly failed with the 'No disjointigs were assembled' error message. I think the sequencing output is sufficient for the analysis, because the total read length is 2,453,898,263 and the N50 is 32 kb, so I tried changing the parameters.

First, I added --min-overlap 1000 (the default setting value was 6000), but I got the same message.
Second, I added --asm-coverage 50, and I got an assembly FASTA file of about 2.4 Mb (1 fragment), which is smaller than expected.
Third, I added --meta, and I got an assembly FASTA file of about 2.4 Mb (2 fragments).
Fourth, I used the parameters --threads 32 --genome-size 9m --read-error 0.03 --min-overlap 8000 --asm-coverage 40. Result: 1 fragment, 2,412,861 bp.

How can I troubleshoot this error? I would appreciate your recommendation. Thank you! This is my log file.
[2022-02-07 15:52:31] root: INFO: Starting Flye 2.9-b1774 [2022-02-07 15:52:31] root: DEBUG: Cmd: /Sysbio/SDD/kjh/miniconda3/bin/flye --nano-hq Gahyeon/NIBR_87/01_Porechop/NIBR_87_trimmed.fastq.gz --out-dir Gahyeon/NIBR_87/02_Flye/ --threads 32 [2022-02-07 15:52:31] root: DEBUG: Python version: 3.7.10 (default, Jun 4 2021, 14:48:32) [GCC 7.5.0] [2022-02-07 15:52:31] root: INFO: >>>STAGE: configure [2022-02-07 15:52:31] root: INFO: Configuring run [2022-02-07 15:53:04] root: INFO: Starting Flye 2.9-b1774 [2022-02-07 15:53:04] root: DEBUG: Cmd: /Sysbio/SDD/kjh/miniconda3/bin/flye --nano-hq Gahyeon/NIBR_87/01_Porechop/NIBR_87_trimmed.fastq.gz --out-dir Gahyeon/NIBR_87/02_Flye/ --threads 32 [2022-02-07 15:53:04] root: DEBUG: Python version: 3.7.10 (default, Jun 4 2021, 14:48:32) [GCC 7.5.0] [2022-02-07 15:53:04] root: INFO: >>>STAGE: configure [2022-02-07 15:53:04] root: INFO: Configuring run [2022-02-07 15:53:53] root: INFO: Total read length: 2453898263 [2022-02-07 15:53:53] root: INFO: Reads N50/N90: 32404 / 6187 [2022-02-07 15:53:53] root: INFO: Minimum overlap set to 6000 [2022-02-07 15:53:53] root: INFO: >>>STAGE: assembly [2022-02-07 15:53:53] root: INFO: Assembling disjointigs [2022-02-07 15:53:53] root: DEBUG: -----Begin assembly log------ [2022-02-07 15:53:53] root: DEBUG: Running: flye-modules assemble --reads /Sysbio/SDD/kjh/Gahyeon/NIBR_87/01_Porechop/NIBR_87_trimmed.fastq.gz --out-asm /Sysbio/SDD/kjh/Gahyeon/NIBR_87/02_Flye/00-assembly/draft_assembly.fasta --config /Sysbio/SDD/kjh/miniconda3/lib/python3.7/site-packages/flye/config/bin_cfg/asm_nano_hq.cfg --log /Sysbio/SDD/kjh/Gahyeon/NIBR_87/02_Flye/flye.log --threads 32 --min-ovlp 6000 [2022-02-07 15:53:53] DEBUG: Build date: Dec 4 2021 13:25:53 [2022-02-07 15:53:53] DEBUG: Total RAM: 251 Gb [2022-02-07 15:53:53] DEBUG: Available RAM: 246 Gb [2022-02-07 15:53:53] DEBUG: Total CPUs: 32 [2022-02-07 15:53:53] DEBUG: Loading 
/Sysbio/SDD/kjh/miniconda3/lib/python3.7/site-packages/flye/config/bin_cfg/asm_nano_hq.cfg [2022-02-07 15:53:53] DEBUG: Loading /Sysbio/SDD/kjh/miniconda3/lib/python3.7/site-packages/flye/config/bin_cfg/asm_defaults.cfg [2022-02-07 15:53:53] DEBUG: big_genome_threshold=29000000 [2022-02-07 15:53:53] DEBUG: meta_read_filter_kmer_freq=100 [2022-02-07 15:53:53] DEBUG: chain_large_gap_penalty=2 [2022-02-07 15:53:53] DEBUG: chain_small_gap_penalty=0.5 [2022-02-07 15:53:53] DEBUG: chain_gap_jump_threshold=100 [2022-02-07 15:53:53] DEBUG: max_coverage_drop_rate=5 [2022-02-07 15:53:53] DEBUG: max_extensions_drop_rate=5 [2022-02-07 15:53:53] DEBUG: chimera_window=100 [2022-02-07 15:53:53] DEBUG: chimera_overhang=1000 [2022-02-07 15:53:53] DEBUG: min_reads_in_disjointig=4 [2022-02-07 15:53:53] DEBUG: max_inner_reads=10 [2022-02-07 15:53:53] DEBUG: max_inner_fraction=0.25 [2022-02-07 15:53:53] DEBUG: max_separation=500 [2022-02-07 15:53:53] DEBUG: unique_edge_length=50000 [2022-02-07 15:53:53] DEBUG: min_repeat_res_support=0.51 [2022-02-07 15:53:53] DEBUG: out_paths_ratio=5 [2022-02-07 15:53:53] DEBUG: graph_cov_drop_rate=5 [2022-02-07 15:53:53] DEBUG: coverage_estimate_window=100 [2022-02-07 15:53:53] DEBUG: max_bubble_length=50000 [2022-02-07 15:53:53] DEBUG: loop_coverage_rate=1.5 [2022-02-07 15:53:53] DEBUG: repeat_edge_cov_mult=1.75 [2022-02-07 15:53:53] DEBUG: weak_detach_rate=5 [2022-02-07 15:53:53] DEBUG: tip_coverage_rate=2 [2022-02-07 15:53:53] DEBUG: tip_length_rate=2 [2022-02-07 15:53:53] DEBUG: output_gfa_before_rr=0 [2022-02-07 15:53:53] DEBUG: low_cutoff_warning=0 [2022-02-07 15:53:53] DEBUG: kmer_size=17 [2022-02-07 15:53:53] DEBUG: use_minimizers=1 [2022-02-07 15:53:53] DEBUG: minimizer_window=5 [2022-02-07 15:53:53] DEBUG: reads_base_alignment=1 [2022-02-07 15:53:53] DEBUG: meta_read_top_kmer_rate=0.75 [2022-02-07 15:53:53] DEBUG: maximum_jump=1500 [2022-02-07 15:53:53] DEBUG: maximum_overhang=1500 [2022-02-07 15:53:53] DEBUG: repeat_kmer_rate=100 
[2022-02-07 15:53:53] DEBUG: assemble_ovlp_divergence=0.05 [2022-02-07 15:53:53] DEBUG: assemble_divergence_relative=1 [2022-02-07 15:53:53] DEBUG: repeat_graph_ovlp_divergence=0.05 [2022-02-07 15:53:53] DEBUG: read_align_ovlp_divergence=0.10 [2022-02-07 15:53:53] DEBUG: hpc_scoring_on=1 [2022-02-07 15:53:53] DEBUG: add_unassembled_reads=0 [2022-02-07 15:53:53] DEBUG: extend_contigs_with_repeats=0 [2022-02-07 15:53:53] DEBUG: min_read_cov_cutoff=3 [2022-02-07 15:53:53] DEBUG: short_tip_length=20000 [2022-02-07 15:53:53] DEBUG: long_tip_length=100000 [2022-02-07 15:53:53] DEBUG: Running with k-mer size: 17 [2022-02-07 15:53:53] DEBUG: Running with minimum overlap 6000 [2022-02-07 15:53:53] DEBUG: Metagenome mode: N [2022-02-07 15:53:53] DEBUG: Short mode: N [2022-02-07 15:53:53] INFO: Reading sequences [2022-02-07 15:54:31] DEBUG: Building positional index [2022-02-07 15:54:31] DEBUG: Total sequence: 2216474985 bp [2022-02-07 15:54:31] INFO: Building minimizer index [2022-02-07 15:54:31] INFO: Pre-calculating index storage [2022-02-07 15:55:10] DEBUG: Mean k-mer frequency: 4.87955 [2022-02-07 15:55:10] DEBUG: Repetitive k-mer frequency: 487 [2022-02-07 15:55:10] DEBUG: Filtered 104452336 repetitive k-mers (0.141455) [2022-02-07 15:55:14] INFO: Filling index [2022-02-07 15:55:43] DEBUG: Sorting k-mer index [2022-02-07 15:55:56] DEBUG: Selected k-mers: 151150421 [2022-02-07 15:55:56] DEBUG: K-mer index size: 633963781 [2022-02-07 15:55:56] DEBUG: Mean k-mer frequency: 4.19426 [2022-02-07 15:55:56] DEBUG: Minimizer rate: 3.49622 [2022-02-07 15:55:56] DEBUG: Peak RAM usage: 10 Gb [2022-02-07 15:55:56] DEBUG: Estimating k-mer identity bias [2022-02-07 15:59:41] DEBUG: Initial divergence estimate : 0.115184 [2022-02-07 15:59:41] DEBUG: Relative threshold: Y [2022-02-07 15:59:41] DEBUG: Max divergence threshold set to 0.165184 [2022-02-07 15:59:41] INFO: Extending reads [2022-02-07 15:59:41] DEBUG: Estimating overlap coverage [2022-02-07 17:02:20] INFO: Overlap-based 
coverage: 796 [2022-02-07 17:02:20] INFO: Median overlap divergence: 0.1158 [2022-02-07 17:02:20] DEBUG: Sequence divergence distribution:

|                    * *          |                                                                  
|                    ***          |                                                                  
|                    ****         |                                                                  
|                    ****         |                                                                  
|                   ******        |                                                                  
|                   ******        |                                                                  
|                  *******        |                                                                  
|                  ********       |                                                                  
|                  *********      |                                                                  
|                  *********      |                                                                  
|                  **********     |                                                                  
|                  **********     |                                                                  
|                 ************    |                                                                  
|                 ************    |                                                                  
|                 *************   |                                                                  
|                 *************** |                                                                  
|                ******************                                                                  
|                ********************                                                                
|               ***********************                                                              
|             ******************************** *   *                                     *           
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.1, Q50 = 0.12, Q75 = 0.13

[2022-02-07 17:32:26] INFO: Assembled 0 disjointigs [2022-02-07 17:32:33] INFO: Generating sequence [2022-02-07 17:32:33] DEBUG: Building positional index [2022-02-07 17:32:33] DEBUG: Mean k-mer frequency: 0 [2022-02-07 17:32:33] DEBUG: Repetitive k-mer frequency: 0 [2022-02-07 17:32:33] DEBUG: Filtered 0 repetitive k-mers (-nan) [2022-02-07 17:32:33] DEBUG: Sorting k-mer index [2022-02-07 17:32:33] DEBUG: Selected k-mers: 0 [2022-02-07 17:32:33] DEBUG: K-mer index size: 0 [2022-02-07 17:32:33] DEBUG: Mean k-mer frequency: -nan [2022-02-07 17:32:33] DEBUG: Minimizer rate: -nan [2022-02-07 17:32:33] INFO: Filtering contained disjointigs [2022-02-07 17:32:33] DEBUG: Computing transitive closure for overlaps [2022-02-07 17:32:33] DEBUG: Found 0 overlaps [2022-02-07 17:32:33] DEBUG: Left 0 overlaps after filtering [2022-02-07 17:32:33] INFO: Contained seqs: 0 [2022-02-07 17:32:33] DEBUG: Writing FASTA [2022-02-07 17:32:33] DEBUG: Peak RAM usage: 12 Gb -----------End assembly log------------ [2022-02-07 17:32:34] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2022-02-07 17:32:34] root: ERROR: Pipeline aborted

mikolmogorov commented 2 years ago

@Gahyeon-K I see that the read error rate is ~12%, which is higher than the expected ~5% for --nano-hq. I would try assembling with --nano-raw and --meta. If that does not work, please attach the full log of the new run. Could you also provide more info about the dataset (genome / sequencing specifics)?
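The comparison described above can be scripted by reading the "Median overlap divergence" line from the log and comparing it with the ~5% figure quoted for --nano-hq. This is only a sketch: the function names are hypothetical, and the `factor=2.0` cutoff is an arbitrary illustration rather than a threshold that Flye itself applies.

```python
import re

DIVERGENCE_RE = re.compile(r"Median overlap divergence: (\d+(?:\.\d+)?)")


def median_divergence(log_text):
    """Return the median overlap divergence from Flye log text, or None."""
    match = DIVERGENCE_RE.search(log_text)
    return float(match.group(1)) if match else None


def exceeds_expected(divergence, expected=0.05, factor=2.0):
    """True if divergence is above expected * factor (factor is an arbitrary choice)."""
    return divergence > expected * factor


# Line copied from the log above.
sample = "[2022-02-07 17:02:20] INFO: Median overlap divergence: 0.1158"
value = median_divergence(sample)
print(value, exceeds_expected(value))
```

A value well above the expected rate suggests that the read-type flag does not match the data, which is why --nano-raw is worth trying here.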

kakuk9 commented 2 years ago

I have a similar issue - "ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct" with my nanopore reads (Guppy v6, super high accuracy basecalling). I tried both "--nano-raw" and "--nano-hq". Below is the log. This is not metagenomic data; the reads are amplicons (~1 kb long) from a virus with a circular genome. I tried "--meta" and "--asm-coverage 50" separately, but neither solved the error. Is my data not suitable for analysis with Flye? If possible, I would really appreciate some advice, thanks!

[2022-06-16 14:28:25] root: INFO: Starting Flye 2.9-b1778 [2022-06-16 14:28:25] root: DEBUG: Cmd: path/Flye/bin/flye --nano-hq input.fastq.gz --genome-size 3200 --asm-coverage 50 --out-dir Sample_barcode_08_flye_asm50 --threads 16 [2022-06-16 14:28:25] root: DEBUG: Python version: 3.8.8 (default, Apr 13 2021, 19:58:26) [GCC 7.3.0] [2022-06-16 14:28:25] root: INFO: >>>STAGE: configure [2022-06-16 14:28:25] root: INFO: Configuring run [2022-06-16 14:28:31] root: INFO: Total read length: 260757161 [2022-06-16 14:28:31] root: INFO: Input genome size: 3200 [2022-06-16 14:28:31] root: INFO: Estimated coverage: 81486 [2022-06-16 14:28:31] root: WARNING: Expected read coverage is 81486, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? [2022-06-16 14:28:31] root: INFO: Reads N50/N90: 1065 / 1015 [2022-06-16 14:28:31] root: INFO: Minimum overlap set to 1000 [2022-06-16 14:28:31] root: INFO: Using longest 50x reads for contig assembly [2022-06-16 14:28:31] root: DEBUG: Min read length cutoff: 1178 [2022-06-16 14:28:31] root: INFO: >>>STAGE: assembly [2022-06-16 14:28:31] root: INFO: Assembling disjointigs [2022-06-16 14:28:31] root: DEBUG: -----Begin assembly log------ [2022-06-16 14:28:31] root: DEBUG: Running: flye-modules assemble --reads Input.fastq.gz --out-asm /output/00-assembly/draft_assembly.fasta --config path/Flye/flye/config/bin_cfg/asm_nano_hq.cfg --log /output/flye.log --threads 16 --genome-size 3200 --min-ovlp 1000 --min-read 1178 [2022-06-16 14:28:31] DEBUG: Build date: Jun 15 2022 16:49:06 [2022-06-16 14:28:31] DEBUG: Total RAM: 4031 Gb [2022-06-16 14:28:31] DEBUG: Available RAM: 3353 Gb [2022-06-16 14:28:31] DEBUG: Total CPUs: 224 [2022-06-16 14:28:31] DEBUG: Loading path/Flye/flye/config/bin_cfg/asm_nano_hq.cfg [2022-06-16 14:28:31] DEBUG: Loading path/Flye/flye/config/bin_cfg/asm_defaults.cfg [2022-06-16 14:28:31] DEBUG: big_genome_threshold=29000000 [2022-06-16 14:28:31] DEBUG: 
meta_read_filter_kmer_freq=100 [2022-06-16 14:28:31] DEBUG: chain_large_gap_penalty=2 [2022-06-16 14:28:31] DEBUG: chain_small_gap_penalty=0.5 [2022-06-16 14:28:31] DEBUG: chain_gap_jump_threshold=100 [2022-06-16 14:28:31] DEBUG: max_coverage_drop_rate=5 [2022-06-16 14:28:31] DEBUG: max_extensions_drop_rate=5 [2022-06-16 14:28:31] DEBUG: chimera_window=100 [2022-06-16 14:28:31] DEBUG: chimera_overhang=1000 [2022-06-16 14:28:31] DEBUG: min_reads_in_disjointig=4 [2022-06-16 14:28:31] DEBUG: max_inner_reads=10 [2022-06-16 14:28:31] DEBUG: max_inner_fraction=0.25 [2022-06-16 14:28:31] DEBUG: max_separation=500 [2022-06-16 14:28:31] DEBUG: unique_edge_length=50000 [2022-06-16 14:28:31] DEBUG: min_repeat_res_support=0.51 [2022-06-16 14:28:31] DEBUG: out_paths_ratio=5 [2022-06-16 14:28:31] DEBUG: graph_cov_drop_rate=5 [2022-06-16 14:28:31] DEBUG: coverage_estimate_window=100 [2022-06-16 14:28:31] DEBUG: max_bubble_length=50000 [2022-06-16 14:28:31] DEBUG: loop_coverage_rate=1.5 [2022-06-16 14:28:31] DEBUG: repeat_edge_cov_mult=1.75 [2022-06-16 14:28:31] DEBUG: weak_detach_rate=5 [2022-06-16 14:28:31] DEBUG: tip_coverage_rate=2 [2022-06-16 14:28:31] DEBUG: tip_length_rate=2 [2022-06-16 14:28:31] DEBUG: output_gfa_before_rr=0 [2022-06-16 14:28:31] DEBUG: remove_alt_edges=0 [2022-06-16 14:28:31] DEBUG: low_cutoff_warning=0 [2022-06-16 14:28:31] DEBUG: kmer_size=17 [2022-06-16 14:28:31] DEBUG: use_minimizers=1 [2022-06-16 14:28:31] DEBUG: minimizer_window=5 [2022-06-16 14:28:31] DEBUG: reads_base_alignment=1 [2022-06-16 14:28:31] DEBUG: meta_read_top_kmer_rate=0.75 [2022-06-16 14:28:31] DEBUG: maximum_jump=1500 [2022-06-16 14:28:31] DEBUG: maximum_overhang=1500 [2022-06-16 14:28:31] DEBUG: repeat_kmer_rate=100 [2022-06-16 14:28:31] DEBUG: assemble_ovlp_divergence=0.05 [2022-06-16 14:28:31] DEBUG: assemble_divergence_relative=1 [2022-06-16 14:28:31] DEBUG: repeat_graph_ovlp_divergence=0.05 [2022-06-16 14:28:31] DEBUG: read_align_ovlp_divergence=0.10 [2022-06-16 14:28:31] 
DEBUG: hpc_scoring_on=1 [2022-06-16 14:28:31] DEBUG: add_unassembled_reads=0 [2022-06-16 14:28:31] DEBUG: extend_contigs_with_repeats=0 [2022-06-16 14:28:31] DEBUG: min_read_cov_cutoff=3 [2022-06-16 14:28:31] DEBUG: short_tip_length=20000 [2022-06-16 14:28:31] DEBUG: long_tip_length=100000 [2022-06-16 14:28:31] DEBUG: Running with k-mer size: 17 [2022-06-16 14:28:31] DEBUG: Running with minimum overlap 1000 [2022-06-16 14:28:31] DEBUG: Metagenome mode: N [2022-06-16 14:28:31] DEBUG: Short mode: N [2022-06-16 14:28:31] INFO: Reading sequences [2022-06-16 14:28:35] DEBUG: Building positional index [2022-06-16 14:28:35] DEBUG: Total sequence: 151680 bp [2022-06-16 14:28:35] INFO: Building minimizer index [2022-06-16 14:28:35] INFO: Pre-calculating index storage [2022-06-16 14:28:35] DEBUG: Mean k-mer frequency: 2.61632 [2022-06-16 14:28:35] DEBUG: Repetitive k-mer frequency: 261 [2022-06-16 14:28:35] DEBUG: Filtered 502 repetitive k-mers (0.0100557) [2022-06-16 14:28:35] INFO: Filling index [2022-06-16 14:28:35] DEBUG: Sorting k-mer index [2022-06-16 14:28:35] DEBUG: Selected k-mers: 19079 [2022-06-16 14:28:35] DEBUG: K-mer index size: 49420 [2022-06-16 14:28:35] DEBUG: Mean k-mer frequency: 2.59028 [2022-06-16 14:28:35] DEBUG: Minimizer rate: 3.0692 [2022-06-16 14:28:35] DEBUG: Peak RAM usage: 0 Gb [2022-06-16 14:28:35] DEBUG: Estimating k-mer identity bias [2022-06-16 14:28:35] DEBUG: Initial divergence estimate : 0.104459 [2022-06-16 14:28:35] DEBUG: Relative threshold: Y [2022-06-16 14:28:35] DEBUG: Max divergence threshold set to 0.154459 [2022-06-16 14:28:35] INFO: Extending reads [2022-06-16 14:28:35] DEBUG: Estimating overlap coverage [2022-06-16 14:28:36] INFO: Overlap-based coverage: 21 [2022-06-16 14:28:36] INFO: Median overlap divergence: 0.105263 [2022-06-16 14:28:36] DEBUG: Sequence divergence distribution:

|                    **        |                                                                     
|                    **        |                                                                     
|                    **        |                                                                     
|               *    ** *      |                                                                     
|               *    ** *      |                                                                     
|               *  * ** *      |                                                                     
|               *  * ** *      |                                                                     
|              **  * ** *      |                                                                     
|              **  * ** *      |                                                                     
|              **  * ****      |                                                                     
|              *** ******      |                                                                     
|              *** ******      |                                                                     
|           * **************   |  *   *                                                              
|           * **************   |  *   *                                                              
|           * **************   | ***  *                                                              
|           * **************   | ***  *                                                              
|           ****************   * ***  *                                                              
|           ****************   * ***  *                                                              
|       *  ***************** ******** ***   *  *             *                                       
|       *  ***************** ******** ***   *  *             *                                       
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.082, Q50 = 0.11, Q75 = 0.13

[2022-06-16 14:28:36] INFO: Assembled 0 disjointigs
[2022-06-16 14:28:36] INFO: Generating sequence
[2022-06-16 14:28:36] DEBUG: Building positional index
[2022-06-16 14:28:36] DEBUG: Mean k-mer frequency: 0
[2022-06-16 14:28:36] DEBUG: Repetitive k-mer frequency: 0
[2022-06-16 14:28:36] DEBUG: Filtered 0 repetitive k-mers (-nan)
[2022-06-16 14:28:36] DEBUG: Sorting k-mer index
[2022-06-16 14:28:36] DEBUG: Selected k-mers: 0
[2022-06-16 14:28:36] DEBUG: K-mer index size: 0
[2022-06-16 14:28:36] DEBUG: Mean k-mer frequency: -nan
[2022-06-16 14:28:36] DEBUG: Minimizer rate: -nan
[2022-06-16 14:28:36] INFO: Filtering contained disjointigs
[2022-06-16 14:28:36] DEBUG: Computing transitive closure for overlaps
[2022-06-16 14:28:36] DEBUG: Found 0 overlaps
[2022-06-16 14:28:36] DEBUG: Left 0 overlaps after filtering
[2022-06-16 14:28:36] INFO: Contained seqs: 0
[2022-06-16 14:28:36] DEBUG: Writing FASTA
[2022-06-16 14:28:36] DEBUG: Peak RAM usage: 0 Gb
-----------End assembly log------------
[2022-06-16 14:28:36] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2022-06-16 14:28:36] root: ERROR: Pipeline aborted

mikolmogorov commented 2 years ago

@kakuk9 A 1 kb amplicon is on the borderline of the minimum overlap that Flye can work with. This is likely what is causing the issue: Flye was not designed for amplicon assembly.

If the amplicon is only 1 kb, you likely don't need to assemble at all, and just need to polish.

Mikhail
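To make the "borderline" point concrete, here is a minimal Python sketch using two values from the log above (reads N50 of 1065 and a minimum overlap of 1000). The helper name `required_overlap_fraction` is hypothetical and only illustrates the arithmetic; it is not part of Flye.

```python
def required_overlap_fraction(read_length_bp: int, min_overlap_bp: int) -> float:
    """Fraction of a read that an overlap of min_overlap_bp bases would span.

    Hypothetical helper for illustration only; not Flye code.
    """
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return min_overlap_bp / read_length_bp


# Values taken from the log: "Reads N50/N90: 1065 / 1015" and
# "Minimum overlap set to 1000".
fraction = required_overlap_fraction(1065, 1000)
print(f"An overlap of 1000 bp spans {fraction:.1%} of a 1065 bp read")
```

Under these numbers, two typical reads would have to overlap along almost their entire length to pass the minimum overlap, which leaves very little room for one read to extend another.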

kakuk9 commented 2 years ago

@fenderglass Thanks for your advice. I sort of figured there might be an issue with the overlap length setting when I tried to manually lower the overlap requirement. I will try some polishing. Cheers!

Michaelijesse commented 1 year ago

@StefanoLonardi I hope the issue is solved by now. Sorry for the late reply. Try filtering the long reads with the Filtlong tool:

filtlong --min_length 1000 --keep_percent 90 --target_bases 500000000 input.fastq.gz | gzip > filtered.fastq.gz

and then run the Flye assembler with your options.
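As a rough check on the `--target_bases` value above, the following Python sketch relates it to the 10 Mb genome size and the total read length reported in the original post. The function names are hypothetical, and the calculation assumes that coverage is simply bases divided by genome size.

```python
def implied_coverage(target_bases: int, genome_size_bp: int) -> float:
    """Approximate coverage left after keeping target_bases bases."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return target_bases / genome_size_bp


def fraction_of_bases_kept(target_bases: int, total_read_length_bp: int) -> float:
    """Share of the input bases that target_bases represents (capped at 1.0)."""
    if total_read_length_bp <= 0:
        raise ValueError("total_read_length_bp must be positive")
    return min(1.0, target_bases / total_read_length_bp)


# From the original post: "Input genome size: 10000000" and
# "Total read length: 10964270213".
coverage = implied_coverage(500_000_000, 10_000_000)
kept = fraction_of_bases_kept(500_000_000, 10_964_270_213)
print(f"Target of 500 Mb is about {coverage:.0f}x of a 10 Mb genome")
print(f"That is about {kept:.1%} of the input bases")
```

For a different genome size, the same relation can be inverted: multiply the genome size by the coverage you want to get a candidate `--target_bases` value.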

rajithadp commented 1 year ago

Hi,

I have been trying to assemble a 3.9 to 4.6 Mb genome with Flye, but it failed. I tried the following options, as previously suggested, but they haven't worked for me.

  1. --meta
  2. -g 4.6m --asm-coverage 50

I have attached the log file below. Please help.

` [2022-08-29 23:18:48] root: INFO: >>>STAGE: configure [2022-08-29 23:18:48] root: INFO: Configuring run [2022-08-29 23:18:59] root: INFO: Total read length: 359948208 [2022-08-29 23:18:59] root: INFO: Reads N50/N90: 4805 / 923 [2022-08-29 23:18:59] root: INFO: Minimum overlap set to 1000 [2022-08-29 23:18:59] root: INFO: >>>STAGE: assembly [2022-08-29 23:18:59] root: INFO: Assembling disjointigs [2022-08-29 23:18:59] root: DEBUG: -----Begin assembly log------ [2022-08-29 23:18:59] DEBUG: Build date: Sep 4 2020 00:43:21 [2022-08-29 23:18:59] DEBUG: Total RAM: 15 Gb [2022-08-29 23:18:59] DEBUG: Available RAM: 15 Gb [2022-08-29 23:18:59] DEBUG: Total CPUs: 16 [2022-08-29 23:18:59] DEBUG: Loading /home/aicbu/miniconda3/envs/flye/lib/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg [2022-08-29 23:18:59] DEBUG: Loading /home/aicbu/miniconda3/envs/flye/lib/python3.7/site-packages/flye/config/bin_cfg/asm_defaults.cfg [2022-08-29 23:18:59] DEBUG: big_genome_threshold=29000000 [2022-08-29 23:18:59] DEBUG: meta_read_filter_kmer_freq=100 [2022-08-29 23:18:59] DEBUG: max_coverage_drop_rate=5 [2022-08-29 23:18:59] DEBUG: max_extensions_drop_rate=5 [2022-08-29 23:18:59] DEBUG: chimera_window=100 [2022-08-29 23:18:59] DEBUG: min_reads_in_disjointig=4 [2022-08-29 23:18:59] DEBUG: max_inner_reads=10 [2022-08-29 23:18:59] DEBUG: max_inner_fraction=0.25 [2022-08-29 23:18:59] DEBUG: max_separation=500 [2022-08-29 23:18:59] DEBUG: unique_edge_length=50000 [2022-08-29 23:18:59] DEBUG: min_repeat_res_support=0.51 [2022-08-29 23:18:59] DEBUG: out_paths_ratio=5 [2022-08-29 23:18:59] DEBUG: graph_cov_drop_rate=5 [2022-08-29 23:18:59] DEBUG: coverage_estimate_window=100 [2022-08-29 23:18:59] DEBUG: max_bubble_length=50000 [2022-08-29 23:18:59] DEBUG: loop_coverage_rate=1.5 [2022-08-29 23:18:59] DEBUG: repeat_edge_cov_mult=1.75 [2022-08-29 23:18:59] DEBUG: weak_detach_rate=5 [2022-08-29 23:18:59] DEBUG: tip_coverage_rate=2 [2022-08-29 23:18:59] DEBUG: tip_length_rate=2 
[2022-08-29 23:18:59] DEBUG: low_cutoff_warning=1 [2022-08-29 23:18:59] DEBUG: hard_min_coverage_rate=10 [2022-08-29 23:18:59] DEBUG: kmer_size=17 [2022-08-29 23:18:59] DEBUG: use_minimizers=0 [2022-08-29 23:18:59] DEBUG: reads_base_alignment=0 [2022-08-29 23:18:59] DEBUG: assemble_kmer_sample=1 [2022-08-29 23:18:59] DEBUG: repeat_graph_kmer_sample=1 [2022-08-29 23:18:59] DEBUG: read_align_kmer_sample=1 [2022-08-29 23:18:59] DEBUG: meta_read_top_kmer_rate=0.40 [2022-08-29 23:18:59] DEBUG: maximum_jump=1500 [2022-08-29 23:18:59] DEBUG: maximum_overhang=1500 [2022-08-29 23:18:59] DEBUG: repeat_kmer_rate=100 [2022-08-29 23:18:59] DEBUG: assemble_ovlp_divergence=0.10 [2022-08-29 23:18:59] DEBUG: assemble_divergence_relative=1 [2022-08-29 23:18:59] DEBUG: repeat_graph_ovlp_divergence=0.10 [2022-08-29 23:18:59] DEBUG: read_align_ovlp_divergence=0.25 [2022-08-29 23:18:59] DEBUG: hpc_scoring_on=0 [2022-08-29 23:18:59] DEBUG: add_unassembled_reads=0 [2022-08-29 23:18:59] DEBUG: extend_contigs_with_repeats=0 [2022-08-29 23:18:59] DEBUG: min_read_cov_cutoff=3 [2022-08-29 23:18:59] DEBUG: short_tip_length=20000 [2022-08-29 23:18:59] DEBUG: long_tip_length=100000 [2022-08-29 23:18:59] DEBUG: Running with k-mer size: 17 [2022-08-29 23:18:59] DEBUG: Running with minimum overlap 1000 [2022-08-29 23:18:59] DEBUG: Metagenome mode: N [2022-08-29 23:18:59] INFO: Reading sequences [2022-08-29 23:19:05] DEBUG: Building positional index [2022-08-29 23:19:05] DEBUG: Total sequence: 320096457 bp [2022-08-29 23:19:07] INFO: Counting k-mers: [2022-08-29 23:19:34] DEBUG: Updating k-mer histogram [2022-08-29 23:20:05] DEBUG: Hash size: 438273 [2022-08-29 23:20:05] DEBUG: Total k-mers 239466193 [2022-08-29 23:20:05] INFO: Filling index table (1/2) [2022-08-29 23:21:38] DEBUG: Mean k-mer frequency: 4.45881 [2022-08-29 23:21:38] DEBUG: Repetitive k-mer frequency: 445 [2022-08-29 23:21:38] DEBUG: Filtered 20595727 repetitive k-mers (0.225045) [2022-08-29 23:21:39] INFO: Filling index table (2/2) 
[2022-08-29 23:23:11] DEBUG: Sorting k-mer index [2022-08-29 23:23:11] DEBUG: Selected k-mers: 22439831 [2022-08-29 23:23:11] DEBUG: Index size: 72752270 [2022-08-29 23:23:11] DEBUG: Mean k-mer index frequency: 3.2421 [2022-08-29 23:23:11] DEBUG: Peak RAM usage: 9 Gb [2022-08-29 23:23:11] DEBUG: Estimating k-mer identity bias [2022-08-29 23:23:17] DEBUG: Initial divergence estimate : 0.1485 [2022-08-29 23:23:17] DEBUG: Relative threshold: Y [2022-08-29 23:23:17] DEBUG: Max divergence threshold set to 0.2485 [2022-08-29 23:23:17] INFO: Extending reads [2022-08-29 23:23:17] DEBUG: Estimating overlap coverage [2022-08-29 23:23:22] INFO: Overlap-based coverage: 0 [2022-08-29 23:23:22] INFO: Median overlap divergence: 0.164441 [2022-08-29 23:23:22] DEBUG: Sequence divergence distribution:

|                            *                    |                                                  
|                            *                    |                                                  
|                            *                    |                                                  
|                            *                    |                                                  
|                     *      *                    |                                                  
|                    **      *                    |                                                  
|                    **      *                    |                                                  
|                    **  *   *  *                 |                                                  
|                    ***** ***  *                 |                                                  
|                *  ****** ***  *                 |   * *                                            
|                ** ****** ***  *                 |  **** *                                          
|                ** ****** **** *       *         | ***** *                                          
|                ** *********** *   *   *       * ******* **                                         
|                ** *********** *   *   *       * ******* **                                         
|                ************** *   *   *       * ******* ***                                        
|                ****************  **  **       * ******* ***   *         *                          
|                ********************  *** *** ** ************ **  * *    *                          
|           *  *********************** *** *** ** ************ ***** ** ***                          
|           ** ******************************************************** **** *                       
|           ** ******************************************************** **** *                       
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.12, Q50 = 0.16, Q75 = 0.27

[2022-08-29 23:29:27] INFO: Assembled 0 disjointigs [2022-08-29 23:29:27] INFO: Generating sequence [2022-08-29 23:29:27] DEBUG: Writing FASTA [2022-08-29 23:29:27] DEBUG: Peak RAM usage: 9 Gb -----------End assembly log------------ [2022-08-29 23:29:27] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2022-08-29 23:29:27] root: ERROR: Pipeline aborted

`

mikolmogorov commented 1 year ago

@rajithadp If it is old PacBio data, I would try pbclip as described in the manual. Otherwise, I don't have any suggestions beyond that. I would also try other assemblers in case it is a specific Flye issue.

matteo1313 commented 1 year ago

Hello @fenderglass

Thank you for the amazing software <3

I am getting a similar issue. I tried --meta and -g 1.6m --asm-coverage 50 and still got no output. I have attached the logs for both of these attempts below. What do you recommend? I am working with a bacterial dataset that I just basecalled with the latest version of Guppy, which is why I am using --nano-hq. I have also tried each of the other --nano options, and none of them worked either.

[2022-09-09 13:56:54] root: INFO: Starting Flye 2.9.1-b1780 [2022-09-09 13:56:54] root: DEBUG: Cmd: /usr/local/bin/flye --asm-coverage 50 --genome-size 1.7m --nano-hq /home/matteo/datasets/Lg/pass/lg_longest_10.fastq --out-dir /home/matteo/datasets/Lg/Flye [2022-09-09 13:56:54] root: DEBUG: Python version: 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] [2022-09-09 13:56:54] root: INFO: >>>STAGE: configure [2022-09-09 13:56:54] root: INFO: Configuring run [2022-09-09 13:56:54] root: INFO: Total read length: 858801 [2022-09-09 13:56:54] root: INFO: Input genome size: 1700000 [2022-09-09 13:56:54] root: INFO: Estimated coverage: 0 [2022-09-09 13:56:54] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly? [2022-09-09 13:56:54] root: INFO: Reads N50/N90: 77551 / 71344 [2022-09-09 13:56:54] root: INFO: Minimum overlap set to 10000 [2022-09-09 13:56:54] root: INFO: >>>STAGE: assembly [2022-09-09 13:56:54] root: INFO: Assembling disjointigs [2022-09-09 13:56:54] root: DEBUG: -----Begin assembly log------ [2022-09-09 13:56:54] root: DEBUG: Running: flye-modules assemble --reads /home/matteo/datasets/Lg/pass/lg_longest_10.fastq --out-asm /home/matteo/datasets/Lg/Flye/00-assembly/draft_assembly.fasta --config /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_nano_hq.cfg --log /home/matteo/datasets/Lg/Flye/flye.log --threads 1 --genome-size 1700000 --min-ovlp 10000 [2022-09-09 13:56:54] DEBUG: Build date: Aug 17 2022 12:31:00 [2022-09-09 13:56:54] DEBUG: Total RAM: 31 Gb [2022-09-09 13:56:54] DEBUG: Available RAM: 27 Gb [2022-09-09 13:56:54] DEBUG: Total CPUs: 8 [2022-09-09 13:56:54] DEBUG: Loading /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_nano_hq.cfg [2022-09-09 13:56:54] DEBUG: Loading /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_defaults.cfg [2022-09-09 13:56:54] DEBUG: big_genome_threshold=29000000 
[2022-09-09 13:56:54] DEBUG: meta_read_filter_kmer_freq=100 [2022-09-09 13:56:54] DEBUG: chain_large_gap_penalty=2 [2022-09-09 13:56:54] DEBUG: chain_small_gap_penalty=0.5 [2022-09-09 13:56:54] DEBUG: chain_gap_jump_threshold=100 [2022-09-09 13:56:54] DEBUG: max_coverage_drop_rate=5 [2022-09-09 13:56:54] DEBUG: max_extensions_drop_rate=5 [2022-09-09 13:56:54] DEBUG: chimera_window=100 [2022-09-09 13:56:54] DEBUG: chimera_overhang=1000 [2022-09-09 13:56:54] DEBUG: min_reads_in_disjointig=4 [2022-09-09 13:56:54] DEBUG: max_inner_reads=10 [2022-09-09 13:56:54] DEBUG: max_inner_fraction=0.25 [2022-09-09 13:56:54] DEBUG: max_separation=500 [2022-09-09 13:56:54] DEBUG: unique_edge_length=50000 [2022-09-09 13:56:54] DEBUG: min_repeat_res_support=0.51 [2022-09-09 13:56:54] DEBUG: out_paths_ratio=5 [2022-09-09 13:56:54] DEBUG: graph_cov_drop_rate=5 [2022-09-09 13:56:54] DEBUG: coverage_estimate_window=100 [2022-09-09 13:56:54] DEBUG: max_bubble_length=50000 [2022-09-09 13:56:54] DEBUG: loop_coverage_rate=1.5 [2022-09-09 13:56:54] DEBUG: repeat_edge_cov_mult=1.75 [2022-09-09 13:56:54] DEBUG: weak_detach_rate=5 [2022-09-09 13:56:54] DEBUG: tip_coverage_rate=2 [2022-09-09 13:56:54] DEBUG: tip_length_rate=2 [2022-09-09 13:56:54] DEBUG: output_gfa_before_rr=0 [2022-09-09 13:56:54] DEBUG: remove_alt_edges=0 [2022-09-09 13:56:54] DEBUG: low_cutoff_warning=0 [2022-09-09 13:56:54] DEBUG: kmer_size=17 [2022-09-09 13:56:54] DEBUG: use_minimizers=1 [2022-09-09 13:56:54] DEBUG: minimizer_window=5 [2022-09-09 13:56:54] DEBUG: reads_base_alignment=1 [2022-09-09 13:56:54] DEBUG: meta_read_top_kmer_rate=0.75 [2022-09-09 13:56:54] DEBUG: maximum_jump=1500 [2022-09-09 13:56:54] DEBUG: maximum_overhang=1500 [2022-09-09 13:56:54] DEBUG: repeat_kmer_rate=100 [2022-09-09 13:56:54] DEBUG: assemble_ovlp_divergence=0.05 [2022-09-09 13:56:54] DEBUG: assemble_divergence_relative=1 [2022-09-09 13:56:54] DEBUG: repeat_graph_ovlp_divergence=0.05 [2022-09-09 13:56:54] DEBUG: 
read_align_ovlp_divergence=0.10 [2022-09-09 13:56:54] DEBUG: hpc_scoring_on=1 [2022-09-09 13:56:54] DEBUG: add_unassembled_reads=0 [2022-09-09 13:56:54] DEBUG: extend_contigs_with_repeats=0 [2022-09-09 13:56:54] DEBUG: min_read_cov_cutoff=3 [2022-09-09 13:56:54] DEBUG: short_tip_length=20000 [2022-09-09 13:56:54] DEBUG: long_tip_length=100000 [2022-09-09 13:56:54] DEBUG: Running with k-mer size: 17 [2022-09-09 13:56:54] DEBUG: Running with minimum overlap 10000 [2022-09-09 13:56:54] DEBUG: Metagenome mode: N [2022-09-09 13:56:54] DEBUG: Short mode: N [2022-09-09 13:56:54] INFO: Reading sequences [2022-09-09 13:56:54] DEBUG: Building positional index [2022-09-09 13:56:54] DEBUG: Total sequence: 858801 bp [2022-09-09 13:56:54] INFO: Building minimizer index [2022-09-09 13:56:54] INFO: Pre-calculating index storage [2022-09-09 13:56:54] DEBUG: Mean k-mer frequency: 1.08596 [2022-09-09 13:56:54] DEBUG: Repetitive k-mer frequency: 108 [2022-09-09 13:56:54] DEBUG: Filtered 0 repetitive k-mers (0) [2022-09-09 13:56:54] INFO: Filling index [2022-09-09 13:56:54] DEBUG: Sorting k-mer index [2022-09-09 13:56:54] DEBUG: Selected k-mers: 263260 [2022-09-09 13:56:54] DEBUG: K-mer index size: 285891 [2022-09-09 13:56:54] DEBUG: Mean k-mer frequency: 1.08596 [2022-09-09 13:56:54] DEBUG: Minimizer rate: 3.00395 [2022-09-09 13:56:54] DEBUG: Peak RAM usage: 0 Gb [2022-09-09 13:56:54] DEBUG: Estimating k-mer identity bias [2022-09-09 13:57:16] DEBUG: Initial divergence estimate : 0.0481303 [2022-09-09 13:57:16] DEBUG: Relative threshold: Y [2022-09-09 13:57:16] DEBUG: Max divergence threshold set to 0.0981303 [2022-09-09 13:57:16] INFO: Extending reads [2022-09-09 13:57:16] DEBUG: Estimating overlap coverage [2022-09-09 13:57:16] INFO: Overlap-based coverage: 0 [2022-09-09 13:57:16] INFO: Median overlap divergence: 0.0481303 [2022-09-09 13:57:16] DEBUG: Sequence divergence distribution:

|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.048, Q50 = 0.048, Q75 = 0.048

[2022-09-09 13:57:16] INFO: Assembled 0 disjointigs [2022-09-09 13:57:16] INFO: Generating sequence [2022-09-09 13:57:16] DEBUG: Building positional index [2022-09-09 13:57:17] DEBUG: Mean k-mer frequency: 0 [2022-09-09 13:57:17] DEBUG: Repetitive k-mer frequency: 0 [2022-09-09 13:57:17] DEBUG: Filtered 0 repetitive k-mers (-nan) [2022-09-09 13:57:17] DEBUG: Sorting k-mer index [2022-09-09 13:57:17] DEBUG: Selected k-mers: 0 [2022-09-09 13:57:17] DEBUG: K-mer index size: 0 [2022-09-09 13:57:17] DEBUG: Mean k-mer frequency: -nan [2022-09-09 13:57:17] DEBUG: Minimizer rate: -nan [2022-09-09 13:57:17] INFO: Filtering contained disjointigs [2022-09-09 13:57:17] DEBUG: Computing transitive closure for overlaps [2022-09-09 13:57:17] DEBUG: Found 0 overlaps [2022-09-09 13:57:17] DEBUG: Left 0 overlaps after filtering [2022-09-09 13:57:17] INFO: Contained seqs: 0 [2022-09-09 13:57:17] DEBUG: Writing FASTA [2022-09-09 13:57:17] DEBUG: Peak RAM usage: 0 Gb -----------End assembly log------------ [2022-09-09 13:57:17] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2022-09-09 13:57:17] root: ERROR: Pipeline aborted [2022-09-09 13:57:50] root: INFO: Starting Flye 2.9.1-b1780 [2022-09-09 13:57:50] root: DEBUG: Cmd: /usr/local/bin/flye --meta --nano-hq /home/matteo/datasets/Lg/pass/lg_longest_10.fastq --out-dir /home/matteo/datasets/Lg/Flye [2022-09-09 13:57:50] root: DEBUG: Python version: 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] [2022-09-09 13:57:50] root: INFO: >>>STAGE: configure [2022-09-09 13:57:50] root: INFO: Configuring run [2022-09-09 13:57:50] root: INFO: Total read length: 858801 [2022-09-09 13:57:50] root: INFO: Reads N50/N90: 77551 / 71344 [2022-09-09 13:57:50] root: INFO: Minimum overlap set to 10000 [2022-09-09 13:57:50] root: INFO: >>>STAGE: assembly [2022-09-09 13:57:50] root: INFO: Assembling disjointigs [2022-09-09 13:57:50] root: DEBUG: -----Begin assembly log------ [2022-09-09 
13:57:50] root: DEBUG: Running: flye-modules assemble --reads /home/matteo/datasets/Lg/pass/lg_longest_10.fastq --out-asm /home/matteo/datasets/Lg/Flye/00-assembly/draft_assembly.fasta --config /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_nano_hq.cfg --log /home/matteo/datasets/Lg/Flye/flye.log --threads 1 --meta --min-ovlp 10000 [2022-09-09 13:57:50] DEBUG: Build date: Aug 17 2022 12:31:00 [2022-09-09 13:57:50] DEBUG: Total RAM: 31 Gb [2022-09-09 13:57:50] DEBUG: Available RAM: 27 Gb [2022-09-09 13:57:50] DEBUG: Total CPUs: 8 [2022-09-09 13:57:50] DEBUG: Loading /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_nano_hq.cfg [2022-09-09 13:57:50] DEBUG: Loading /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_defaults.cfg [2022-09-09 13:57:50] DEBUG: big_genome_threshold=29000000 [2022-09-09 13:57:50] DEBUG: meta_read_filter_kmer_freq=100 [2022-09-09 13:57:50] DEBUG: chain_large_gap_penalty=2 [2022-09-09 13:57:50] DEBUG: chain_small_gap_penalty=0.5 [2022-09-09 13:57:50] DEBUG: chain_gap_jump_threshold=100 [2022-09-09 13:57:50] DEBUG: max_coverage_drop_rate=5 [2022-09-09 13:57:50] DEBUG: max_extensions_drop_rate=5 [2022-09-09 13:57:50] DEBUG: chimera_window=100 [2022-09-09 13:57:50] DEBUG: chimera_overhang=1000 [2022-09-09 13:57:50] DEBUG: min_reads_in_disjointig=4 [2022-09-09 13:57:50] DEBUG: max_inner_reads=10 [2022-09-09 13:57:50] DEBUG: max_inner_fraction=0.25 [2022-09-09 13:57:50] DEBUG: max_separation=500 [2022-09-09 13:57:50] DEBUG: unique_edge_length=50000 [2022-09-09 13:57:50] DEBUG: min_repeat_res_support=0.51 [2022-09-09 13:57:50] DEBUG: out_paths_ratio=5 [2022-09-09 13:57:50] DEBUG: graph_cov_drop_rate=5 [2022-09-09 13:57:50] DEBUG: coverage_estimate_window=100 [2022-09-09 13:57:50] DEBUG: max_bubble_length=50000 [2022-09-09 13:57:50] DEBUG: loop_coverage_rate=1.5 [2022-09-09 13:57:50] DEBUG: repeat_edge_cov_mult=1.75 [2022-09-09 13:57:50] DEBUG: weak_detach_rate=5 [2022-09-09 13:57:50] DEBUG: 
tip_coverage_rate=2 [2022-09-09 13:57:50] DEBUG: tip_length_rate=2 [2022-09-09 13:57:50] DEBUG: output_gfa_before_rr=0 [2022-09-09 13:57:50] DEBUG: remove_alt_edges=0 [2022-09-09 13:57:50] DEBUG: low_cutoff_warning=0 [2022-09-09 13:57:50] DEBUG: kmer_size=17 [2022-09-09 13:57:50] DEBUG: use_minimizers=1 [2022-09-09 13:57:50] DEBUG: minimizer_window=5 [2022-09-09 13:57:50] DEBUG: reads_base_alignment=1 [2022-09-09 13:57:50] DEBUG: meta_read_top_kmer_rate=0.75 [2022-09-09 13:57:50] DEBUG: maximum_jump=1500 [2022-09-09 13:57:50] DEBUG: maximum_overhang=1500 [2022-09-09 13:57:50] DEBUG: repeat_kmer_rate=100 [2022-09-09 13:57:50] DEBUG: assemble_ovlp_divergence=0.05 [2022-09-09 13:57:50] DEBUG: assemble_divergence_relative=1 [2022-09-09 13:57:50] DEBUG: repeat_graph_ovlp_divergence=0.05 [2022-09-09 13:57:50] DEBUG: read_align_ovlp_divergence=0.10 [2022-09-09 13:57:50] DEBUG: hpc_scoring_on=1 [2022-09-09 13:57:50] DEBUG: add_unassembled_reads=0 [2022-09-09 13:57:50] DEBUG: extend_contigs_with_repeats=0 [2022-09-09 13:57:50] DEBUG: min_read_cov_cutoff=3 [2022-09-09 13:57:50] DEBUG: short_tip_length=20000 [2022-09-09 13:57:50] DEBUG: long_tip_length=100000 [2022-09-09 13:57:50] DEBUG: Running with k-mer size: 17 [2022-09-09 13:57:50] DEBUG: Running with minimum overlap 10000 [2022-09-09 13:57:50] DEBUG: Metagenome mode: Y [2022-09-09 13:57:50] DEBUG: Short mode: N [2022-09-09 13:57:50] INFO: Reading sequences [2022-09-09 13:57:50] DEBUG: Building positional index [2022-09-09 13:57:50] DEBUG: Total sequence: 858801 bp [2022-09-09 13:57:50] INFO: Building minimizer index [2022-09-09 13:57:50] INFO: Pre-calculating index storage [2022-09-09 13:57:50] DEBUG: Mean k-mer frequency: 1.08596 [2022-09-09 13:57:50] DEBUG: Repetitive k-mer frequency: 108 [2022-09-09 13:57:50] DEBUG: Filtered 0 repetitive k-mers (0) [2022-09-09 13:57:50] INFO: Filling index [2022-09-09 13:57:50] DEBUG: Sorting k-mer index [2022-09-09 13:57:50] DEBUG: Selected k-mers: 263260 [2022-09-09 13:57:50] 
DEBUG: K-mer index size: 285891 [2022-09-09 13:57:50] DEBUG: Mean k-mer frequency: 1.08596 [2022-09-09 13:57:50] DEBUG: Minimizer rate: 3.00395 [2022-09-09 13:57:50] DEBUG: Peak RAM usage: 0 Gb [2022-09-09 13:57:50] DEBUG: Estimating k-mer identity bias [2022-09-09 13:58:14] DEBUG: Initial divergence estimate : 0.0481303 [2022-09-09 13:58:14] DEBUG: Relative threshold: Y [2022-09-09 13:58:14] DEBUG: Max divergence threshold set to 0.0981303 [2022-09-09 13:58:14] INFO: Extending reads [2022-09-09 13:58:14] DEBUG: Estimating overlap coverage [2022-09-09 13:58:14] INFO: Overlap-based coverage: 0 [2022-09-09 13:58:14] INFO: Median overlap divergence: 0.0481303 [2022-09-09 13:58:14] DEBUG: Sequence divergence distribution:

|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.048, Q50 = 0.048, Q75 = 0.048

[2022-09-09 13:58:14] INFO: Assembled 0 disjointigs [2022-09-09 13:58:14] INFO: Generating sequence [2022-09-09 13:58:14] DEBUG: Building positional index [2022-09-09 13:58:14] DEBUG: Mean k-mer frequency: 0 [2022-09-09 13:58:14] DEBUG: Repetitive k-mer frequency: 0 [2022-09-09 13:58:14] DEBUG: Filtered 0 repetitive k-mers (-nan) [2022-09-09 13:58:14] DEBUG: Sorting k-mer index [2022-09-09 13:58:14] DEBUG: Selected k-mers: 0 [2022-09-09 13:58:14] DEBUG: K-mer index size: 0 [2022-09-09 13:58:14] DEBUG: Mean k-mer frequency: -nan [2022-09-09 13:58:14] DEBUG: Minimizer rate: -nan [2022-09-09 13:58:14] INFO: Filtering contained disjointigs [2022-09-09 13:58:14] DEBUG: Computing transitive closure for overlaps [2022-09-09 13:58:14] DEBUG: Found 0 overlaps [2022-09-09 13:58:14] DEBUG: Left 0 overlaps after filtering [2022-09-09 13:58:14] INFO: Contained seqs: 0 [2022-09-09 13:58:14] DEBUG: Writing FASTA [2022-09-09 13:58:14] DEBUG: Peak RAM usage: 0 Gb -----------End assembly log------------ [2022-09-09 13:58:14] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2022-09-09 13:58:14] root: ERROR: Pipeline aborted

mikolmogorov commented 1 year ago

@matteo1313 It seems that you have ~800 kb of reads for a bacterium with a 1.6 Mb genome, so there is simply not enough coverage to assemble. You typically need at least 10x, and 30x+ is recommended.

Also, your read N50 is 70 kb, which seems too good to be true for a bacterium; something might be wrong with the input data formatting.
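The coverage argument above is plain arithmetic: total read bases divided by genome size. A minimal sketch follows, using the 858801 bp total from the log and the 1.6 Mb genome size quoted in the reply. The function names and the verdict labels are hypothetical, and the 10x and 30x cut-offs are simply the figures mentioned above, not values read from Flye itself.

```python
def estimated_coverage(total_read_bases: int, genome_size: int) -> float:
    """Average depth: total sequenced bases divided by the genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def coverage_verdict(coverage: float,
                     minimum: float = 10.0,
                     recommended: float = 30.0) -> str:
    """Label a coverage value against the thresholds quoted in the thread.

    The labels are illustrative only; they are not Flye output.
    """
    if coverage < minimum:
        return "too low"
    if coverage < recommended:
        return "usable"
    return "recommended"


# Numbers from this thread: 858801 bp of reads, 1.6 Mb genome.
cov = estimated_coverage(858_801, 1_600_000)
print(f"{cov:.2f}x -> {coverage_verdict(cov)}")  # prints "0.54x -> too low"
```

The same division, truncated to an integer, matches the "Estimated coverage: 1096" line in the original report (10964270213 bases over a 10 Mb genome).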

Scott-Godwin commented 1 year ago

I'm also encountering this error. I'm running Flye as a plugin in Geneious Prime. My data consists of Nanopore reads generated from a cDNA library produced from RNA extracted from a cell culture infected with a virus. I'm trying to assemble the viral genome. I've filtered my reads by mapping against the host transcriptome, but this process is imperfect. I think that of the ~100,000 unmapped reads I have left, about 90% are viral. The virus has a segmented genome consisting of eight segments, with a total size of about 15 Kb. I've tried setting the genome size to various values including 15k, 100k and 2.4g (the approximate size of the host genome), but I keep getting the same error message.

ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

Failed to run: C:\WINDOWS\System32\bash.exe -c '/mnt/c/Users/sgodwin/AppData/Local/Geneious/plugins/Flye/resources/Windows/bin/flye' --nano-corr input_0_Unpaired.fastq --threads 24 --genome-size 15k --meta --iterations 1 --out-dir out >stdout.txt 2>stderr.txt, exit code: 1

Flye reported the following errors:
[2022-09-30 17:43:19] INFO: Starting Flye 2.7-b1585
[2022-09-30 17:43:19] INFO: >>>STAGE: configure
[2022-09-30 17:43:19] INFO: Configuring run
[2022-09-30 17:43:19] INFO: Total read length: 4464863
[2022-09-30 17:43:19] INFO: Input genome size: 15000
[2022-09-30 17:43:19] INFO: Estimated coverage: 297
[2022-09-30 17:43:19] INFO: Reads N50/N90: 699 / 191
[2022-09-30 17:43:19] INFO: Minimum overlap set to 1000
[2022-09-30 17:43:19] INFO: Selected k-mer size: 17
[2022-09-30 17:43:19] INFO: >>>STAGE: assembly
[2022-09-30 17:43:19] INFO: Assembling disjointigs
[2022-09-30 17:43:19] INFO: Reading sequences
[2022-09-30 17:43:20] INFO: Generating solid k-mer index
[2022-09-30 17:43:31] INFO: Counting k-mers (1/2): 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-09-30 17:43:31] INFO: Counting k-mers (2/2): 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-09-30 17:43:31] INFO: Filling index table (1/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-09-30 17:43:31] INFO: Filling index table (2/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-09-30 17:43:32] INFO: Extending reads
[2022-09-30 17:43:51] INFO: Overlap-based coverage: 66
[2022-09-30 17:43:51] INFO: Median overlap divergence: 0.0697123
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-09-30 17:43:52] INFO: Assembled 0 disjointigs
[2022-09-30 17:43:52] INFO: Generating sequence
[2022-09-30 17:43:52] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2022-09-30 17:43:52] ERROR: Pipeline aborted

mikolmogorov commented 1 year ago

@Scott-Godwin you are using an outdated version of Flye. The latest release (2.9+) was optimized for viral assembly and should work better for you.

lucyintheskyzzz commented 1 year ago

I stopped using Flye because it did not work on all my virus fastq files. What commands are people using for viral assembly now? I want to try it again. I remember my error was with the genome size.

thanks!

Scott-Godwin commented 1 year ago

@fenderglass Can I run Flye 2.9 from a bash terminal on a windows machine? I'm a wet lab guy. I'm a total beginner when it comes to all things bioinformatics.

tolot27 commented 1 year ago

@Scott-Godwin No, you can't. But you can install WSL (Windows Subsystem for Linux) and a Linux distribution like Ubuntu.

lucyintheskyzzz commented 1 year ago

Hi, I updated to the new version of Flye and I'm still getting "Pipeline aborted".
Thanks!

Also, do you know why Canu can assemble contigs with this fastq file but Flye cannot? I am trying to understand the theory behind different long-read de novo assemblers and why some can assemble, and some cannot, even though I am using the same fastq file.

Thanks!

flye --nano-raw barcode01.fastq --out-dir barcode01.flye --meta --threads 20
[2022-11-19 15:57:26] INFO: Starting Flye 2.9.1-b1780
[2022-11-19 15:57:26] INFO: >>>STAGE: configure
[2022-11-19 15:57:26] INFO: Configuring run
[2022-11-19 15:57:26] INFO: Total read length: 2427265
[2022-11-19 15:57:26] INFO: Reads N50/N90: 760 / 486
[2022-11-19 15:57:26] INFO: Minimum overlap set to 1000
[2022-11-19 15:57:26] INFO: >>>STAGE: assembly
[2022-11-19 15:57:26] INFO: Assembling disjointigs
[2022-11-19 15:57:26] INFO: Reading sequences
[2022-11-19 15:59:56] INFO: Counting k-mers: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:00:54] INFO: Filling index table (1/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:00:54] INFO: Filling index table (2/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:00:55] INFO: Extending reads
[2022-11-19 16:00:56] INFO: Overlap-based coverage: 59
[2022-11-19 16:00:56] INFO: Median overlap divergence: 0.191868
0% 100%
[2022-11-19 16:00:56] INFO: Assembled 0 disjointigs
[2022-11-19 16:00:56] INFO: Generating sequence
[2022-11-19 16:00:56] INFO: Filtering contained disjointigs
[2022-11-19 16:00:57] INFO: Contained seqs: 0
[2022-11-19 16:00:57] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2022-11-19 16:00:57] ERROR: Pipeline aborted
(/lustre/project/taw/share/conda-envs/flye) [kvigil@cypress2 Fastq_Concat]$ flye --nano-raw barcode01.fastq --out-dir barcode01.flye --meta --threads 20
[2022-11-19 16:03:02] INFO: Starting Flye 2.9.1-b1780
[2022-11-19 16:03:02] INFO: >>>STAGE: configure
[2022-11-19 16:03:02] INFO: Configuring run
[2022-11-19 16:03:02] INFO: Total read length: 2427265
[2022-11-19 16:03:02] INFO: Reads N50/N90: 760 / 486
[2022-11-19 16:03:02] INFO: Minimum overlap set to 1000
[2022-11-19 16:03:02] INFO: >>>STAGE: assembly
[2022-11-19 16:03:02] INFO: Assembling disjointigs
[2022-11-19 16:03:02] INFO: Reading sequences
[2022-11-19 16:05:35] INFO: Counting k-mers: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:06:36] INFO: Filling index table (1/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:06:36] INFO: Filling index table (2/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:06:37] INFO: Extending reads
[2022-11-19 16:06:38] INFO: Overlap-based coverage: 59
[2022-11-19 16:06:38] INFO: Median overlap divergence: 0.191868
0% 100%
[2022-11-19 16:06:38] INFO: Assembled 0 disjointigs
[2022-11-19 16:06:38] INFO: Generating sequence
[2022-11-19 16:06:39] INFO: Filtering contained disjointigs
[2022-11-19 16:06:39] INFO: Contained seqs: 0
[2022-11-19 16:06:39] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2022-11-19 16:06:39] ERROR: Pipeline aborted

lucyintheskyzzz commented 1 year ago

Looks like my N50 is <1kb, so Flye can't assemble anything where the N50 is <1kb? What does N50 mean?

ChristopherRichie commented 1 year ago

https://en.wikipedia.org/wiki/N50,_L50,_and_related_statistics

N50: N50 statistic defines assembly quality in terms of contiguity. Given a set of contigs, the N50 is defined as the sequence length of the shortest contig at 50% of the total assembly length. It can be thought of as the point of half of the mass of the distribution; the number of bases from all contigs longer than the N50 will be close to the number of bases from all contigs shorter than the N50. For example, consider 9 contigs with the lengths 2, 3, 4, 5, 6, 7, 8, 9, and 10; their sum is 54, half of the sum is 27, and the size of the genome also happens to be 54. 50% of this assembly would be 10 + 9 + 8 = 27 (half the length of the sequence). Thus the N50 = 8, which is the size of the contig which, along with the larger contigs, contains half of the sequence of a particular genome. Note: When comparing N50 values from different assemblies, the assembly sizes must be the same size in order for N50 to be meaningful. N50 can be described as a weighted median statistic such that 50% of the entire assembly is contained in contigs or scaffolds equal to or larger than this value.
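The definition quoted above can be turned into a few lines of code. This is a minimal sketch rather than Flye's own implementation; the function name `nx` and its `fraction` parameter are hypothetical, and it applies to read lengths just as well as contig lengths (which is what the "Reads N50/N90" log line reports).

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Return the Nx statistic (N50 when fraction is 0.5).

    Sort lengths from longest to shortest and walk down the list until the
    running total reaches the requested fraction of all bases; the length
    at which that happens is the Nx value.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]  # only reachable through floating-point rounding


# Worked example from the quoted text: total 54, half 27, 10 + 9 + 8 = 27.
contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(nx(contigs))        # prints 8 (N50)
print(nx(contigs, 0.9))   # prints 4 (N90)
```

A read N50 below 1 kb, as in the logs above, therefore means that half of all sequenced bases sit in reads shorter than 1 kb.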


lucyintheskyzzz commented 1 year ago

@ChristopherRichie Thank you! I figured out that Metaflye is based on De Bruijn graph and Canu is an overlapping graph (OLC) based method.

jolespin commented 1 year ago

I've had this issue when using --nano-hq (my guppy version was 6.4.6+ae70e8f). When I changed the input to --nano-raw it ran to completion.

PavithraV0223 commented 1 year ago

Hello, I'm working with Nanopore data from alpacas. I have tried all the different parameters, but each run gives the same error, and I'm unsure what the problem is. I have been using the adapter- and barcode-trimmed fastq file as input to --nano-raw. I have tried all the troubleshooting steps mentioned above in the discussion, but I end up with the same error. I have provided my log file for your reference. I have tried both the meta and the normal mode as well. Your help would be much appreciated.

2023-04-27 12:58:27] root: INFO: Starting Flye 2.9.2-b1786 [2023-04-27 12:58:27] root: DEBUG: Cmd: /home/pavi/miniconda3/bin/flye --nano-raw /home/pavi/flye/fitered3_MinIONadapt.fastq --out-dir ./flye_output [2023-04-27 12:58:27] root: DEBUG: Python version: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0] [2023-04-27 12:58:27] root: INFO: >>>STAGE: configure [2023-04-27 12:58:27] root: INFO: Configuring run [2023-04-27 12:58:28] root: INFO: Total read length: 252562133 [2023-04-27 12:58:28] root: INFO: Reads N50/N90: 1137 / 994 [2023-04-27 12:58:28] root: INFO: Minimum overlap set to 1000 [2023-04-27 12:58:28] root: INFO: >>>STAGE: assembly [2023-04-27 12:58:28] root: INFO: Assembling disjointigs [2023-04-27 12:58:28] root: DEBUG: -----Begin assembly log------ [2023-04-27 12:58:28] root: DEBUG: Running: flye-modules assemble --reads /home/pavi/flye/fitered3_MinIONadapt.fastq --out-asm /home/pavi/flye/flye_output/00-assembly/draft_assembly.fasta --config /home/pavi/miniconda3/lib/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/pavi/flye/flye_output/flye.log --threads 1 --min-ovlp 1000 [2023-04-27 12:58:28] DEBUG: Build date: Mar 27 2023 14:17:04 [2023-04-27 12:58:28] DEBUG: Total RAM: 22 Gb [2023-04-27 12:58:28] DEBUG: Available RAM: 19 Gb [2023-04-27 12:58:28] DEBUG: Total CPUs: 7 [2023-04-27 12:58:28] DEBUG: Loading /home/pavi/miniconda3/lib/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg [2023-04-27 12:58:28] DEBUG: Loading /home/pavi/miniconda3/lib/python3.7/site-packages/flye/config/bin_cfg/asm_defaults.cfg [2023-04-27 12:58:28] DEBUG: big_genome_threshold=29000000 [2023-04-27 12:58:28] DEBUG: meta_read_filter_kmer_freq=100 [2023-04-27 12:58:28] DEBUG: chain_large_gap_penalty=2 [2023-04-27 12:58:28] DEBUG: chain_small_gap_penalty=0.5 [2023-04-27 12:58:28] DEBUG: chain_gap_jump_threshold=100 [2023-04-27 12:58:28] DEBUG: max_coverage_drop_rate=5 [2023-04-27 12:58:28] DEBUG: max_extensions_drop_rate=5 [2023-04-27 
12:58:28] DEBUG: chimera_window=100 [2023-04-27 12:58:28] DEBUG: chimera_overhang=1000 [2023-04-27 12:58:28] DEBUG: min_reads_in_disjointig=4 [2023-04-27 12:58:28] DEBUG: max_inner_reads=10 [2023-04-27 12:58:28] DEBUG: max_inner_fraction=0.25 [2023-04-27 12:58:28] DEBUG: max_separation=500 [2023-04-27 12:58:28] DEBUG: unique_edge_length=50000 [2023-04-27 12:58:28] DEBUG: min_repeat_res_support=0.51 [2023-04-27 12:58:28] DEBUG: out_paths_ratio=5 [2023-04-27 12:58:28] DEBUG: graph_cov_drop_rate=5 [2023-04-27 12:58:28] DEBUG: coverage_estimate_window=100 [2023-04-27 12:58:28] DEBUG: max_bubble_length=50000 [2023-04-27 12:58:28] DEBUG: loop_coverage_rate=1.5 [2023-04-27 12:58:28] DEBUG: repeat_edge_cov_mult=1.75 [2023-04-27 12:58:28] DEBUG: weak_detach_rate=5 [2023-04-27 12:58:28] DEBUG: tip_coverage_rate=2 [2023-04-27 12:58:28] DEBUG: tip_length_rate=2 [2023-04-27 12:58:28] DEBUG: output_gfa_before_rr=0 [2023-04-27 12:58:28] DEBUG: remove_alt_edges=0 [2023-04-27 12:58:28] DEBUG: low_cutoff_warning=1 [2023-04-27 12:58:28] DEBUG: kmer_size=17 [2023-04-27 12:58:28] DEBUG: use_minimizers=0 [2023-04-27 12:58:28] DEBUG: reads_base_alignment=0 [2023-04-27 12:58:28] DEBUG: meta_read_top_kmer_rate=0.40 [2023-04-27 12:58:28] DEBUG: maximum_jump=1500 [2023-04-27 12:58:28] DEBUG: maximum_overhang=1500 [2023-04-27 12:58:28] DEBUG: repeat_kmer_rate=100 [2023-04-27 12:58:28] DEBUG: assemble_ovlp_divergence=0.10 [2023-04-27 12:58:28] DEBUG: assemble_divergence_relative=1 [2023-04-27 12:58:28] DEBUG: repeat_graph_ovlp_divergence=0.08 [2023-04-27 12:58:28] DEBUG: read_align_ovlp_divergence=0.25 [2023-04-27 12:58:28] DEBUG: hpc_scoring_on=0 [2023-04-27 12:58:28] DEBUG: add_unassembled_reads=0 [2023-04-27 12:58:28] DEBUG: extend_contigs_with_repeats=0 [2023-04-27 12:58:28] DEBUG: min_read_cov_cutoff=3 [2023-04-27 12:58:28] DEBUG: short_tip_length=20000 [2023-04-27 12:58:28] DEBUG: long_tip_length=100000 [2023-04-27 12:58:28] DEBUG: Running with k-mer size: 17 [2023-04-27 12:58:28] DEBUG: 
Running with minimum overlap 1000 [2023-04-27 12:58:28] DEBUG: Metagenome mode: N [2023-04-27 12:58:28] DEBUG: Short mode: N [2023-04-27 12:58:28] INFO: Reading sequences [2023-04-27 12:58:29] DEBUG: Building positional index [2023-04-27 12:58:29] DEBUG: Total sequence: 224735072 bp [2023-04-27 12:58:31] INFO: Counting k-mers: [2023-04-27 12:59:01] DEBUG: Updating k-mer histogram [2023-04-27 12:59:39] DEBUG: Hash size: 1033102 [2023-04-27 12:59:39] DEBUG: Total k-mers 40609435 [2023-04-27 12:59:39] INFO: Filling index table (1/2) [2023-04-27 13:00:49] DEBUG: Mean k-mer frequency: 340.156 [2023-04-27 13:00:49] DEBUG: Repetitive k-mer frequency: 34015 [2023-04-27 13:00:49] DEBUG: Filtered 28293692 repetitive k-mers (0.319157) [2023-04-27 13:00:49] INFO: Filling index table (2/2) [2023-04-27 13:01:59] DEBUG: Sorting k-mer index [2023-04-27 13:02:00] DEBUG: Selected k-mers: 354076 [2023-04-27 13:02:00] DEBUG: Index size: 60427371 [2023-04-27 13:02:00] DEBUG: Mean k-mer index frequency: 170.662 [2023-04-27 13:02:00] DEBUG: Peak RAM usage: 8 Gb [2023-04-27 13:02:00] DEBUG: Estimating k-mer identity bias [2023-04-27 13:04:53] DEBUG: Initial divergence estimate : 0.234128 [2023-04-27 13:04:53] DEBUG: Relative threshold: Y [2023-04-27 13:04:53] DEBUG: Max divergence threshold set to 0.334128 [2023-04-27 13:04:53] INFO: Extending reads [2023-04-27 13:04:53] DEBUG: Estimating overlap coverage [2023-04-27 13:07:48] INFO: Overlap-based coverage: 205 [2023-04-27 13:07:48] INFO: Median overlap divergence: 0.234818 [2023-04-27 13:07:48] DEBUG: Sequence divergence distribution:

[Histogram: sequence divergence distribution, x-axis 0% to 45%]

Q25 = 0.21, Q50 = 0.23, Q75 = 0.26 [2023-04-27 22:07:06] INFO: Assembled 0 disjointigs [2023-04-27 22:07:06] INFO: Generating sequence [2023-04-27 22:07:06] DEBUG: Building positional index [2023-04-27 22:07:06] DEBUG: Mean k-mer frequency: 0 [2023-04-27 22:07:06] DEBUG: Repetitive k-mer frequency: 0 [2023-04-27 22:07:06] DEBUG: Filtered 0 repetitive k-mers (-nan) [2023-04-27 22:07:06] DEBUG: Sorting k-mer index [2023-04-27 22:07:06] DEBUG: Selected k-mers: 0 [2023-04-27 22:07:06] DEBUG: K-mer index size: 0 [2023-04-27 22:07:06] DEBUG: Mean k-mer frequency: -nan [2023-04-27 22:07:06] DEBUG: Minimizer rate: -nan [2023-04-27 22:07:06] INFO: Filtering contained disjointigs [2023-04-27 22:07:06] DEBUG: Computing transitive closure for overlaps [2023-04-27 22:07:06] DEBUG: Found 0 overlaps [2023-04-27 22:07:06] DEBUG: Left 0 overlaps after filtering [2023-04-27 22:07:06] INFO: Contained seqs: 0 [2023-04-27 22:07:06] DEBUG: Writing FASTA [2023-04-27 22:07:06] DEBUG: Peak RAM usage: 8 Gb -----------End assembly log------------ [2023-04-27 22:07:06] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2023-04-27 22:07:06] root: ERROR: Pipeline aborted

mikolmogorov commented 1 year ago

@PavithraV0223 could you tell us more about your sample? Please also attach a log from a --meta run. In general, the read length seems very short (~1 kb N50); is it some kind of amplicon sequencing?

emmannaemeka commented 1 year ago

Hello, I am having similar issues. I have tried the --meta mode and the --asm-coverage 50 without success.

[2023-05-22 09:29:27] root: INFO: Starting Flye 2.9-b1778 [2023-05-22 09:29:27] root: DEBUG: Cmd: /Users/pamluka/Desktop/programs_bioinformatics/Flye/bin/flye --meta --nano-raw /Users/pamluka/Desktop/UNGSM/sample_6/Sample-06-X-2022_fastq.fastq.gz -o /Users/pamluka/Desktop/UNGSM [2023-05-22 09:29:27] root: DEBUG: Python version: 3.6.15 | packaged by conda-forge | (default, Dec 3 2021, 18:49:43) [GCC Clang 11.1.0] [2023-05-22 09:29:27] root: INFO: >>>STAGE: configure [2023-05-22 09:29:27] root: INFO: Configuring run [2023-05-22 09:29:37] root: INFO: Total read length: 229301908 [2023-05-22 09:29:37] root: INFO: Reads N50/N90: 353 / 282 [2023-05-22 09:29:37] root: INFO: Minimum overlap set to 1000 [2023-05-22 09:29:37] root: INFO: >>>STAGE: assembly [2023-05-22 09:29:37] root: INFO: Assembling disjointigs [2023-05-22 09:29:37] root: DEBUG: -----Begin assembly log------ [2023-05-22 09:29:37] root: DEBUG: Running: flye-modules assemble --reads /Users/pamluka/Desktop/UNGSM/sample_6/Sample-06-X-2022_fastq.fastq.gz --out-asm /Users/pamluka/Desktop/UNGSM/00-assembly/draft_assembly.fasta --config /Users/pamluka/Desktop/programs_bioinformatics/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /Users/pamluka/Desktop/UNGSM/flye.log --threads 1 --meta --min-ovlp 1000 [2023-05-22 09:29:37] DEBUG: Build date: Jun 7 2022 09:22:15 [2023-05-22 09:29:37] DEBUG: Total RAM: 16 Gb [2023-05-22 09:29:37] DEBUG: Available RAM: 0 Gb [2023-05-22 09:29:37] DEBUG: Total CPUs: 8 [2023-05-22 09:29:37] DEBUG: Loading /Users/pamluka/Desktop/programs_bioinformatics/Flye/flye/config/bin_cfg/asm_raw_reads.cfg [2023-05-22 09:29:37] DEBUG: Loading /Users/pamluka/Desktop/programs_bioinformatics/Flye/flye/config/bin_cfg/asm_defaults.cfg [2023-05-22 09:29:37] DEBUG: big_genome_threshold=29000000 [2023-05-22 09:29:37] DEBUG: meta_read_filter_kmer_freq=100 [2023-05-22 09:29:37] DEBUG: chain_large_gap_penalty=2 [2023-05-22 09:29:37] DEBUG: chain_small_gap_penalty=0.5 [2023-05-22 09:29:37] DEBUG: 
chain_gap_jump_threshold=100 [2023-05-22 09:29:37] DEBUG: max_coverage_drop_rate=5 [2023-05-22 09:29:37] DEBUG: max_extensions_drop_rate=5 [2023-05-22 09:29:37] DEBUG: chimera_window=100 [2023-05-22 09:29:37] DEBUG: chimera_overhang=1000 [2023-05-22 09:29:37] DEBUG: min_reads_in_disjointig=4 [2023-05-22 09:29:37] DEBUG: max_inner_reads=10 [2023-05-22 09:29:37] DEBUG: max_inner_fraction=0.25 [2023-05-22 09:29:37] DEBUG: max_separation=500 [2023-05-22 09:29:37] DEBUG: unique_edge_length=50000 [2023-05-22 09:29:37] DEBUG: min_repeat_res_support=0.51 [2023-05-22 09:29:37] DEBUG: out_paths_ratio=5 [2023-05-22 09:29:37] DEBUG: graph_cov_drop_rate=5 [2023-05-22 09:29:37] DEBUG: coverage_estimate_window=100 [2023-05-22 09:29:37] DEBUG: max_bubble_length=50000 [2023-05-22 09:29:37] DEBUG: loop_coverage_rate=1.5 [2023-05-22 09:29:37] DEBUG: repeat_edge_cov_mult=1.75 [2023-05-22 09:29:37] DEBUG: weak_detach_rate=5 [2023-05-22 09:29:37] DEBUG: tip_coverage_rate=2 [2023-05-22 09:29:37] DEBUG: tip_length_rate=2 [2023-05-22 09:29:37] DEBUG: output_gfa_before_rr=0 [2023-05-22 09:29:37] DEBUG: remove_alt_edges=0 [2023-05-22 09:29:37] DEBUG: low_cutoff_warning=1 [2023-05-22 09:29:37] DEBUG: kmer_size=17 [2023-05-22 09:29:37] DEBUG: use_minimizers=0 [2023-05-22 09:29:37] DEBUG: reads_base_alignment=0 [2023-05-22 09:29:37] DEBUG: meta_read_top_kmer_rate=0.40 [2023-05-22 09:29:37] DEBUG: maximum_jump=1500 [2023-05-22 09:29:37] DEBUG: maximum_overhang=1500 [2023-05-22 09:29:37] DEBUG: repeat_kmer_rate=100 [2023-05-22 09:29:37] DEBUG: assemble_ovlp_divergence=0.10 [2023-05-22 09:29:37] DEBUG: assemble_divergence_relative=1 [2023-05-22 09:29:37] DEBUG: repeat_graph_ovlp_divergence=0.08 [2023-05-22 09:29:37] DEBUG: read_align_ovlp_divergence=0.25 [2023-05-22 09:29:37] DEBUG: hpc_scoring_on=0 [2023-05-22 09:29:37] DEBUG: add_unassembled_reads=0 [2023-05-22 09:29:37] DEBUG: extend_contigs_with_repeats=0 [2023-05-22 09:29:37] DEBUG: min_read_cov_cutoff=3 [2023-05-22 09:29:37] DEBUG: 
short_tip_length=20000 [2023-05-22 09:29:37] DEBUG: long_tip_length=100000 [2023-05-22 09:29:37] DEBUG: Running with k-mer size: 17 [2023-05-22 09:29:37] DEBUG: Running with minimum overlap 1000 [2023-05-22 09:29:37] DEBUG: Metagenome mode: Y [2023-05-22 09:29:37] DEBUG: Short mode: N [2023-05-22 09:29:37] INFO: Reading sequences [2023-05-22 09:29:42] DEBUG: Building positional index [2023-05-22 09:29:42] DEBUG: Total sequence: 3440345 bp [2023-05-22 09:29:46] INFO: Counting k-mers: [2023-05-22 09:29:47] DEBUG: Updating k-mer histogram [2023-05-22 09:30:31] DEBUG: Hash size: 10893 [2023-05-22 09:30:31] DEBUG: Total k-mers 1848766 [2023-05-22 09:30:31] INFO: Filling index table (1/2) [2023-05-22 09:30:32] DEBUG: Mean k-mer frequency: 7.46855 [2023-05-22 09:30:32] DEBUG: Repetitive k-mer frequency: 746 [2023-05-22 09:30:32] DEBUG: Filtered 5983 repetitive k-mers (0.00455754) [2023-05-22 09:30:32] INFO: Filling index table (2/2) [2023-05-22 09:30:34] DEBUG: Sorting k-mer index [2023-05-22 09:30:34] DEBUG: Selected k-mers: 220513 [2023-05-22 09:30:34] DEBUG: Index size: 1350695 [2023-05-22 09:30:34] DEBUG: Mean k-mer index frequency: 6.12524 [2023-05-22 09:30:34] DEBUG: Peak RAM usage: 8 Gb [2023-05-22 09:30:34] DEBUG: Estimating k-mer identity bias [2023-05-22 09:30:35] DEBUG: Initial divergence estimate : 0.0703537 [2023-05-22 09:30:35] DEBUG: Relative threshold: Y [2023-05-22 09:30:35] DEBUG: Max divergence threshold set to 0.170354 [2023-05-22 09:30:35] INFO: Extending reads [2023-05-22 09:30:35] DEBUG: Estimating overlap coverage [2023-05-22 09:30:37] INFO: Overlap-based coverage: 1 [2023-05-22 09:30:37] INFO: Median overlap divergence: 0.0717406 [2023-05-22 09:30:37] DEBUG: Sequence divergence distribution:

|              *                   |                                                                 
|              *                   |                                                                 
|              *                   |                                                                 
|              *                   |                                                                 
|              *                   |                                                                 
|              *                   |                                                                 
|             **                   |                                                                 
|             **                   |                                                                 
|            ***                   |                                                                 
|           ****                   |                                                                 
|           ****                   |                                                                 
|           *****                  |                                                                 
|           *****                  |                                                                 
|           ******                 |                                                                 
|           ****** *               |                                                                 
|           ********               |                                                                 
|          *********               |                                                                 
|          *********  **           |                                                                 
|       *  ********* ***           |                 *                                               
|      ** ************** * *       | *           *   *  *   *                                        
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.064, Q50 = 0.072, Q75 = 0.083

[2023-05-22 09:30:42] INFO: Assembled 0 disjointigs [2023-05-22 09:30:42] INFO: Generating sequence [2023-05-22 09:30:42] DEBUG: Building positional index [2023-05-22 09:30:42] DEBUG: Mean k-mer frequency: 0 [2023-05-22 09:30:42] DEBUG: Repetitive k-mer frequency: 0 [2023-05-22 09:30:42] DEBUG: Filtered 0 repetitive k-mers (nan) [2023-05-22 09:30:42] DEBUG: Sorting k-mer index [2023-05-22 09:30:42] DEBUG: Selected k-mers: 0 [2023-05-22 09:30:42] DEBUG: K-mer index size: 0 [2023-05-22 09:30:42] DEBUG: Mean k-mer frequency: nan [2023-05-22 09:30:42] DEBUG: Minimizer rate: nan [2023-05-22 09:30:42] INFO: Filtering contained disjointigs [2023-05-22 09:30:42] DEBUG: Computing transitive closure for overlaps [2023-05-22 09:30:42] DEBUG: Found 0 overlaps [2023-05-22 09:30:42] DEBUG: Left 0 overlaps after filtering [2023-05-22 09:30:42] INFO: Contained seqs: 0 [2023-05-22 09:30:42] DEBUG: Writing FASTA [2023-05-22 09:30:42] DEBUG: Peak RAM usage: 8 Gb -----------End assembly log------------ [2023-05-22 09:30:42] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct [2023-05-22 09:30:42] root: ERROR: Pipeline aborted

mikolmogorov commented 1 year ago

@emmannaemeka it seems like you're assembling very short reads; Flye really needs reads of at least a few kb to work.
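One way to check this before running an assembly is to measure how much of the data sits in reads at least as long as the minimum overlap that Flye reports (1000 in the log above, against a read N50 of 353). The sketch below is only an illustration: the helper names are hypothetical, it assumes a plain four-line-per-record FASTQ (no wrapped sequence lines), and it rests on the reasoning that an overlap cannot be longer than the shorter of the two reads involved.

```python
import gzip
from typing import Iterable, Iterator


def open_reads(path: str):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fastq_read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line FASTQ stream."""
    for index, line in enumerate(lines):
        if index % 4 == 1:  # second line of every record holds the bases
            yield len(line.strip())


def fraction_of_bases_in_long_reads(lengths: Iterable[int], min_length: int) -> float:
    """Share of all bases that lie in reads of at least min_length."""
    lengths = list(lengths)
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(n for n in lengths if n >= min_length) / total


# Example with made-up read lengths: only the 1200 bp read could take part
# in a 1000 bp overlap.
print(fraction_of_bases_in_long_reads([300, 350, 400, 1200], 1000))

# Usage on a real file (path is a placeholder):
# with open_reads("reads.fastq.gz") as handle:
#     print(fraction_of_bases_in_long_reads(fastq_read_lengths(handle), 1000))
```

If this fraction is close to zero, it is consistent with the "Overlap-based coverage: 1" line in the log, whatever the total read volume is.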

miniluphy commented 3 months ago

I encountered a similar issue. With the latest version (2.9.3), when I input a PacBio HiFi file, I received the following error message:

INFO: Overlap-based coverage: 0 ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct.

I have attached the log file.

Upon checking the fastq.gz file via pbclip, the result shows: Good: 152693 chopped: 5824 bad: 1897.

The --meta result is similar. How should I resolve this issue? Thank you. Attachments: flye.log, flye-meta.log

mikolmogorov commented 3 months ago

@miniluphy your read error rate is ~13%, so these are not HiFi reads. If the data are PacBio, use --pacbio-raw instead.
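Most of the diagnoses in this thread come from the same few log lines: read N50, minimum overlap, overlap-based coverage and median overlap divergence. The sketch below pulls those values out of a Flye log so they can be compared at a glance. The function name and dictionary keys are hypothetical; the patterns are copied from the log lines quoted in this thread and may not match other Flye versions.

```python
import re

# Each entry: key -> (pattern with one capture group, type to convert to).
PATTERNS = {
    "read_n50": (re.compile(r"Reads N50/N90: (\d+) / \d+"), int),
    "min_overlap": (re.compile(r"Minimum overlap set to (\d+)"), int),
    "overlap_coverage": (re.compile(r"Overlap-based coverage: (\d+)"), int),
    "median_divergence": (re.compile(r"Median overlap divergence: (\d+(?:\.\d+)?)"), float),
}


def summarize_flye_log(text: str) -> dict:
    """Return the first occurrence of each diagnostic found in the log text."""
    summary = {}
    for key, (pattern, cast) in PATTERNS.items():
        match = pattern.search(text)
        if match:
            summary[key] = cast(match.group(1))
    return summary


# Excerpt of a log posted earlier in this thread.
SAMPLE = (
    "[2022-11-19 15:57:26] INFO: Reads N50/N90: 760 / 486 "
    "[2022-11-19 15:57:26] INFO: Minimum overlap set to 1000 "
    "[2022-11-19 16:00:56] INFO: Overlap-based coverage: 59 "
    "[2022-11-19 16:00:56] INFO: Median overlap divergence: 0.191868 0% 100%"
)
print(summarize_flye_log(SAMPLE))
```

In that excerpt the read N50 (760) is below the minimum overlap (1000) and the divergence is about 19%, which are the two symptoms pointed out repeatedly above.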