Closed barakova closed 5 years ago
Hi,
Hard to tell - it could be either an issue with Flye, or you simply don't have enough overlap between reads to assemble disjointigs. Is it an environmental metagenome? If you are able to send me the data to fenderglass@gmail.com - I can take a look.
Best, Mikhail
Hi,
I have sent you an e-mail.
Thank you, Alzbeta
Hi Alzbeta,
Thank you for sending me the data. Looks like indeed there are almost no overlaps between reads - that is why nothing is assembled. Since it is an environmental metagenome it could be very big and you don't have sufficient coverage even for the most abundant bacteria to assemble.
Best, Mikhail
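For reference, the "Estimated coverage" value in the Flye logs posted in this thread is consistent with total read length divided by the genome size given via `-g`. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the integer truncation is an assumption inferred from the logged values, not taken from Flye's source:

```python
def estimated_coverage(total_read_length: int, genome_size: int) -> int:
    """Naive depth estimate: total sequenced bases over the assumed genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    # Integer division is an assumption that happens to match the logged values.
    return total_read_length // genome_size


# "Total read length" and "Input genome size" copied from the two logs in this thread.
metagenome = estimated_coverage(1_232_086_596, 40_000_000)
influenza = estimated_coverage(61_823_700, 13_500)
print(metagenome, influenza)
```

For the metagenome run this gives 30, while the same log reports an overlap-based coverage of only 8. That gap fits the explanation above: the depth figure depends on the genome size guess, whereas the overlap-based number reflects how much the reads actually overlap.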
Hi Mikhail,
Thank you for your answer. So you think there is nothing I can do with these samples now, other than getting more data to increase the coverage?
Alzbeta
From: Mikhail Kolmogorov notifications@github.com Sent: Tuesday, October 29, 2019 2:35 AM To: fenderglass/Flye Flye@noreply.github.com Cc: barakova barakova@vri.cz; Author author@noreply.github.com Subject: Re: [fenderglass/Flye] Metagenomics assembly ERROR: No disjointigs were assembled (#169)
Regarding the sample that you sent me previously - yes, I think so. It seems there are not enough overlaps between reads to assemble.
Hello, I am having a similar issue. Would you mind assisting me?
@pinkysaid Please describe your dataset and post the flye.log file.
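When posting a log, it can help to pull out the few numbers that matter most for this error. The sketch below is only an illustration: the helper name `summarize_log` is hypothetical, and the regular expressions assume the message wording seen in the Flye 2.6 logs in this thread, which may differ in other versions.

```python
import re

# Message patterns as they appear in the Flye 2.6 logs in this thread (assumed stable).
PATTERNS = {
    "estimated_coverage": re.compile(r"Estimated coverage: (\d+)"),
    "overlap_coverage": re.compile(r"Overlap-based coverage: (\d+)"),
    "reads_n50": re.compile(r"Reads N50/N90: (\d+) / \d+"),
    "min_overlap": re.compile(r"Minimum overlap set to (\d+)"),
}


def summarize_log(text: str) -> dict:
    """Return the first value found for each pattern; missing fields are omitted."""
    summary = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            summary[name] = int(match.group(1))
    return summary


# A few lines copied from the metagenome log in this thread.
sample = """\
[2019-10-16 08:17:24] INFO: Estimated coverage: 30
[2019-10-16 08:17:24] INFO: Reads N50/N90: 3211 / 2429
[2019-10-16 08:17:24] INFO: Minimum overlap set to 2000
[2019-10-16 08:31:02] INFO: Overlap-based coverage: 8
"""
summary = summarize_log(sample)
print(summary)
```

On a real run, the text would come from the log file in the output directory, for example `summarize_log(open("flye.log").read())`.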
**The dataset comprises compressed, demultiplexed MinION reads (fastq) from whole-genome sequencing of the Influenza A virus.
Please find the log file appended:**
[2019-12-03 08:53:17] root: INFO: Starting Flye 2.6-release
[2019-12-03 08:53:17] root: DEBUG: Cmd: flye --nano-raw barcode23.fastq.gz -g 0.0135m -o Flye_assembly -t 10
[2019-12-03 08:53:17] root: DEBUG: Python version: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
[2019-12-03 08:53:17] root: INFO: >>>STAGE: configure
[2019-12-03 08:53:17] root: INFO: Configuring run
[2019-12-03 08:53:19] root: INFO: Total read length: 61823700
[2019-12-03 08:53:19] root: INFO: Input genome size: 13500
[2019-12-03 08:53:19] root: INFO: Estimated coverage: 4579
[2019-12-03 08:53:19] root: WARNING: Expected read coverage is 4579, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2019-12-03 08:53:19] root: INFO: Reads N50/N90: 1454 / 735
[2019-12-03 08:53:19] root: INFO: Minimum overlap set to 1000
[2019-12-03 08:53:19] root: INFO: Selected k-mer size: 15
[2019-12-03 08:53:19] root: INFO: >>>STAGE: assembly
[2019-12-03 08:53:19] root: INFO: Assembling disjointigs
[2019-12-03 08:53:19] root: DEBUG: -----Begin assembly log------
[2019-12-03 08:53:19] root: DEBUG: Running: flye-assemble --reads barcode23.fastq.gz --out-asm /Flye_assembly/00-assembly/draft_assembly.fasta --genome-size 13500 --config /home/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg --log Flye_assembly/flye.log --threads 10 --min-ovlp 1000 --kmer 15
[2019-12-03 08:53:19] DEBUG: Build date: Dec 2 2019 16:01:39
[2019-12-03 08:53:19] DEBUG: Total RAM: 503 Gb
[2019-12-03 08:53:19] DEBUG: Available RAM: 476 Gb
[2019-12-03 08:53:19] DEBUG: Total CPUs: 72
[2019-12-03 08:53:19] DEBUG: Parameters:
[2019-12-03 08:53:19] DEBUG: big_genome_threshold=29000000
[2019-12-03 08:53:19] DEBUG: low_cutoff_warning=1
[2019-12-03 08:53:19] DEBUG: hard_min_coverage_rate=10
[2019-12-03 08:53:19] DEBUG: assemble_kmer_sample=1
[2019-12-03 08:53:19] DEBUG: repeat_graph_kmer_sample=1
[2019-12-03 08:53:19] DEBUG: read_align_kmer_sample=1
[2019-12-03 08:53:19] DEBUG: maximum_jump=1500
[2019-12-03 08:53:19] DEBUG: maximum_overhang=1500
[2019-12-03 08:53:19] DEBUG: repeat_kmer_rate=100
[2019-12-03 08:53:19] DEBUG: assemble_ovlp_relative_divergence=0.10
[2019-12-03 08:53:19] DEBUG: repeat_graph_ovlp_divergence=0.15
[2019-12-03 08:53:19] DEBUG: read_align_ovlp_divergence=0.25
[2019-12-03 08:53:19] DEBUG: max_coverage_drop_rate=5
[2019-12-03 08:53:19] DEBUG: chimera_window=100
[2019-12-03 08:53:19] DEBUG: min_reads_in_disjointig=4
[2019-12-03 08:53:19] DEBUG: max_inner_reads=10
[2019-12-03 08:53:19] DEBUG: max_inner_fraction=0.25
[2019-12-03 08:53:19] DEBUG: add_unassembled_reads=0
[2019-12-03 08:53:19] DEBUG: max_separation=500
[2019-12-03 08:53:19] DEBUG: unique_edge_length=50000
[2019-12-03 08:53:19] DEBUG: min_repeat_res_support=0.51
[2019-12-03 08:53:19] DEBUG: out_paths_ratio=5
[2019-12-03 08:53:19] DEBUG: graph_cov_drop_rate=5
[2019-12-03 08:53:19] DEBUG: coverage_estimate_window=100
[2019-12-03 08:53:19] DEBUG: extend_contigs_with_repeats=1
[2019-12-03 08:53:19] DEBUG: min_read_cov_cutoff=3
[2019-12-03 08:53:19] DEBUG: short_tip_length=10000
[2019-12-03 08:53:19] DEBUG: long_tip_length=100000
[2019-12-03 08:53:19] DEBUG: max_bubble_length=50000
[2019-12-03 08:53:19] DEBUG: Running with k-mer size: 15
[2019-12-03 08:53:19] DEBUG: Running with minimum overlap 1000
[2019-12-03 08:53:19] DEBUG: Metagenome mode: N
[2019-12-03 08:53:19] INFO: Reading sequences
[2019-12-03 08:53:20] DEBUG: Building positional index
[2019-12-03 08:53:20] DEBUG: Total sequence: 61823700 bp
[2019-12-03 08:53:20] DEBUG: Expected read coverage: 4579
[2019-12-03 08:53:20] INFO: Generating solid k-mer index
[2019-12-03 08:53:20] DEBUG: Hard threshold set to 5
[2019-12-03 08:53:20] DEBUG: Started k-mer counting
[2019-12-03 08:53:34] INFO: Counting k-mers (1/2):
[2019-12-03 08:53:35] INFO: Counting k-mers (2/2):
[2019-12-03 08:53:36] DEBUG: Estimated minimum kmer coverage: 574
[2019-12-03 08:53:36] DEBUG: Filtered 888957 erroneous k-mers
[2019-12-03 08:53:36] DEBUG: Repetitive k-mer frequency: 178394
[2019-12-03 08:53:36] DEBUG: Filtered 0 repetitive k-mers (0)
[2019-12-03 08:53:36] INFO: Filling index table
[2019-12-03 08:53:36] DEBUG: Sampling rate: 1
[2019-12-03 08:53:36] DEBUG: Solid k-mers: 13503
[2019-12-03 08:53:36] DEBUG: K-mer index size: 24090364
[2019-12-03 08:53:36] DEBUG: Mean k-mer frequency: 1784.07
[2019-12-03 08:53:37] DEBUG: Sorting k-mer index
[2019-12-03 08:53:38] DEBUG: Peak RAM usage: 1 Gb
[2019-12-03 08:53:38] DEBUG: Estimating k-mer identity bias
[2019-12-03 08:53:56] DEBUG: Median overlap divergence: 0.121838
[2019-12-03 08:53:56] DEBUG: K-mer estimate bias: -0.00395411
[2019-12-03 08:53:56] DEBUG: Max divergence threshold set to 0.221838
[2019-12-03 08:53:56] INFO: Extending reads
[2019-12-03 08:53:56] DEBUG: Estimating overlap coverage
[2019-12-03 08:56:28] INFO: Overlap-based coverage: 1342
[2019-12-03 08:56:28] INFO: Median overlap divergence: 0.0777594
[2019-12-03 08:56:28] DEBUG: Sequence divergence distribution:
| * * |
| ** * * |
| ** * * |
| ** * ** |
| ** **** |
| ** **** |
| ******* |
| ******* * |
| ********* |
| ********** |
| *********** |
| ************ |
| ************ |
| **************** * * |
| **************** * * |
| ******************* * |
| ******************** * * * |
| ********************* * ** * |
| ********************************* * |* * * *
| ************************************ * ** * * * * ** ****
----------------------------------------------------------------------------------------------------
0% 5% 10% 15% 20% 25% 30% 35% 40% 45%
Q25 = 0.058, Q50 = 0.078, Q75 = 0.11
[2019-12-03 09:06:28] INFO: Assembled 0 disjointigs
[2019-12-03 09:06:28] INFO: Generating sequence
[2019-12-03 09:06:28] DEBUG: Writing FASTA
[2019-12-03 09:06:29] DEBUG: Peak RAM usage: 1 Gb
-----------End assembly log------------
[2019-12-03 09:06:29] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
@pinkysaid Thanks. Flye is not designed for viral assembly, since those sequences are so short and often covered by a single read.
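The log above reports `Reads N50/N90: 1454 / 735` alongside `Minimum overlap set to 1000`, so it is worth knowing how your own read lengths compare with the overlap threshold before running. Below is a minimal sketch of the usual N50/N90 definition; the function name `nx` and the toy read lengths are made up for illustration, and Flye's own computation may differ in detail.

```python
def nx(lengths, percent=50):
    """Length L such that reads of length >= L contain at least `percent`% of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= total * percent:
            return length


# Toy read lengths, not taken from any dataset in this thread.
toy_lengths = [2000, 1500, 1000, 500]
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
print(n50, n90)
```

Comparing these values with the minimum overlap printed at the configure stage gives a rough idea of how much of each read has to be shared with a neighbour before the two count as overlapping.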
Thank you Mikhail for the feedback and clarification. Much appreciated.
Hi,
I have recently sequenced some metagenomic samples on a MinION and am now trying to assemble the data using Flye with the --meta option. This has worked for me previously on some samples, but now it says "ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct". I tried different genome size parameters and it still does not work. Could you please advise me on what I am doing wrong? I tried different samples from the run and all of them fail at the same step.
flye --nano-raw 1405-19.fastq.gz --meta -g 40.000m -o ./output -t 4
[2019-10-16 08:17:00] INFO: Starting Flye 2.6-release
[2019-10-16 08:17:00] INFO: >>>STAGE: configure
[2019-10-16 08:17:00] INFO: Configuring run
[2019-10-16 08:17:24] INFO: Total read length: 1232086596
[2019-10-16 08:17:24] INFO: Input genome size: 40000000
[2019-10-16 08:17:24] INFO: Estimated coverage: 30
[2019-10-16 08:17:24] INFO: Reads N50/N90: 3211 / 2429
[2019-10-16 08:17:24] INFO: Minimum overlap set to 2000
[2019-10-16 08:17:24] INFO: Selected k-mer size: 17
[2019-10-16 08:17:24] INFO: >>>STAGE: assembly
[2019-10-16 08:17:24] INFO: Assembling disjointigs
[2019-10-16 08:17:24] INFO: Reading sequences
[2019-10-16 08:17:40] INFO: Generating solid k-mer index
[2019-10-16 08:21:45] INFO: Counting k-mers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-10-16 08:22:44] INFO: Counting k-mers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-10-16 08:25:20] INFO: Filling index table (1/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-10-16 08:28:09] INFO: Filling index table (2/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-10-16 08:31:00] INFO: Extending reads
[2019-10-16 08:31:02] INFO: Overlap-based coverage: 8
[2019-10-16 08:31:02] INFO: Median overlap divergence: 0.18721
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-10-16 08:34:23] INFO: Assembled 0 disjointigs
[2019-10-16 08:34:25] INFO: Generating sequence
[2019-10-16 08:34:25] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
This is what the terminal says. I have also attached the log file.
flye.log
Thank you for your answer. Alzbeta