mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

Metagenomics assembly ERROR: No disjointigs were assembled #169

Closed: barakova closed this issue 5 years ago.

barakova commented 5 years ago

Hi,

I have recently sequenced some metagenomic samples on MinION, and now I am trying to assemble the data using Flye's --meta option. This has worked for me previously on some samples, but now it says "ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct". I tried different genome size parameters, but it is still not working. Could you please advise me what I am doing wrong? I tried different samples from the same run, and all of them fail at the same step.

flye --nano-raw 1405-19.fastq.gz --meta -g 40.000m -o ./output -t 4
[2019-10-16 08:17:00] INFO: Starting Flye 2.6-release
[2019-10-16 08:17:00] INFO: >>>STAGE: configure
[2019-10-16 08:17:00] INFO: Configuring run
[2019-10-16 08:17:24] INFO: Total read length: 1232086596
[2019-10-16 08:17:24] INFO: Input genome size: 40000000
[2019-10-16 08:17:24] INFO: Estimated coverage: 30
[2019-10-16 08:17:24] INFO: Reads N50/N90: 3211 / 2429
[2019-10-16 08:17:24] INFO: Minimum overlap set to 2000
[2019-10-16 08:17:24] INFO: Selected k-mer size: 17
[2019-10-16 08:17:24] INFO: >>>STAGE: assembly
[2019-10-16 08:17:24] INFO: Assembling disjointigs
[2019-10-16 08:17:24] INFO: Reading sequences
[2019-10-16 08:17:40] INFO: Generating solid k-mer index
[2019-10-16 08:21:45] INFO: Counting k-mers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-10-16 08:22:44] INFO: Counting k-mers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-10-16 08:25:20] INFO: Filling index table (1/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-10-16 08:28:09] INFO: Filling index table (2/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-10-16 08:31:00] INFO: Extending reads
[2019-10-16 08:31:02] INFO: Overlap-based coverage: 8
[2019-10-16 08:31:02] INFO: Median overlap divergence: 0.18721
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-10-16 08:34:23] INFO: Assembled 0 disjointigs
[2019-10-16 08:34:25] INFO: Generating sequence
[2019-10-16 08:34:25] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

This is the terminal output. I have also attached the log file.

flye.log

Thank you for your answer. Alzbeta
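The `Estimated coverage: 30` line in the log above is consistent with the total read length divided by the genome size and rounded down (1232086596 / 40000000 ≈ 30.8). A minimal Python sketch of that arithmetic, assuming this formula and a k/m/g suffix convention for the `-g` value; `parse_genome_size` and `estimated_coverage` are hypothetical helpers, not Flye functions:

```python
# Hypothetical helpers (not part of Flye). The k/m/g suffix convention is an
# assumption that mirrors the conversions visible in the logs in this thread
# ("40.000m" -> 40000000, "0.0135m" -> 13500).
SUFFIXES = {"k": 10**3, "m": 10**6, "g": 10**9}


def parse_genome_size(value: str) -> int:
    """Convert a genome-size string such as '40.000m' into a number of bases."""
    value = value.strip().lower()
    if value and value[-1] in SUFFIXES:
        # round() absorbs floating-point noise such as 13500.000000000002
        return round(float(value[:-1]) * SUFFIXES[value[-1]])
    return int(value)


def estimated_coverage(total_read_length: int, genome_size: int) -> int:
    """Total sequenced bases divided by genome size, rounded down."""
    return total_read_length // genome_size


if __name__ == "__main__":
    size = parse_genome_size("40.000m")
    print(size, estimated_coverage(1232086596, size))
```

The same arithmetic on the second log later in this thread (61823700 bp of reads, 13500 bp genome) gives 4579, matching its `Estimated coverage` line.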

mikolmogorov commented 5 years ago

Hi,

Hard to tell. It could either be an issue with Flye, or you simply don't have enough overlap between reads to assemble disjointigs. Is it an environmental metagenome? If you are able to send me the data at fenderglass@gmail.com, I can take a look.

Best, Mikhail
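"Overlap" here means two reads covering a shared stretch of the same sequence. As a toy illustration only (Flye's actual overlap detection is more involved and is not reproduced here), overlapping reads can be spotted by counting shared k-mers. The value k=17 is borrowed from the `Selected k-mer size: 17` line in the log; the sequences are made up and error-free, unlike real nanopore reads:

```python
# Toy illustration, not Flye's overlapper: reads that share many k-mers are
# candidate overlaps; unrelated reads share none.


def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(read_a: str, read_b: str, k: int = 17) -> int:
    """Number of distinct k-mers present in both reads."""
    return len(kmers(read_a, k) & kmers(read_b, k))


if __name__ == "__main__":
    # Made-up "genome"; read_a and read_b overlap by 20 bp.
    genome = "ACGTTGCATGCATCGATCGGATCCTAGGCTAGCTAGGATCCGATCGTAGCTAGCTAGGCATG"
    read_a, read_b = genome[:40], genome[20:]
    unrelated = "T" * 40
    print(shared_kmers(read_a, read_b), shared_kmers(read_a, unrelated))
```

With too little coverage, most reads have no partner that shares enough k-mers, so there is nothing to chain into disjointigs.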

barakova commented 5 years ago

Hi,

I have sent you an e-mail.

Thank you, Alzbeta

mikolmogorov commented 5 years ago

Hi Alzbeta,

Thank you for sending me the data. It looks like there are indeed almost no overlaps between reads, which is why nothing was assembled. Since it is an environmental metagenome, it could be very large, and you may not have sufficient coverage to assemble even the most abundant bacteria.

Best, Mikhail
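The coverage point can be made concrete: in a metagenome the reads are split among all organisms, so one organism's coverage is roughly its share of the sequenced bases times the total bases, divided by its genome size. A sketch assuming a hypothetical 5 Mb bacterial genome and made-up relative abundances; only the total read length comes from the log above:

```python
# Only TOTAL_BASES comes from the log in this thread; the 5 Mb genome size and
# the abundances below are illustrative assumptions.
TOTAL_BASES = 1_232_086_596


def species_coverage(total_bases: int, abundance: float, genome_size: int) -> float:
    """Expected coverage of one organism that makes up `abundance` of the sequenced bases."""
    return total_bases * abundance / genome_size


if __name__ == "__main__":
    for abundance in (0.10, 0.01, 0.001):
        cov = species_coverage(TOTAL_BASES, abundance, 5_000_000)
        print(f"abundance {abundance:.1%}: ~{cov:.1f}x")
```

Under these assumptions, an organism at 1% abundance would get only about 2.5x coverage, even though the whole run looked like 30x against the 40 Mb genome size that was entered.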

barakova commented 4 years ago

Hi Mikhail,

Thank you for your answer. So you think there is nothing I can do with these samples now, except to get more data to increase the coverage?

Alzbeta


mikolmogorov commented 4 years ago

Regarding the sample that you sent me previously: yes, I think so. It seems that there are not enough overlaps between reads to assemble.
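Turning the coverage formula around gives a rough estimate of how much sequencing would be needed. A sketch with hypothetical inputs (an arbitrary 30x target, a 5 Mb genome, 1% relative abundance); none of these values come from the thread, and the real requirement depends on the community's actual composition:

```python
# All inputs are hypothetical; this only inverts the coverage arithmetic:
# coverage = total_bases * abundance / genome_size.


def bases_needed(target_coverage: float, genome_size: int, abundance: float) -> float:
    """Total yield (bp) so that an organism making up `abundance` of the
    sequenced bases reaches `target_coverage`."""
    return target_coverage * genome_size / abundance


if __name__ == "__main__":
    # Hypothetical: 30x for a 5 Mb genome at 1% abundance.
    print(f"{bases_needed(30, 5_000_000, 0.01) / 1e9:.1f} Gb")
```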

pinkysaid commented 4 years ago

Hello, I am having a similar issue. Would you mind assisting me?

mikolmogorov commented 4 years ago

@pinkysaid Please describe your dataset and post the flye.log file.

pinkysaid commented 4 years ago

The dataset comprises compressed, demultiplexed MinION reads (FASTQ) from whole-genome sequencing of Influenza A virus.

Please find the log file below:

[2019-12-03 08:53:17] root: INFO: Starting Flye 2.6-release
[2019-12-03 08:53:17] root: DEBUG: Cmd: flye --nano-raw barcode23.fastq.gz -g 0.0135m -o Flye_assembly -t 10
[2019-12-03 08:53:17] root: DEBUG: Python version: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
[2019-12-03 08:53:17] root: INFO: >>>STAGE: configure
[2019-12-03 08:53:17] root: INFO: Configuring run
[2019-12-03 08:53:19] root: INFO: Total read length: 61823700
[2019-12-03 08:53:19] root: INFO: Input genome size: 13500
[2019-12-03 08:53:19] root: INFO: Estimated coverage: 4579
[2019-12-03 08:53:19] root: WARNING: Expected read coverage is 4579, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2019-12-03 08:53:19] root: INFO: Reads N50/N90: 1454 / 735
[2019-12-03 08:53:19] root: INFO: Minimum overlap set to 1000
[2019-12-03 08:53:19] root: INFO: Selected k-mer size: 15
[2019-12-03 08:53:19] root: INFO: >>>STAGE: assembly
[2019-12-03 08:53:19] root: INFO: Assembling disjointigs
[2019-12-03 08:53:19] root: DEBUG: -----Begin assembly log------
[2019-12-03 08:53:19] root: DEBUG: Running: flye-assemble --reads barcode23.fastq.gz --out-asm /Flye_assembly/00-assembly/draft_assembly.fasta --genome-size 13500 --config /home/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg --log Flye_assembly/flye.log --threads 10 --min-ovlp 1000 --kmer 15
[2019-12-03 08:53:19] DEBUG: Build date: Dec 2 2019 16:01:39
[2019-12-03 08:53:19] DEBUG: Total RAM: 503 Gb
[2019-12-03 08:53:19] DEBUG: Available RAM: 476 Gb
[2019-12-03 08:53:19] DEBUG: Total CPUs: 72
[2019-12-03 08:53:19] DEBUG: Parameters:
[2019-12-03 08:53:19] DEBUG: big_genome_threshold=29000000
[2019-12-03 08:53:19] DEBUG: low_cutoff_warning=1
[2019-12-03 08:53:19] DEBUG: hard_min_coverage_rate=10
[2019-12-03 08:53:19] DEBUG: assemble_kmer_sample=1
[2019-12-03 08:53:19] DEBUG: repeat_graph_kmer_sample=1
[2019-12-03 08:53:19] DEBUG: read_align_kmer_sample=1
[2019-12-03 08:53:19] DEBUG: maximum_jump=1500
[2019-12-03 08:53:19] DEBUG: maximum_overhang=1500
[2019-12-03 08:53:19] DEBUG: repeat_kmer_rate=100
[2019-12-03 08:53:19] DEBUG: assemble_ovlp_relative_divergence=0.10
[2019-12-03 08:53:19] DEBUG: repeat_graph_ovlp_divergence=0.15
[2019-12-03 08:53:19] DEBUG: read_align_ovlp_divergence=0.25
[2019-12-03 08:53:19] DEBUG: max_coverage_drop_rate=5
[2019-12-03 08:53:19] DEBUG: chimera_window=100
[2019-12-03 08:53:19] DEBUG: min_reads_in_disjointig=4
[2019-12-03 08:53:19] DEBUG: max_inner_reads=10
[2019-12-03 08:53:19] DEBUG: max_inner_fraction=0.25
[2019-12-03 08:53:19] DEBUG: add_unassembled_reads=0
[2019-12-03 08:53:19] DEBUG: max_separation=500
[2019-12-03 08:53:19] DEBUG: unique_edge_length=50000
[2019-12-03 08:53:19] DEBUG: min_repeat_res_support=0.51
[2019-12-03 08:53:19] DEBUG: out_paths_ratio=5
[2019-12-03 08:53:19] DEBUG: graph_cov_drop_rate=5
[2019-12-03 08:53:19] DEBUG: coverage_estimate_window=100
[2019-12-03 08:53:19] DEBUG: extend_contigs_with_repeats=1
[2019-12-03 08:53:19] DEBUG: min_read_cov_cutoff=3
[2019-12-03 08:53:19] DEBUG: short_tip_length=10000
[2019-12-03 08:53:19] DEBUG: long_tip_length=100000
[2019-12-03 08:53:19] DEBUG: max_bubble_length=50000
[2019-12-03 08:53:19] DEBUG: Running with k-mer size: 15
[2019-12-03 08:53:19] DEBUG: Running with minimum overlap 1000
[2019-12-03 08:53:19] DEBUG: Metagenome mode: N
[2019-12-03 08:53:19] INFO: Reading sequences
[2019-12-03 08:53:20] DEBUG: Building positional index
[2019-12-03 08:53:20] DEBUG: Total sequence: 61823700 bp
[2019-12-03 08:53:20] DEBUG: Expected read coverage: 4579
[2019-12-03 08:53:20] INFO: Generating solid k-mer index
[2019-12-03 08:53:20] DEBUG: Hard threshold set to 5
[2019-12-03 08:53:20] DEBUG: Started k-mer counting
[2019-12-03 08:53:34] INFO: Counting k-mers (1/2):
[2019-12-03 08:53:35] INFO: Counting k-mers (2/2):
[2019-12-03 08:53:36] DEBUG: Estimated minimum kmer coverage: 574
[2019-12-03 08:53:36] DEBUG: Filtered 888957 erroneous k-mers
[2019-12-03 08:53:36] DEBUG: Repetitive k-mer frequency: 178394
[2019-12-03 08:53:36] DEBUG: Filtered 0 repetitive k-mers (0)
[2019-12-03 08:53:36] INFO: Filling index table
[2019-12-03 08:53:36] DEBUG: Sampling rate: 1
[2019-12-03 08:53:36] DEBUG: Solid k-mers: 13503
[2019-12-03 08:53:36] DEBUG: K-mer index size: 24090364
[2019-12-03 08:53:36] DEBUG: Mean k-mer frequency: 1784.07
[2019-12-03 08:53:37] DEBUG: Sorting k-mer index
[2019-12-03 08:53:38] DEBUG: Peak RAM usage: 1 Gb
[2019-12-03 08:53:38] DEBUG: Estimating k-mer identity bias
[2019-12-03 08:53:56] DEBUG: Median overlap divergence: 0.121838
[2019-12-03 08:53:56] DEBUG: K-mer estimate bias: -0.00395411
[2019-12-03 08:53:56] DEBUG: Max divergence threshold set to 0.221838
[2019-12-03 08:53:56] INFO: Extending reads
[2019-12-03 08:53:56] DEBUG: Estimating overlap coverage
[2019-12-03 08:56:28] INFO: Overlap-based coverage: 1342
[2019-12-03 08:56:28] INFO: Median overlap divergence: 0.0777594
[2019-12-03 08:56:28] DEBUG: Sequence divergence distribution:

|          * *                               |                                                       
|         ** * *                             |                                                       
|         ** * *                             |                                                       
|         ** * **                            |                                                       
|         ** ****                            |                                                       
|         ** ****                            |                                                       
|         *******                            |                                                       
|         ******* *                          |                                                       
|         *********                          |                                                       
|         **********                         |                                                       
|         ***********                        |                                                       
|        ************                        |                                                       
|        ************                        |                                                       
|       **************** *    *              |                                                       
|       **************** *    *              |                                                       
|       *******************   *              |                                                       
|       ********************  *     *      * |                                                       
|      *********************  *    **      * |                                                       
|      *********************************   * |*    *     * *                                         
|     ************************************ * **    * * * * ** ****                                   
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.058, Q50 = 0.078, Q75 = 0.11

[2019-12-03 09:06:28] INFO: Assembled 0 disjointigs
[2019-12-03 09:06:28] INFO: Generating sequence
[2019-12-03 09:06:28] DEBUG: Writing FASTA
[2019-12-03 09:06:29] DEBUG: Peak RAM usage: 1 Gb
-----------End assembly log------------
[2019-12-03 09:06:29] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
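When comparing runs like the two in this thread, it can help to pull the coverage figures out of `flye.log` programmatically. A minimal sketch; the regular expressions simply match the `Estimated coverage` and `Overlap-based coverage` lines as they appear in the Flye 2.6 logs posted here and are an assumption for other versions. `coverage_summary` is a hypothetical helper, and the sample text is copied from the log above:

```python
import re

# Patterns match the log lines as they appear in this thread (Flye 2.6).
PATTERNS = {
    "estimated": re.compile(r"Estimated coverage: (\d+)"),
    "overlap_based": re.compile(r"Overlap-based coverage: (\d+)"),
}

# Two lines copied from the log above.
SAMPLE = """[2019-12-03 08:53:19] root: INFO: Estimated coverage: 4579
[2019-12-03 08:56:28] INFO: Overlap-based coverage: 1342"""


def coverage_summary(log_text: str) -> dict:
    """Return the first 'Estimated coverage' and 'Overlap-based coverage' values found."""
    summary = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            summary[name] = int(match.group(1))
    return summary


if __name__ == "__main__":
    # For a real run, pass the contents of e.g. Flye_assembly/flye.log instead.
    print(coverage_summary(SAMPLE))
```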

mikolmogorov commented 4 years ago

@pinkysaid Thanks. Flye is not designed for viral assembly, since viral genomes are short and often covered entirely by a single read.
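One rough way to check the "covered by a single read" point on a dataset is to count the reads that are at least as long as the target sequence (for a segmented genome such as Influenza A, an individual segment). A sketch with made-up read lengths and a hypothetical 1,400 bp target; for reference, the log above reports read N50/N90 of 1454 / 735. `spanning_fraction` is a hypothetical helper:

```python
from typing import Sequence


def spanning_fraction(read_lengths: Sequence[int], target_length: int) -> float:
    """Fraction of reads at least as long as `target_length`, i.e. reads that
    could cover that whole sequence on their own."""
    if not read_lengths:
        return 0.0
    spanning = sum(1 for length in read_lengths if length >= target_length)
    return spanning / len(read_lengths)


if __name__ == "__main__":
    reads = [735, 1200, 1454, 1454, 2100, 2500]  # hypothetical read lengths
    print(spanning_fraction(reads, 1400))  # hypothetical 1,400 bp target
```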

pinkysaid commented 4 years ago

Thank you, Mikhail, for the feedback and clarification. Much appreciated.