mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

high coverage assembly #156

Closed aistBMRG closed 5 years ago

aistBMRG commented 5 years ago

Hi,

We have several deeply sequenced bacterial genomes: PacBio reads with coverage up to 500x. Could you clarify whether such high coverage would be an issue? Or is it beneficial to run with such high coverage to achieve highly accurate polishing?

Thanks!

Dieter

raysully commented 5 years ago

Dieter,

Our lab has had great Flye results with a roughly 1000x bacterial genome (ONT data), finishing in a couple of hours on 26 threads of a 72-core server, using --asm-coverage 50 -i 4 --plasmids. Hope this is helpful.

Ray
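For readers who want to reproduce settings like these, here is a minimal sketch that assembles the corresponding Flye command line. The input file name, output directory, and genome size are hypothetical placeholders. The `--nano-raw`, `--genome-size`, `--out-dir`, and `--threads` options are assumptions added to make the call complete; only `--asm-coverage 50 -i 4 --plasmids` and the thread count come from the comment above. Flag availability varies between Flye versions, so check `flye --help` for the installed release.

```python
import shlex


def flye_command(reads, out_dir, genome_size, threads=26,
                 asm_coverage=50, iterations=4, plasmids=True):
    """Build a Flye argument list mirroring the settings quoted above.

    reads, out_dir and genome_size are placeholders supplied by the caller.
    """
    cmd = [
        "flye",
        "--nano-raw", reads,               # ONT reads, as in the comment above
        "--out-dir", out_dir,
        "--genome-size", genome_size,      # assumed; needed to interpret a coverage target
        "--threads", str(threads),
        "--asm-coverage", str(asm_coverage),
        "-i", str(iterations),             # number of polishing iterations
    ]
    if plasmids:
        cmd.append("--plasmids")
    return cmd


# Hypothetical file names and genome size, for illustration only.
cmd = flye_command("reads.fastq.gz", "flye_out", "5m")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` without shell quoting concerns.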

aistBMRG commented 5 years ago

Great, thanks! Giving it a shot with those settings.

Dieter.

aistBMRG commented 5 years ago

Small update: using --asm-coverage 50 yielded an incomplete assembly (that is, a non-circular chromosome). May I ask why you restrict --asm-coverage? Is it to reduce memory usage, or does it improve the assembly for your data?

Thanks.

Dieter.

mikolmogorov commented 5 years ago

Hi,

A reasonable --asm-coverage value should not affect the assembly contiguity (but it should improve speed and memory consumption). It limits the coverage used for the draft disjointig assembly, but all reads are used during repeat resolution. Could you post or send me the assembly_graph.gv and assembly_info.txt files so I can comment?
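To make the idea of a coverage cap concrete, the sketch below picks the longest reads until their total length reaches a target coverage of the genome. This is only an illustration of the concept, not Flye's actual implementation; the function name and the toy read lengths are hypothetical.

```python
def select_longest_reads(read_lengths, genome_size, target_coverage):
    """Return the lengths of the longest reads whose sum reaches the target.

    Illustrative only: takes reads in descending length order and stops once
    genome_size * target_coverage bases have been collected.
    """
    target_bases = genome_size * target_coverage
    selected, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break
        selected.append(length)
        total += length
    return selected


# Toy numbers: a 10 kb "genome" capped at 4x coverage (40,000 bases).
lengths = [12000, 3000, 8000, 1500, 20000, 5000, 9500, 700]
subset = select_longest_reads(lengths, genome_size=10000, target_coverage=4)
print(subset, sum(subset))
```

In this picture, the remaining reads are set aside only for the draft stage; per the comment above, all reads are still used during repeat resolution.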

aistBMRG commented 5 years ago

Thanks for chiming in. I am sending the files to you by email. It would be great to understand what is going on.

Dieter.

aistBMRG commented 5 years ago

Please find attached the requested files for two assemblies: one generated with --asm-coverage 50 and one with --asm-coverage unset (the default).

Regards,

Dieter


seq_name    length   cov.  circ.  repeat  mult.  graph_path
contig_1    488006   281   -      -       1      ,1,
contig_2    156664   254   -      -       1      ,2,
contig_3    71045    254   -      -       1      ,3,
contig_4    84307    209   -      -       1      ,4,
contig_5    51691    298   -      -       1      ,5,
contig_6    123209   238   -      -       1      ,6,
contig_7    153295   257   -      -       1      ,7,
contig_8    146805   287   -      -       1      ,8,
contig_9    21954    957   +      +       4      9
contig_10   171268   253   -      -       1      ,10,
contig_11   447141   291   -      -       1      ,11,
contig_12   113106   295   -      -       1      ,12,
contig_13   97186    238   -      -       1      ,13,
contig_14   146487   277   -      -       1      ,14,
contig_15   167723   302   -      -       1      ,15,
contig_16   26973    464   +      -       2      16

seq_name    length    cov.  circ.  repeat  mult.  graph_path
contig_1    21956     953   +      +       4      1
contig_2    19546     426   +      -       1      2
contig_3    2502757   295   +      -       1      3
contig_4    26976     438   +      -       2      4
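A quick way to compare runs like these is to parse the assembly_info.txt columns and summarize contig count, total length, and whether the largest contig is circular. The sketch below is a minimal, hypothetical helper that assumes the whitespace-separated layout shown in this thread, with `+` marking a circular contig. Other Flye versions may use a `#seq_name` header, `Y`/`N` flags, or extra columns, so the parser tolerates the first two. The embedded text is the second table above.

```python
# Second assembly_info.txt table from this thread, copied verbatim.
INFO_SECOND_RUN = """\
seq_name length cov. circ. repeat mult. graph_path
contig_1 21956 953 + + 4 1
contig_2 19546 426 + - 1 2
contig_3 2502757 295 + - 1 3
contig_4 26976 438 + - 2 4
"""


def parse_assembly_info(text):
    """Parse whitespace-separated assembly_info rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header (with or without a leading '#').
        if not fields or fields[0].lstrip("#") == "seq_name":
            continue
        rows.append({
            "name": fields[0],
            "length": int(fields[1]),
            "cov": int(fields[2]),
            "circular": fields[3] in ("+", "Y"),
            "repeat": fields[4] in ("+", "Y"),
            "mult": int(fields[5]),
        })
    return rows


def summarize(rows):
    """Report contig count, total length, and the largest contig's status."""
    largest = max(rows, key=lambda r: r["length"])
    return {
        "contigs": len(rows),
        "total_length": sum(r["length"] for r in rows),
        "largest": largest["name"],
        "largest_circular": largest["circular"],
    }


summary = summarize(parse_assembly_info(INFO_SECOND_RUN))
print(summary)
```

Running the same summary on the first table would show 16 contigs with a non-circular largest contig, which is the fragmentation discussed in this thread.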

mikolmogorov commented 5 years ago

Hi Dieter,

Yes, indeed it looks like the second run was better. Could you please send me the flye.log files from both runs to fenderglass@gmail.com? In the meantime, feel free to use the second run's assembly as the final result.

Mikhail

mikolmogorov commented 5 years ago

Hi Dieter,

Thanks for sending me the logs. I think --asm-coverage 50 is failing because, for some reason, a good portion of the reads have low quality (a high error rate), and Flye fails to align them. As a result, instead of the desired 50x coverage you get much less, and the assembly is fragmented. You can see this indirectly from two things: (i) Flye only aligned ~38% of the reads to the graph (usually it's close to 100%), and (ii) the final read alignment error rate reported by minimap2 is ~20% (usually 13-14% for PacBio).

Nevertheless, there were enough good reads in your set for a complete assembly (without --asm-coverage). Some quality filtering might help with your future assemblies.
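As a rough back-of-the-envelope check on "much less than 50x", the sketch below scales the target coverage by the fraction of reads that align. It assumes the ~38% figure applies uniformly to the subsampled reads, which the thread does not state, so treat the result as an order-of-magnitude estimate only. The function name is hypothetical.

```python
def effective_coverage(target_coverage, usable_fraction):
    """Estimate usable coverage if only a fraction of the selected reads align.

    Assumes the alignable fraction is the same in the subsample as in the
    full read set (an assumption, not something reported in the logs).
    """
    return target_coverage * usable_fraction


estimate = effective_coverage(50, 0.38)
print(f"~{estimate:.0f}x usable out of a 50x target")
```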

aistBMRG commented 5 years ago

Thanks for the clarification; we will QC the reads more strictly in the future.

Dieter
