vpc-ccg / haslr

A fast tool for hybrid genome assembly of long and short reads
GNU General Public License v3.0

minimum coverage requirements? #2

mjmillerlab closed this issue 4 years ago

mjmillerlab commented 4 years ago

Do you have a sense of the minimum coverage required? It seems that you aimed for 50X long-read coverage in your manuscript, and other long-read assemblers want 30-60X. But what about 10X long-read coverage? Any chance that might work?

haghshenas commented 4 years ago

Hello,

By default, HASLR uses the longest 25x of long reads (i.e., it downsamples if the coverage is higher). However, it shouldn't complain about low coverage. Technically, it requires a support of only 3 long reads to connect two short-read contigs in the backbone graph, so you can try it.
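To make the "longest 25x" idea concrete, here is a minimal Python sketch of that kind of downsampling. It is only an illustration under assumptions, not HASLR's actual code: the function name `select_longest_reads` and its parameters are hypothetical, and reads are represented by their lengths only.

```python
def select_longest_reads(read_lengths, genome_size, target_cov=25):
    """Keep the longest reads until their total length reaches
    target_cov * genome_size (hypothetical helper, not part of HASLR)."""
    budget = target_cov * genome_size
    selected, total = [], 0
    # Visit reads from longest to shortest.
    for length in sorted(read_lengths, reverse=True):
        if total >= budget:
            break  # target coverage reached; drop the remaining shorter reads
        selected.append(length)
        total += length
    return selected
```

If the dataset has less than the target coverage, every read is kept, which matches the point that low coverage by itself should not cause a complaint.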

I haven't done a thorough test with low-coverage datasets. Still, I would suggest running it once with the default parameters and a second time with --edge-sup 2, then comparing the results.
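As a rough picture of what lowering the support threshold changes, the following Python sketch counts how many long reads link each pair of contigs and keeps only the pairs that meet a minimum support. This is an assumption-laden illustration, not HASLR's implementation: `backbone_edges` and the input format (one contig pair per linking long read) are made up for the example.

```python
from collections import Counter

def backbone_edges(read_links, min_support=3):
    """read_links: iterable of (contig_a, contig_b) pairs, one pair per long
    read that links the two contigs (hypothetical input format).
    Returns {edge: support} for edges with at least min_support reads."""
    # Treat (a, b) and (b, a) as the same undirected edge.
    support = Counter(tuple(sorted(pair)) for pair in read_links)
    return {edge: n for edge, n in support.items() if n >= min_support}

links = [("c1", "c2")] * 3 + [("c2", "c3")] * 2
default_edges = backbone_edges(links)                 # threshold 3
relaxed_edges = backbone_edges(links, min_support=2)  # threshold 2
```

With the threshold at 3 only the c1-c2 edge survives; at 2 the c2-c3 edge is kept as well, which is the kind of difference worth comparing between the two runs at low coverage.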

Note that when you run it with different parameters, you should pass the same output directory. This will be faster because some of the required files will already exist.

Let me know if you think I can help more.

AmaliT commented 4 years ago

Hi @haghshenas. How about the coverage requirement for short reads? Given that it uses short reads to generate SRCs, does that mean a higher SR coverage is required compared to long reads? Also, could you feed in short reads with multiple insert sizes?

haghshenas commented 4 years ago

Hi @AmaliT. Yes, I would say a "medium" depth of coverage for short reads is necessary. The lowest SR coverage I tested was ~40x. And yes, since Minia doesn't care about insert size (it treats all SRs as single-end reads), it should work with short-read datasets that have different insert sizes.
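For anyone checking whether a short-read dataset is near that ~40x mark, depth can be estimated as total sequenced bases divided by genome size. A minimal sketch, assuming fixed-length reads; the function name and the example numbers are hypothetical:

```python
def estimate_coverage(num_reads, read_length, genome_size):
    """Approximate depth of coverage: total bases / genome size."""
    return num_reads * read_length / genome_size

# Hypothetical dataset: 2 million 150 bp reads on a 5 Mb genome.
depth = estimate_coverage(2_000_000, 150, 5_000_000)
print(f"{depth:.0f}x")  # prints "60x"
```

For paired-end data, count both mates as reads, since they are used as single-end reads here anyway.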

haghshenas commented 4 years ago

@AmaliT, I want to clarify that you should pass all short-read files after -s, separated by spaces.
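If the runs are scripted, the argument list can be assembled along these lines. This sketch uses only the two options mentioned in this thread (`-s` and `--edge-sup`); everything else, including the helper name `add_haslr_options`, the file names, and whatever goes into `base_cmd`, is a placeholder to be filled in from the tool's own help output.

```python
def add_haslr_options(base_cmd, short_reads, edge_sup=None):
    """Append the options discussed in this thread to a caller-supplied
    command list (hypothetical helper, not part of HASLR)."""
    cmd = list(base_cmd)  # copy so the caller's list is not modified
    if edge_sup is not None:
        cmd += ["--edge-sup", str(edge_sup)]
    # All short-read files follow a single -s, as separate arguments.
    cmd += ["-s", *short_reads]
    return cmd

# Placeholder file names for two libraries with different insert sizes.
files = ["lib300_1.fq", "lib300_2.fq", "lib500_1.fq", "lib500_2.fq"]
cmd = add_haslr_options(["haslr.py"], files, edge_sup=2)
```

Passing the files as separate list items (rather than one space-joined string) gives the same result as separating them by spaces on the command line.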

AmaliT commented 4 years ago

@haghshenas thanks. Will give it a try. Cheers