loneknightpy / idba

How to use IDBA-UD to perform scaffolding on assembled contigs? #63

Open rzhan186 opened 3 years ago

rzhan186 commented 3 years ago

Dear IDBA-UD developers,

I came across some literature where people have used IDBA-UD to perform a scaffolding step on contigs assembled by other assemblers such as MEGAHIT. It's not clear from the IDBA-UD help page how this is supposed to be done. I am thinking of using the following command, but I'm not sure whether it's appropriate:

idba --read merged_raw_reads.fa --read_level_2 megahit_contigs.fa --out idba_scaffolds.fa

Could you help with this? Much appreciated!

Rui

jarrodscott commented 2 years ago

Hi @rzhan186

Did you ever find an answer to this question? I came across this idea in a paper by He et al., where they state: "Assembled contigs were then scaffolded using the scaffolding function from IDBA-UD [63] (v.1.1.3)." If you found a way of doing this, I would appreciate your insight :)

rzhan186 commented 2 years ago

Hi @jarrodscott, I actually read the same paper 😂 and then emailed the author for clarification. The author said I have to use the scaffold script from IDBA-UD. So I cloned the whole IDBA repository, changed to the bin directory, and ran the following:

./scaffold -o $out_dir contigs.fa reads_paired.fa --num_threads 1

However, when I checked the statistics of the resulting scaffolds, they were exactly the same as those of the contigs. I was hoping scaffolding would improve the assembly quality, but it didn't work on my data. Feel free to try it out and let me know how it goes! (I might have just done something wrong...)
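For anyone who wants to repeat the comparison, here is a minimal sketch of how the contig and scaffold statistics could be computed from the two FASTA files. The function names and the choice of statistics (sequence count, total length, longest sequence, N50) are my own for illustration; they are not part of IDBA-UD.

```python
import io


def read_fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA stream."""
    length = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def assembly_stats(lengths):
    """Return count, total length, longest sequence and N50 for a set of lengths."""
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    n50 = 0
    running = 0
    for n in lengths:
        running += n
        # N50: length of the sequence at which half of the total is reached
        if running * 2 >= total:
            n50 = n
            break
    return {
        "count": len(lengths),
        "total_bp": total,
        "longest": lengths[0] if lengths else 0,
        "n50": n50,
    }


# Tiny in-memory demo; with real data, open the contig and scaffold files instead.
contigs = io.StringIO(">c1\nACGTACGT\n>c2\nACGT\n>c3\nAC\n")
scaffolds = io.StringIO(">s1\nACGTACGTNNACGT\n>s2\nAC\n")
print("contigs  ", assembly_stats(read_fasta_lengths(contigs)))
print("scaffolds", assembly_stats(read_fasta_lengths(scaffolds)))
```

If both files give identical numbers, as they did for me, the scaffolding step most likely joined nothing.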

jarrodscott commented 2 years ago

Thanks @rzhan186 !!! Curious, I just ran this test using a single sample (R1 & R2 fastq files merged with fq2fa) on a small co-assembly of 4 samples (generated with MEGAHIT) and the result was also the same. Perhaps I am missing something?
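In case the read format matters here, this is a rough sketch of what I mean by merging R1 and R2: the two FASTQ files are interleaved into one FASTA so that each pair sits on consecutive records. It is only an illustration of the idea in Python, not fq2fa's actual implementation, and the function names are made up.

```python
import io
from itertools import zip_longest


def fastq_records(handle):
    """Yield (name, sequence) from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between or after records
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (discarded for FASTA output)
        yield header[1:].strip(), seq


def interleave_to_fasta(r1, r2, out):
    """Write R1/R2 mates alternately as FASTA; return the number of pairs written."""
    pairs = 0
    for a, b in zip_longest(fastq_records(r1), fastq_records(r2)):
        if a is None or b is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        out.write(f">{a[0]}\n{a[1]}\n>{b[0]}\n{b[1]}\n")
        pairs += 1
    return pairs


# Tiny in-memory demo; with real data, pass open file handles instead.
r1 = io.StringIO("@read1/1\nACGT\n+\nIIII\n")
r2 = io.StringIO("@read1/2\nTTGG\n+\nIIII\n")
merged = io.StringIO()
interleave_to_fasta(r1, r2, merged)
print(merged.getvalue(), end="")
```

If the mates are not adjacent in the merged file, I would assume the scaffolder has no pairing information to work with, which could be one reason for unchanged output.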

rzhan186 commented 2 years ago

Hi @jarrodscott, thanks for sharing your result! Yeah, this is a bit strange. Have you tried running IDBA-UD from scratch on your raw reads? From what I know, the software outputs both contigs and scaffolds, so you could run it on your raw reads, compare the output with the MEGAHIT results, and move on with the better one.

I hope the author can pop up someday and clear up our confusion 😆

jarrodscott commented 2 years ago

Hi @rzhan186. Good question. Indeed, I have tried running IDBA-UD from scratch on the raw reads. Once I filtered the scaffolds output by IDBA-UD to remove sequences shorter than 1 kb, the results were comparable to the MEGAHIT meta-sensitive assembly (also with the minimum length set to 1 kb). It certainly would be nice to find a hybrid approach to improve assemblies :)
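For reference, the length filter I describe amounts to something like the sketch below: keep every FASTA record whose sequence is at least 1000 bp. The function names and the default threshold are just for illustration; any FASTA filtering tool would do the same job.

```python
import io


def fasta_records(handle):
    """Yield (header_line, sequence) for each record in a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(handle, out, min_len=1000):
    """Write records with len(sequence) >= min_len to out; return how many were kept."""
    kept = 0
    for header, seq in fasta_records(handle):
        if len(seq) >= min_len:
            out.write(f"{header}\n{seq}\n")
            kept += 1
    return kept


# Tiny in-memory demo with a 5 bp cutoff; use min_len=1000 on real scaffolds.
scaffolds = io.StringIO(">long\nACGTAC\n>short\nAC\n")
filtered = io.StringIO()
filter_by_length(scaffolds, filtered, min_len=5)
print(filtered.getvalue(), end="")
```

Applying the same cutoff to both assemblies keeps the comparison fair, since short sequences otherwise inflate the counts.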