vpc-ccg / haslr

A fast tool for hybrid genome assembly of long and short reads
GNU General Public License v3.0

Much Shorter Assembly than Expected #30

Open iek opened 2 years ago

iek commented 2 years ago

Hello, I'm using HASLR with nanopore and Illumina data to assemble a P. falciparum genome.

The nanopore data has roughly 50x coverage. The Illumina short reads are in 2 paired-end FASTQ files (I believe the pairing doesn't matter for HASLR?)

I used this command: haslr.py -t 10 -o pfalciparum -g 23m -l nanopore_data.fasta nanopore -s illumina_data.fasta

The resulting asm.final.fa contains about 10 million base pairs, which is much shorter than the expected 23 million base pairs for Plasmodium falciparum. I've run this on several different nanopore samples and gotten the same result each time: a much shorter assembly than expected.
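For anyone who wants to double-check a size like this, here is a minimal sketch that totals the bases in a FASTA file and compares the total with the expected genome size. The function names `fasta_stats` and `fraction_of_expected` are hypothetical helpers written for illustration; they are not part of HASLR. The 10 Mb and 23 Mb figures are the ones reported above.

```python
def fasta_stats(lines):
    """Return (number of sequences, total bases) for FASTA-formatted lines."""
    n_seqs = 0
    total_bases = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1  # header line starts a new sequence
        else:
            total_bases += len(line)  # sequence line
    return n_seqs, total_bases


def fraction_of_expected(total_bases, expected_bases):
    """Fraction of the expected genome size covered by the assembly."""
    return total_bases / expected_bases


# Toy in-memory example (not real data): two contigs, 8 bases in total.
toy = [">contig1", "ACGT", "AC", ">contig2", "GG"]
print(fasta_stats(toy))

# Figures reported in this issue: about 10 Mb assembled vs. 23 Mb expected.
print(round(fraction_of_expected(10_000_000, 23_000_000), 2))
```

To run it on a real assembly, pass an open file handle instead of the toy list, for example `with open("asm.final.fa") as fh: print(fasta_stats(fh))`.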

Do you have any suggestions? Thank you so much.

andrzej-grz commented 2 years ago

Hello, I have a similar problem. With ONT reads at an average of 50x coverage, Raven and Smartdenovo gave assemblies of 750-780 Mb with BUSCO completeness above 98%. Using ONT and Illumina reads in HASLR resulted in a much shorter assembly of about 580 Mb. Has anybody solved this issue?