lh3 / wgsim

Reads simulator

Incorrect number of reads generated #13

Open elgartmi opened 8 years ago

elgartmi commented 8 years ago

I am simulating reads from incomplete bacterial genomes, which tend to have many short contigs. See, for example, Lactobacillus malefermentans KCTC 3548.

Each time the program tries to take a read from such a contig, it correctly prints: [wgsim_core] skip sequence 'gi|338736693|dbj|BACN01000170.1|' as it is shorter than 500!

However, each time it prints this message, a read that should have been written to the output file is lost instead of being drawn from another contig. So for a genome with many such short contigs, the output file contains far fewer reads than requested with -N X.
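To measure the shortfall, a minimal shell sketch like the one below could compare the number of reads actually written against the value passed to -N, and count how many contigs were skipped. It assumes the output is plain 4-line FASTQ and that wgsim's stderr was saved to a file; `count_reads` and `count_skipped` are hypothetical helper names, not part of wgsim.

```shell
#!/bin/sh
# Number of reads in a FASTQ file, assuming 4 lines per record
# (header, sequence, '+', qualities).
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Number of contigs wgsim reported as skipped, read from a saved stderr log.
# Matches the "[wgsim_core] skip sequence ..." message quoted above.
count_skipped() {
    grep -c 'skip sequence' "$1"
}
```

For example, after a run such as `wgsim -N 1000000 ref.fa out.read1.fq out.read2.fq 2> wgsim.log` (file names are placeholders), `count_reads out.read1.fq` can be compared with 1000000, and `count_skipped wgsim.log` reports how many short contigs were skipped.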

As a workaround, I ask it to generate more reads than needed and then keep the first X with "head -n X*4" (four lines per FASTQ record). However, I believe it's a bug :)
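The "head -n X*4" workaround could be wrapped in a small shell function, sketched below under the assumption of plain 4-line FASTQ records; `keep_first_reads` is a hypothetical name, and the shell arithmetic `$((n_reads * 4))` does the X*4 multiplication that a literal `head -n X*4` would not.

```shell
#!/bin/sh
# Workaround sketch: keep only the first X reads of an oversized FASTQ file.
# Assumes 4-line FASTQ records (header, sequence, '+', qualities).
keep_first_reads() {
    n_reads=$1   # X: number of reads to keep
    in_fq=$2     # oversized wgsim output
    out_fq=$3    # truncated output
    head -n "$((n_reads * 4))" "$in_fq" > "$out_fq"
}
```

For paired-end output, the same X would be applied to both the read1 and read2 files so mates stay in step. One caveat: if wgsim writes reads one contig at a time (as the per-sequence skip messages suggest), keeping the first X reads may over-represent the contigs that appear earliest in the FASTA file.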