Low read depth at start and end of sequence

nh13 / DWGSIM

Whole Genome Simulator for Next-Generation Sequencing

GNU General Public License v2.0


Issue #71

Closed: nick-pestell closed this issue 2 years ago

nick-pestell commented 2 years ago

Read depth appears to drop as low as 0 at the start and end of the sequence and increases moving away from the ends.

This does not seem representative of our real-world data. Perhaps this is because the genome is actually circular, so reads wrap around the start/end?

Is it possible to simulate this behaviour with dwgsim?

nh13 commented 2 years ago

@nick-pestell I don't think I've added any support for circular contigs/genomes. I'd need a lot more information about the inputs and the command line you're using. Thanks!

nick-pestell commented 2 years ago

Thanks @nh13.

The command we're running is:

dwgsim -e 0 -E 0 -i -d 330 -N 144997 -1 150 -2 150 -r 0 -R 0 -X 0 -y 0.01 -H -z 1 <in.ref.fa> <out.prefix>

in.ref.fa is a bacterial genome sequence of about 4 Mbp into which we have introduced 16,000 random SNPs using simuG.

nh13 commented 2 years ago

If we assume a linear genome, which may not be correct for your application, then there are far fewer possible start positions for inserts that overlap the first base than for inserts that overlap the tenth base (only one for the former and ten for the latter, and so on). I think that explains the reduced depths at the start and end of the contigs.
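A minimal Python sketch of the counting argument above, assuming fixed-length fragments (e.g. 330 bp, in the spirit of `-d 330` in the command above; real simulated inserts vary in length) whose start positions are uniform over every offset where the fragment fits wholly on a linear contig. The function name `linear_depth` is hypothetical and not part of DWGSIM.

```python
def linear_depth(genome_len: int, frag_len: int) -> list[int]:
    """For each 0-based position, count how many valid fragment start
    positions yield a fragment covering it on a linear contig.

    Assumes fixed-length fragments (frag_len <= genome_len) that must lie
    wholly within the contig, so valid starts are 0 .. genome_len - frag_len.
    """
    last_start = genome_len - frag_len
    depth = []
    for pos in range(genome_len):
        # A fragment starting at s covers pos when s <= pos <= s + frag_len - 1.
        lo = max(0, pos - frag_len + 1)
        hi = min(pos, last_start)
        depth.append(hi - lo + 1)
    return depth


if __name__ == "__main__":
    depth = linear_depth(genome_len=4_000, frag_len=330)
    print(depth[0], depth[9], depth[2_000], depth[-1])  # 1 10 330 1
```

Under these assumptions, expected depth ramps linearly from 1 up to the fragment length over the first and last `frag_len` bases and is flat in between, which matches the drop-off described in this issue.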

nick-pestell commented 2 years ago

OK, thanks @nh13, I think that makes sense to me. Closing.

andersgs commented 2 years ago

It would be great to have support for circular genomes (e.g., bacterial chromosomes, plasmids, organelle genomes).

One approach might be to create multiple copies of the input FASTA, each with 10 kb taken from the start and added to the end of the sequence, and then generate some proportion of the reads from each of the copies. Just a thought. There might be better ways of doing it.
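A rough Python sketch of the rotated-copy idea, computing only expected coverage rather than running DWGSIM. Assumptions: "taking 10 kb from the start and adding it to the end" is read as a rotation, only two copies are used (the original plus one rotated by 10 kb), each copy receives an equal share of fragments, fragments have a fixed length, and the helper names `rotate`, `linear_depth` and `combined_depth` are hypothetical.

```python
def rotate(seq: str, k: int) -> str:
    """Move the first k bases to the end (a rotation of a circular sequence)."""
    k %= len(seq)
    return seq[k:] + seq[:k]


def linear_depth(genome_len: int, frag_len: int) -> list[int]:
    """Number of fixed-length fragment starts covering each base of a linear contig."""
    last_start = genome_len - frag_len
    return [min(p, last_start) - max(0, p - frag_len + 1) + 1 for p in range(genome_len)]


def combined_depth(genome_len: int, frag_len: int, shift: int) -> list[int]:
    """Expected depth (in start-position counts) when equal shares of fragments
    come from the original contig and from a copy rotated by `shift`."""
    orig = linear_depth(genome_len, frag_len)
    rot = linear_depth(genome_len, frag_len)  # same length, so same linear profile
    combined = []
    for pos in range(genome_len):
        # rotate(seq, shift)[j] == seq[(j + shift) % genome_len], so base `pos`
        # of the original sits at index (pos - shift) % genome_len in the copy.
        combined.append(orig[pos] + rot[(pos - shift) % genome_len])
    return combined


if __name__ == "__main__":
    depth = combined_depth(genome_len=40_000, frag_len=330, shift=10_000)
    print(depth[0], depth[20_000], min(depth), max(depth))  # 331 660 331 660
```

With only two copies the contig ends are no longer at zero depth, but they (and the rotation junction) still sit at roughly half the mid-contig depth under these assumptions; more evenly spaced rotated copies would shrink that dip.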

nh13 commented 2 years ago

Concatenate two copies, then discard a read pair if it is wholly contained in the second copy. Voila!
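A minimal Python sketch of this trick that models fragment placement rather than running DWGSIM itself. Assumptions: a fixed fragment length no longer than the genome, start positions drawn uniformly over every offset where the fragment fits on the doubled (linear) contig, and hypothetical helper names `sample_circular_starts` and `circular_depth`. Coordinates on the doubled contig are folded back onto the circular genome with a modulo.

```python
import random


def sample_circular_starts(genome_len: int, frag_len: int, n_frags: int, seed: int = 0) -> list[int]:
    """Sample fragment starts on two concatenated copies of a circular genome,
    discarding fragments that lie wholly within the second copy.

    Assumes frag_len <= genome_len and uniform starts over every offset where
    the fragment fits on the doubled (linear) contig.
    """
    rng = random.Random(seed)
    doubled_len = 2 * genome_len
    starts = []
    while len(starts) < n_frags:
        start = rng.randrange(doubled_len - frag_len + 1)
        if start >= genome_len:
            # Starts in copy 2 and, since frag_len <= genome_len, also ends there: drop it.
            continue
        # Kept starts lie in 0 .. genome_len - 1; fragments near the end of
        # copy 1 run into copy 2, i.e. they wrap around the circular genome.
        starts.append(start)
    return starts


def circular_depth(genome_len: int, frag_len: int, starts: list[int]) -> list[int]:
    """Per-base depth after folding doubled-contig coordinates back with modulo."""
    depth = [0] * genome_len
    for start in starts:
        for offset in range(frag_len):
            depth[(start + offset) % genome_len] += 1
    return depth


if __name__ == "__main__":
    genome_len, frag_len, n_frags = 1_000, 100, 20_000
    starts = sample_circular_starts(genome_len, frag_len, n_frags)
    depth = circular_depth(genome_len, frag_len, starts)
    # Each value should be near the mean depth, 20_000 * 100 / 1_000 = 2_000.
    print(depth[0], depth[genome_len // 2], depth[-1])
```

Discarding the copy-2 fragments leaves start positions uniform over 0 .. genome_len - 1, which is the start distribution expected for a circular genome, so the edge dip disappears. In practice the same filter would be applied to the positions of the simulated pairs on the concatenated reference; roughly half of the generated pairs are thrown away, so about twice as many pairs would need to be requested.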

AlsoATraveler commented 1 year ago

How can I make the beginning and end of the sequence have the same depth as the middle?

nh13 commented 1 year ago

@AlsoATraveler you asked this as well in #81. Please read the explanation above. Locking the conversation as I believe I've answered it.