marbl / seqrequester

A tool for summarizing, extracting, generating and modifying DNA sequences.

seqrequester simulate missing end bases of chromosomes #1

Closed cjain7 closed 2 years ago

cjain7 commented 2 years ago

Hi Brian,

Seqrequester simulation is very useful and easy-to-use. However, irrespective of the sequencing coverage I specify, it is always missing coverage over a few initial and a few ending bases of chromosomes. I'd expect this doesn't happen in real sequencing. Can you please suggest why this may be happening? Is this related to the underlying algorithm being used? Any work-around for this?

--Chirag

brianwalenz commented 2 years ago

For those that didn't eavesdrop on our conversation (which would be everyone else in the world): This is caused by requiring a read be a specific length, then choosing a place to extract the read from.
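To make the effect concrete, here is a small sketch that counts how many start positions produce a read covering a given base when the read length is fixed and the start is drawn from [0 .. sequenceLength-readLength]. This is an illustration only, not seqrequester code: the function name `covering_starts`, the inclusive range endpoints, and the example sizes are assumptions.

```python
def covering_starts(pos, seq_len, read_len):
    """Count start positions in [0, seq_len - read_len] (inclusive, assumed)
    whose fixed-length read [start, start + read_len) covers base `pos`."""
    first = max(0, pos - read_len + 1)      # earliest start that still reaches pos
    last = min(pos, seq_len - read_len)     # latest start that fits in the sequence
    return max(0, last - first + 1)

# Hypothetical sizes, chosen only for illustration.
seq_len, read_len = 1000, 100
for pos in (0, 10, 500, 990, 999):
    print(pos, covering_starts(pos, seq_len, read_len))
```

With these sizes, a base in the middle can be hit from 100 different start positions, while the first and last bases can each be hit from only one, so their expected depth is about 1% of the interior depth no matter how much total coverage is requested.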

I added a '-truncate' option that allows reads to be truncated by the end of the input sequence. This will cover the ends of sequences with reads, but those reads will be shorter than the length distribution wants them to be. Internally, I'm still picking a length first, but then choosing a start position for the read from [-readLength .. sequenceLength], compared to [0 .. sequenceLength-readLength] used before.
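The two sampling schemes can be compared with a rough sketch like the one below. It is not the seqrequester implementation: the names `sample_read` and `coverage`, the use of a single fixed read length instead of a length distribution, the inclusive endpoints, and the example sizes are all assumptions made for illustration. It also assumes the read is no longer than the sequence.

```python
import random

def sample_read(seq_len, read_len, truncate, rng):
    """Return the half-open interval (begin, end) of one simulated read.

    Assumes read_len <= seq_len and inclusive range endpoints.
    """
    if truncate:
        # Start may fall outside the sequence; the read is clipped afterwards.
        # The two extreme starts yield empty reads in this sketch.
        start = rng.randint(-read_len, seq_len)
    else:
        # Start is restricted so the full-length read always fits.
        start = rng.randint(0, seq_len - read_len)
    begin = max(0, start)
    end = min(seq_len, start + read_len)
    return begin, end

def coverage(seq_len, read_len, n_reads, truncate, seed=1):
    """Per-base depth after sampling n_reads reads."""
    rng = random.Random(seed)
    depth = [0] * seq_len
    for _ in range(n_reads):
        begin, end = sample_read(seq_len, read_len, truncate, rng)
        for i in range(begin, end):
            depth[i] += 1
    return depth

plain = coverage(1000, 100, 5000, truncate=False)
clipped = coverage(1000, 100, 5000, truncate=True)
print("first base depth:", plain[0], "vs", clipped[0])
print("middle base depth:", plain[500], "vs", clipped[500])
```

In this sketch every base is reachable from the same number of start positions once starts may fall outside the sequence, so depth at the ends should come out close to depth in the middle, at the cost of some reads being shorter than requested.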