yukiteruono / pbsim3

PBSIM3: a simulator for all types of PacBio and ONT long reads
GNU General Public License v2.0

Generating simulated HiFi reads for a specific region #6

Closed. Jong-hun-Park closed this issue 1 year ago

Jong-hun-Park commented 1 year ago

Hi @yukiteruono,

Thanks for developing this excellent tool. I just wanted to ask if my approach sounds right to you.

I would like to generate ~30X simulated HiFi reads for a specific region of a genome. The region may be longer than standard HiFi read lengths. I prepared an input FASTA containing the sequence of interest, but I was not sure whether I should use "template sequencing" or "WGS" mode. Could you explain when I should use "template sequencing" mode? I also noticed that I cannot specify the expected coverage in "template sequencing" mode. What combination of parameters would be appropriate for this kind of situation? Are there any other parameters that I should be particularly aware of?

Thanks, Jonghun

yukiteruono commented 1 year ago

Thank you for using PBSIM3.

In template mode, each simulated read covers the full length of its input sequence, not just a portion of it. If that is the sequencing situation you expect, template mode is recommended. Each input sequence yields exactly one simulated read, so if you put 30 copies of the same sequence in one FASTA file, you will get 30 simulated reads, each with different introduced errors. If your region is longer than a HiFi read, WGS mode may be better. In WGS mode, the coverage depth near the ends of the input sequence is low, so the input should be the region of interest plus flanking sequence on both sides.
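To make the two input-preparation options concrete, here is a minimal Python sketch. It only builds the input sequences and does not call PBSIM3. The function names, the `_copy` naming scheme, and the flank length are all assumptions for illustration; the flank size in particular is not specified here and should be chosen to suit your read lengths.

```python
def replicate_fasta_record(name, sequence, copies):
    """Build FASTA text holding `copies` records of the same sequence.

    Intended for template mode, where one input record gives one read.
    The "_copyN" suffix is a hypothetical naming choice to keep IDs unique.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    lines = []
    for i in range(1, copies + 1):
        lines.append(f">{name}_copy{i}")
        lines.append(sequence)
    return "\n".join(lines) + "\n"


def region_with_flanks(chrom_seq, start, end, flank):
    """Extract the region [start, end) plus up to `flank` bases on each side.

    Intended for WGS mode, where depth drops near the ends of the input.
    Coordinates are 0-based and half-open; flanks are clamped to the
    chromosome bounds. Returns the extracted sequence and the offset of
    the region inside it, so reads can later be related to the region.
    """
    if not 0 <= start < end <= len(chrom_seq):
        raise ValueError("region must lie within the chromosome")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    left = max(0, start - flank)
    right = min(len(chrom_seq), end + flank)
    return chrom_seq[left:right], start - left


if __name__ == "__main__":
    # Template mode: 30 identical records, so 30 full-length reads.
    fasta_text = replicate_fasta_record("region", "ACGTACGT", 30)
    print(fasta_text.count(">"))

    # WGS mode: region plus flanks taken from a toy chromosome.
    padded, offset = region_with_flanks("AAAACCCCGGGGTTTT", 4, 8, 2)
    print(padded, offset)
```

With the WGS option, the padded sequence would be the genome given to the simulator, and reads falling entirely in the flanks can be discarded afterwards if only the region matters.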

Jong-hun-Park commented 1 year ago

Thanks for the detailed answer. It is really helpful and I will try what you suggested.