Question about “simulated read sequences are randomly sampled from a reference sequence”

Hello!

I was reading the PBSIM README and have a question about the following description:

To run model-based simulation:

pbsim --data-type CLR --depth 20 --model_qc data/model_qc_clr sample/sample.fasta

In the example above, simulated read sequences are randomly sampled from a reference sequence ("sample/sample.fasta") and differences (errors) of the sampled reads are introduced.

I am trying to understand what exactly the phrase “simulated read sequences are randomly sampled from a reference sequence” means. Suppose I use a transcriptome.fasta containing 500 transcripts as the reference. Would reads be simulated from all 500 transcripts, with only the read positions chosen at random? Or would reads be simulated from only a randomly chosen subset of the transcripts (e.g. 250 of the 500)?
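To make the two readings concrete, here is a toy Python sketch of what I mean. It is not PBSIM's code and says nothing about which behaviour PBSIM actually implements; the function names, the read length, and the reads-per-transcript value are all made up for illustration.

```python
import random


def sample_reads_all(transcripts, reads_per_transcript, read_len, rng):
    """Reading A: every transcript yields reads; only start positions are random."""
    reads = []
    for name, seq in transcripts.items():
        for _ in range(reads_per_transcript):
            # Pick a random start so the read fits inside the transcript.
            start = rng.randrange(0, max(1, len(seq) - read_len + 1))
            reads.append((name, seq[start:start + read_len]))
    return reads


def sample_reads_subset(transcripts, n_chosen, reads_per_transcript, read_len, rng):
    """Reading B: a random subset of transcripts is chosen first, then read from."""
    chosen = rng.sample(sorted(transcripts), n_chosen)
    subset = {name: transcripts[name] for name in chosen}
    return sample_reads_all(subset, reads_per_transcript, read_len, rng)


rng = random.Random(0)  # fixed seed so the toy run is repeatable

# Hypothetical stand-in for transcriptome.fasta: 500 random 200-bp transcripts.
transcripts = {
    f"tx{i}": "".join(rng.choice("ACGT") for _ in range(200))
    for i in range(500)
}

reads_a = sample_reads_all(transcripts, 2, 50, rng)
reads_b = sample_reads_subset(transcripts, 250, 2, 50, rng)

print(len({name for name, _ in reads_a}))  # 500 transcripts represented
print(len({name for name, _ in reads_b}))  # 250 transcripts represented
```

Under reading A all 500 transcripts appear among the simulated reads; under reading B only 250 do. My question is which of these two corresponds to what PBSIM does.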

Thank you very much for your help!

pfaucon / PBSIM-PacBio-Simulator, issue #17