Hello! I was reading the PBSIM README and have a question about the following description:
To run model-based simulation:
pbsim --data-type CLR --depth 20 --model_qc data/model_qc_clr sample/sample.fasta
In the example above, simulated read sequences are randomly sampled from a reference sequence ("sample/sample.fasta") and differences (errors) of the sampled reads are introduced.
I am trying to understand exactly what the phrase “simulated read sequences are randomly sampled from a reference sequence” means. If I use a transcriptome.fasta containing 500 transcripts as the reference, would reads be simulated from all 500 transcripts? Or would reads be simulated only from a randomly chosen subset of them (e.g. 250 transcripts)?
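To make the two readings concrete, here is a small Python sketch of what I mean. It is not PBSIM's code and says nothing about what PBSIM actually does; the function names, the toy transcriptome, and the way the read count is derived from depth are all my own assumptions for illustration.

```python
import random


def reads_from_every_sequence(reference, depth, read_len, rng):
    """Reading 1: every sequence is a source; only read positions are random."""
    reads = {}
    for name, seq in reference.items():
        # Assumed rule: enough reads to cover this sequence `depth` times.
        n_reads = max(1, depth * len(seq) // read_len)
        reads[name] = []
        for _ in range(n_reads):
            start = rng.randrange(0, max(1, len(seq) - read_len + 1))
            reads[name].append(seq[start:start + read_len])
    return reads


def reads_from_random_subset(reference, fraction, depth, read_len, rng):
    """Reading 2: a random subset of sequences is chosen first, then sampled."""
    k = max(1, int(len(reference) * fraction))
    chosen = rng.sample(sorted(reference), k)
    subset = {name: reference[name] for name in chosen}
    return reads_from_every_sequence(subset, depth, read_len, rng)


rng = random.Random(0)
# Toy stand-in for transcriptome.fasta: 500 hypothetical transcripts of 200 bases.
reference = {
    f"tx{i:03d}": "".join(rng.choice("ACGT") for _ in range(200))
    for i in range(500)
}

all_reads = reads_from_every_sequence(reference, depth=20, read_len=50, rng=rng)
subset_reads = reads_from_random_subset(
    reference, fraction=0.5, depth=20, read_len=50, rng=rng
)
print(len(all_reads), len(subset_reads))  # transcripts that produced reads: 500 250
```

Under the first reading, all 500 transcripts contribute reads and only the read start positions are random. Under the second, only 250 transcripts contribute at all. My question is which of these (if either) matches PBSIM's behavior.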
Thank you very much for your help!