Open xmy1990 opened 3 months ago
Xmy:
The golden BAM is generated at the time of the simulation. It records the chromosome locations the reads were taken from and the orientation in which they were taken.
The aligner will not output the same BAM, ever. First, in regions with redundant sequences the aligner cannot resolve the redundancy, so it places reads according to the settings you choose, frequently picking the top alignment in the list of equivalent alignments. Second, the aligner uses a random seed during its process unless you specify the seed yourself, so alignment output will differ from one run to another. Third, depending on the sequencing quality you chose to simulate (the sequencing error rate), the aligner may or may not be able to place the reads correctly.
So, many factors are involved.
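To make the first two points concrete, here is a toy sketch (not the aligner's actual code) of what happens when a read matches several loci equally well and the choice is made with a random number generator. The function `place_read` and the coordinates are hypothetical, made up for illustration only.

```python
import random


def place_read(candidates, seed=None):
    """Pick one location among equally good candidate alignments.

    Hypothetical helper: with seed=None the generator is seeded from the
    system, so repeated runs may disagree; with a fixed seed they cannot.
    """
    rng = random.Random(seed)
    return rng.choice(candidates)


# A read from a repeat that matches three loci equally well (made-up coordinates).
equivalent_hits = [("chr1", 10_500), ("chr1", 48_200), ("chr7", 3_310)]

first = place_read(equivalent_hits, seed=42)
second = place_read(equivalent_hits, seed=42)
print(first == second)  # True: same seed, same placement
```

Whichever locus is picked, only one of the three can be the position stored in the golden BAM, so reads in redundant regions will often disagree with it even when the run is reproducible.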
If you want to figure out how the simulator works and validate that it is appropriate for you, I suggest you start with a small area of the genome of interest that does not contain redundant regions. Then simulate with a low number of mutations and a low sequencing error rate. That will make validation easier. Finally, make sure you adjust the aligner's parameters so that you can control the randomness in the alignment process.
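For the validation itself, one rough way to quantify the difference is to count how many reads the aligner placed where the golden BAM says they came from. The sketch below assumes you have already extracted, from each BAM, a mapping of read identifier to `(chromosome, position, strand)` with whatever BAM reader you prefer; the function name `concordance`, the `tolerance` parameter, and the sample records are all hypothetical. For paired-end data the identifier would also need to distinguish the two mates.

```python
def concordance(golden, aligned, tolerance=0):
    """Fraction of golden reads that the aligner placed on the same
    chromosome and strand, within `tolerance` bp of the golden position.
    Reads missing from `aligned` count as not matched."""
    if not golden:
        return 0.0
    matched = 0
    for name, (chrom, pos, strand) in golden.items():
        hit = aligned.get(name)
        if hit is None:
            continue
        a_chrom, a_pos, a_strand = hit
        if a_chrom == chrom and a_strand == strand and abs(a_pos - pos) <= tolerance:
            matched += 1
    return matched / len(golden)


# Made-up records: read name -> (chromosome, position, strand).
golden = {
    "r1": ("chr1", 100, "+"),
    "r2": ("chr1", 250, "-"),
    "r3": ("chr1", 900, "+"),
    "r4": ("chr1", 1200, "-"),
}
aligned = {
    "r1": ("chr1", 100, "+"),   # exact match
    "r2": ("chr1", 252, "-"),   # off by 2 bp
    "r3": ("chr1", 4000, "+"),  # placed elsewhere
}                               # r4 was not aligned at all

print(concordance(golden, aligned))     # 0.25
print(concordance(golden, aligned, 5))  # 0.5
```

On a small, non-redundant region with few mutations and a low error rate, you would expect this fraction to be high; a low value there points at the aligner settings rather than at the simulator.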
Thanks.
Hi,
Could you plainly explain how a golden BAM is generated? I’ve noticed that the BAM file obtained by aligning the fq files with BWA differs significantly from the golden BAM.
Thanks