This PR adds a subcommand, rbt simulate-reads, that derives artificial BAM files from real ones.
The user has to define a region from which reads will be drawn. (Only read pairs that are completely within the region will be considered.)
For this region a random reference sequence will be created.
Artificial reads will then be derived from this random reference by comparing the original read sequence to the reference (based on the CIGAR operations).
Mismatching bases are replaced by random ones that differ from the reference.
The final reads contain random sequences with positions readjusted to the new reference.
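The derivation described above can be sketched roughly as follows. This is a minimal Python illustration, not the code in this PR. The function names (random_reference, simulate_read), the (op, length) CIGAR representation, and the handling of inserted and soft-clipped bases (filled with random bases) are assumptions made for the example. It also assumes that mismatches are detected against the original reference and that the read offset is the original position minus the region start.

```python
import random

BASES = "ACGT"


def random_reference(length, rng):
    """Create a random reference sequence for the selected region."""
    return "".join(rng.choice(BASES) for _ in range(length))


def simulate_read(read_seq, cigar, orig_ref, new_ref, ref_offset, rng):
    """Derive an artificial read sequence from an original read.

    cigar      -- list of (op, length) tuples, e.g. [("M", 3), ("I", 1)]
    orig_ref   -- original reference sequence of the region
    new_ref    -- random reference of the same length
    ref_offset -- 0-based read start relative to the region start
                  (assumed: original position minus region start)
    """
    out = []
    read_pos = 0
    ref_pos = ref_offset
    for op, length in cigar:
        if op in "M=X":
            # Aligned bases: copy the new reference where the original read
            # matched, otherwise pick a random base that differs from it.
            for i in range(length):
                new_base = new_ref[ref_pos + i]
                if read_seq[read_pos + i] == orig_ref[ref_pos + i]:
                    out.append(new_base)
                else:
                    out.append(rng.choice([b for b in BASES if b != new_base]))
            read_pos += length
            ref_pos += length
        elif op in "IS":
            # Insertions and soft clips have no reference counterpart;
            # filling them with random bases is an assumption of this sketch.
            out.extend(rng.choice(BASES) for _ in range(length))
            read_pos += length
        elif op in "DN":
            # Deletions and skipped regions consume reference bases only.
            ref_pos += length
        # H and P consume neither read nor reference bases.
    return "".join(out)


rng = random.Random(42)
orig_ref = "ACGTACGTAC"
new_ref = random_reference(len(orig_ref), rng)
# Read aligned at offset 2 with one mismatch, one insertion and one deletion.
cigar = [("M", 3), ("I", 1), ("M", 2), ("D", 1), ("M", 2)]
artificial = simulate_read("GTTCCGAC", cigar, orig_ref, new_ref, 2, rng)
print(new_ref, artificial)
```

Because the CIGAR operations are kept unchanged, the artificial sequence has the same length as the original read, so the base qualities and the CIGAR string can be carried over as they are.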
The following attributes will be adopted from the original read:
read name
base qualities
flags
CIGAR string
insert size
MC tag
Creating artificial reads makes it possible to provide test data, which is often needed, e.g. for filing bug reports, without leaking any patient-related information.