rust-bio / rust-bio-tools

A set of command line utilities based on Rust-Bio.
MIT License

feat: simulate reads from real bam files #181

Closed FelixMoelder closed 3 years ago

FelixMoelder commented 3 years ago

This PR adds a subcommand, rbt simulate-reads, that derives artificial BAM files from real ones. The user has to define a region from which reads will be drawn. (Only read pairs that lie completely within the region are considered.) For this region, a random reference sequence is created. Artificial reads are then derived from this random reference by comparing the original read sequence to the reference (based on CIGAR operations). Mismatching bases are replaced by random ones that differ from the reference. The final reads contain random sequences, with positions readjusted to the new reference. The following attributes will be adopted from the original read:

Creating artificial reads makes it possible to provide test data, which is often needed for e.g. filing bug reports, without leaking any patient-related information.
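
For illustration, the derivation described above could be sketched roughly as follows. This is not the code from this PR: the names `XorShift`, `CigarOp`, `random_reference` and `derive_read` are hypothetical, the small xorshift generator only stands in for a proper RNG crate, and the handling of insertions and soft clips (filled with random bases) is an assumption. The sketch also assumes that matches are judged against the original reference and that mismatches are drawn to differ from the new random reference.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates
/// (assumption: a real tool would more likely use a crate such as `rand`).
struct XorShift(u64);

impl XorShift {
    fn new(seed: u64) -> Self {
        XorShift(seed.max(1)) // xorshift state must not be zero
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

const BASES: [u8; 4] = *b"ACGT";

fn random_base(rng: &mut XorShift) -> u8 {
    BASES[(rng.next_u64() % 4) as usize]
}

/// Draw a random base that differs from `avoid`.
fn random_base_except(rng: &mut XorShift, avoid: u8) -> u8 {
    let choices: Vec<u8> = BASES.iter().copied().filter(|&b| b != avoid).collect();
    choices[(rng.next_u64() % choices.len() as u64) as usize]
}

/// Random reference sequence for a region of `len` bases.
fn random_reference(len: usize, rng: &mut XorShift) -> Vec<u8> {
    (0..len).map(|_| random_base(rng)).collect()
}

/// Simplified CIGAR operations (hypothetical type, not the rust-htslib one).
#[derive(Clone, Copy)]
enum CigarOp {
    Match(usize),    // aligned bases (match or mismatch)
    Ins(usize),      // bases present only in the read
    Del(usize),      // bases present only in the reference
    SoftClip(usize), // clipped read bases
}

/// Derive an artificial read. `offset` is the read's start relative to the
/// region start, i.e. its readjusted position on the new reference.
fn derive_read(
    read: &[u8],
    orig_ref: &[u8],
    new_ref: &[u8],
    offset: usize,
    cigar: &[CigarOp],
    rng: &mut XorShift,
) -> Vec<u8> {
    let mut out = Vec::with_capacity(read.len());
    let (mut r, mut p) = (0usize, offset); // r: read index, p: reference index
    for op in cigar {
        match *op {
            CigarOp::Match(n) => {
                for _ in 0..n {
                    if read[r] == orig_ref[p] {
                        // original base matched: copy the new reference base
                        out.push(new_ref[p]);
                    } else {
                        // original base mismatched: keep it a mismatch
                        out.push(random_base_except(rng, new_ref[p]));
                    }
                    r += 1;
                    p += 1;
                }
            }
            CigarOp::Ins(n) | CigarOp::SoftClip(n) => {
                // assumption: read-only bases are simply randomized
                for _ in 0..n {
                    out.push(random_base(rng));
                }
                r += n;
            }
            CigarOp::Del(n) => p += n, // skip reference bases
        }
    }
    out
}
```

With a region-relative start of `offset`, a read that matched the original reference at some position will match the random reference at the same position, and a read that carried a mismatch will still carry one, so the alignment pattern is preserved while none of the original bases are copied.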