tkrahn / extract23

Extract a simulated 23andMe (V3) style file from a Whole Genome BAM file
GNU General Public License v3.0

question #7

Closed. Griz054 closed this issue 7 years ago.

Griz054 commented 7 years ago

I could have sent an e-mail, but I figured I'd ask the question here. I'm working with some folks who read my article about this entire process. One question that came up: in the script, why didn't you just use the VCF to populate the data instead of grabbing the larger BAM file?

tkrahn commented 7 years ago

Sorry for my late reply. The VCF may not tell me whether a given SNP is covered by sequencing reads. A position that is missing from the VCF may be covered by reads that simply match the reference sequence, or it may not be covered at all, which means a no-call.
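To make the ambiguity concrete, here is a minimal Python sketch. It is not taken from the extract23 script; the function names, the `min_depth` threshold, the `"--"` no-call marker, and the toy data are all assumptions chosen for illustration. It compares a VCF-only lookup, which has to guess that every missing position is homozygous reference, with a lookup that also knows the read depth at each position.

```python
NO_CALL = "--"  # assumed no-call marker for the output file


def genotype_from_vcf_only(pos, ref_base, vcf_calls):
    """Naive approach: treat any position absent from the VCF as homozygous reference."""
    return vcf_calls.get(pos, ref_base * 2)


def genotype_with_coverage(pos, ref_base, vcf_calls, depth, min_depth=4):
    """Report a no-call when too few reads cover the position (min_depth is an assumed cutoff)."""
    if depth.get(pos, 0) < min_depth:
        return NO_CALL
    return vcf_calls.get(pos, ref_base * 2)


# Toy data: three SNP positions with their reference bases.
ref_bases = {100: "A", 200: "C", 300: "G"}
vcf_calls = {100: "AG"}            # only the variant site appears in the VCF
depth = {100: 30, 200: 25, 300: 0}  # position 300 has no reads at all

for pos, ref in ref_bases.items():
    naive = genotype_from_vcf_only(pos, ref, vcf_calls)
    checked = genotype_with_coverage(pos, ref, vcf_calls, depth)
    print(pos, naive, checked)
```

In this toy data, positions 200 and 300 are both absent from the VCF, so the VCF-only lookup cannot tell the covered reference match from the uncovered site. Only the depth information, which ultimately comes from the reads, separates them.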

So you'd at least need additional information from a BED file. Yet the BED file doesn't tell you the quality of the reads at each position; it only defines sections of the genome that are considered trustworthy. This has its quirks if just a single base in the middle of a BED region has quality issues. Using BED files therefore raises a lot of complicated questions, so I decided that the cleanest way is to score the SNPs directly from the BAM file.
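The BED limitation can be sketched the same way. The Python below is an illustration only, not the method used by extract23: `in_bed`, `trusted`, the quality threshold of 20, and the per-base quality values are hypothetical. The one format detail it relies on is that BED intervals are 0-based and half-open, while SNP positions are normally given 1-based.

```python
from bisect import bisect_right

MIN_BASE_QUALITY = 20  # assumed threshold, for illustration only


def in_bed(pos_1based, intervals):
    """Check a 1-based position against sorted, non-overlapping BED intervals.

    BED intervals are 0-based and half-open: (start, end) covers start <= p < end.
    """
    p = pos_1based - 1  # convert to a 0-based coordinate
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, p) - 1  # last interval starting at or before p
    return i >= 0 and p < intervals[i][1]


def trusted(pos_1based, intervals, base_quality):
    """A position is trusted only if it is in a BED region AND its own quality is sufficient."""
    quality = base_quality.get(pos_1based, 0)
    return in_bed(pos_1based, intervals) and quality >= MIN_BASE_QUALITY


confident_regions = [(99, 200)]             # covers 1-based positions 100..200
base_quality = {150: 35, 151: 5, 152: 36}   # hypothetical per-position quality

for pos in (150, 151, 152):
    print(pos, in_bed(pos, confident_regions), trusted(pos, confident_regions, base_quality))
```

Position 151 lies inside the confident region, so a BED-only check accepts it, yet its own quality is poor. Catching that case requires per-position information, which is what scoring directly from the BAM file provides.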

Griz054 commented 7 years ago

Thank you. I knew you had a reason. I just couldn't tell him what it was.

Jim "Griz" Adams

(Sent by email in reply to tkrahn's comment of May 5, 2017, 2:55 PM.)