Open chris-kreitzer opened 3 years ago
Unfortunately I don't remember a lot of this code, since I wrote it a while ago (and also I don't remember much of the BAM format), but I can try to help:
My understanding is that the refs/alts/error/deletion columns are all counts. You can see where they are tallied in the code here: https://github.com/mskcc/htstools/blob/d43300b0820d8e531df190afa3f8c10cd903e097/snp-pileup.cpp#L395-L403
The deletion count comes from the is_del property of htslib's pileup output, which according to its documentation means "the base on the padded read is a deletion". If a base is not a deletion, and it doesn't match the REF or ALT fields of the row in the VCF file, then it's counted as an error.
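To make that counting rule concrete, here is a minimal Python sketch of the classification as described above. It is not the actual snp-pileup.cpp code: the function name tally_pileup and the (base, is_del) tuples are hypothetical stand-ins for htslib's per-read pileup records, and any quality or depth filtering the real tool may apply is left out.

```python
def tally_pileup(bases, ref, alt):
    """Classify each read's contribution at one SNP position.

    `bases` is a list of (base, is_del) pairs: a hypothetical,
    simplified view of the per-read pileup entries at this position.
    Returns counts keyed like the output columns: R, A, E, D.
    """
    counts = {"R": 0, "A": 0, "E": 0, "D": 0}
    for base, is_del in bases:
        if is_del:
            # The read has a deletion at this position.
            counts["D"] += 1
        elif base == ref:
            counts["R"] += 1
        elif base == alt:
            counts["A"] += 1
        else:
            # Not a deletion, and matches neither REF nor ALT.
            counts["E"] += 1
    return counts


# Five reads at a position where the VCF row says REF=A, ALT=G.
example = tally_pileup(
    [("A", False), ("A", False), ("G", False), ("T", False), ("", True)],
    ref="A",
    alt="G",
)
print(example)
```

Under this reading, a value of 100 in File1E would mean that 100 reads in the first BAM file showed a base at that position that was neither REF nor ALT, and the same value in File1D would mean 100 reads had a deletion there.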
I'm not sure what the err.thresh and del.thresh properties you are referring to are, but it looks like that's some postprocessing done by FACETS: https://github.com/mskcc/facets/blob/59835bbe818a810a58789d29c0027975ffc1bd69/R/facets-wrapper.R#L10-L20
Hopefully that made sense, let me know if you have any other questions!
Hi Alex,
Quick question about the snp-pileup.h function you wrote many years ago for MSKCC. The output tables contain File1/2E and File1/2D columns (errors and deletions). What is the actual meaning behind those columns? What does, e.g., '100' mean for File1E (or File1D) at position XY?
The question arises because, when FACETS loads the count matrix generated by snp-pileup, it seems to consider every position regardless of errors or deletions:
readSnpMatrix = function(filename, skip=0L, err.thresh=Inf, del.thresh=Inf, perl.pileup=FALSE){ }
Here, since err.thresh and del.thresh default to Inf in FACETS, no position is discarded regardless of the values in those columns (hence my question about what the columns stand for).
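The effect of the default thresholds can be illustrated with a small Python sketch. This is an assumption about how such a filter would behave, not a transcription of readSnpMatrix: the function filter_positions, the use of "<=" as the comparison, and applying the thresholds to both files are all hypothetical and should be checked against facets-wrapper.R.

```python
import math


def filter_positions(rows, err_thresh=math.inf, del_thresh=math.inf):
    """Keep rows whose error and deletion counts are within the thresholds.

    `rows` is a list of dicts using the snp-pileup column names
    File1E, File1D, File2E, File2D. The comparison used here (<=)
    is an assumption made for illustration.
    """
    kept = []
    for row in rows:
        within = all(
            row[f"File{i}E"] <= err_thresh and row[f"File{i}D"] <= del_thresh
            for i in (1, 2)
        )
        if within:
            kept.append(row)
    return kept


rows = [
    {"Position": 100, "File1E": 0, "File1D": 0, "File2E": 1, "File2D": 0},
    {"Position": 200, "File1E": 100, "File1D": 3, "File2E": 80, "File2D": 5},
]

# With the infinite defaults every row passes, whatever its counts.
kept_default = filter_positions(rows)

# With finite thresholds the noisy position is dropped.
kept_strict = filter_positions(rows, err_thresh=10, del_thresh=10)
```

With the infinite defaults both rows are kept, which matches the observation that nothing is discarded unless finite thresholds are passed explicitly.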
Many many thanks for your help,
Best wishes, chris.