mskcc / htstools


The meaning of error and deletion columns in snp-pileup output table #7

Open chris-kreitzer opened 3 years ago

chris-kreitzer commented 3 years ago

Hi Alex,

Quick question about the snp-pileup.h code you wrote many years ago for MSKCC. The output tables contain File1E/File2E and File1D/File2D columns (errors and deletions).

What is the actual meaning of those columns? For example, what does a value of '100' mean for File1E (or File1D) at position XY?

I ask because when FACETS loads the count matrix generated by snp-pileup, it appears to keep every position regardless of its error or deletion counts:

readSnpMatrix = function(filename, skip=0L, err.thresh=Inf, del.thresh=Inf, perl.pileup=FALSE){ }

Since err.thresh and del.thresh are both set to Inf (the FACETS default), no position is discarded no matter what values those columns contain, hence my question about what the columns stand for.

Many many thanks for your help,

Best wishes, chris.

thatoddmailbox commented 3 years ago

Unfortunately I don't remember a lot of this code, since I wrote it a while ago (and also I don't remember much of the BAM format), but I can try to help:

My understanding is that the refs/alts/error/deletion columns are all counts. You can see where they are tallied in the code here: https://github.com/mskcc/htstools/blob/d43300b0820d8e531df190afa3f8c10cd903e097/snp-pileup.cpp#L395-L403

The deletion count comes from the is_del property of htslib's pileup output, which according to its documentation means "the base on the padded read is a deletion". If a base is not a deletion, and it doesn't match the REF or ALT fields of the row in the VCF file, then it's counted as an error.
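To make the classification described above concrete, here is a minimal sketch in Python. It is not the actual snp-pileup code (which is C++ and also applies filters not shown here); `PileupBase` and `tally` are hypothetical names invented for illustration, and only the `is_del` idea comes from htslib.

```python
from collections import namedtuple

# Hypothetical stand-in for one read's contribution to the pileup at a position.
# `is_del` mirrors the htslib flag of the same name; `base` is the called base.
PileupBase = namedtuple("PileupBase", ["base", "is_del"])


def tally(pileup, ref, alt):
    """Classify each read at one VCF position into ref/alt/error/deletion counts."""
    counts = {"R": 0, "A": 0, "E": 0, "D": 0}
    for p in pileup:
        if p.is_del:
            # The read has a deletion at this position.
            counts["D"] += 1
        elif p.base == ref:
            counts["R"] += 1
        elif p.base == alt:
            counts["A"] += 1
        else:
            # Neither a deletion nor a match to REF or ALT: counted as an error.
            counts["E"] += 1
    return counts


# Usage: four reads at a site where the VCF row has REF=A, ALT=G.
reads = [
    PileupBase("A", False),
    PileupBase("G", False),
    PileupBase("T", False),
    PileupBase(None, True),
]
print(tally(reads, ref="A", alt="G"))
```

Under this reading, a File1E value of 100 would mean that 100 reads in the first BAM file showed a base at that position matching neither REF nor ALT.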

I'm not sure what the err.thresh and del.thresh properties you are referring to are, but it looks like that's some postprocessing done by FACETS: https://github.com/mskcc/facets/blob/59835bbe818a810a58789d29c0027975ffc1bd69/R/facets-wrapper.R#L10-L20
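As a rough illustration of why infinite thresholds keep every row, here is a small Python sketch. It assumes the thresholds act as upper bounds on the error and deletion columns of both files; I have not verified this against the FACETS source, and `keep_row` is a hypothetical helper, not a FACETS function.

```python
import math


def keep_row(row, err_thresh=math.inf, del_thresh=math.inf):
    """Return True if a row's error and deletion counts are within the thresholds.

    Assumption: a row is kept only when every error count is <= err_thresh
    and every deletion count is <= del_thresh, for both input files.
    """
    return (
        row["File1E"] <= err_thresh
        and row["File2E"] <= err_thresh
        and row["File1D"] <= del_thresh
        and row["File2D"] <= del_thresh
    )


# Usage: with the default infinite thresholds, even a noisy position is kept.
noisy = {"File1E": 100, "File1D": 50, "File2E": 3, "File2D": 0}
print(keep_row(noisy))                  # default thresholds
print(keep_row(noisy, err_thresh=10))   # a finite threshold would drop it
```

If that assumption holds, any finite count compares as less than or equal to infinity, so filtering only takes effect once a finite err.thresh or del.thresh is passed.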

Hopefully that made sense, let me know if you have any other questions!