vplagnol / ExomeDepth

ExomeDepth R package for the detection of copy number variants in exomes and gene panels using high throughput DNA sequencing data.

read.width option default in getBamCounts #10

Open wiraki opened 6 years ago

wiraki commented 6 years ago

Hello,

I was hoping to discuss this specific option in the getBamCounts function, as it is generally not mentioned in the paper or in the vignette.

The description of the read.width option in the reference manual is:

numeric, maximum distance between the side of the target region and the middle of the paired read to include the paired read into that region.

As this is essentially a wrapper around the countBamInGRanges function from exomeCopy, here is its description from exomeCopy:

The width of a read, used in counting overlaps of mapped reads with the genomic ranges. The default is 1, resulting in the counting of only read starts in genomic ranges. If the length of fixed width reads is used, e.g. 100 for 100bp reads, then the function will return the count of all overlapping reads with the genomic ranges. However, counting all overlapping reads introduces dependency between the counts in adjacent windows.

It also says:

With the default setting (read.width=1), only the read starts are used for counting purposes (the leftmost position regardless of the strandedness of the read).
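To make the difference between the two settings concrete, here is a minimal Python sketch of the behaviour the exomeCopy text describes. It is a toy model of that wording, not exomeCopy's or ExomeDepth's code: `count_reads`, the window coordinates, and the read positions are all made up for illustration, and each read is assumed to cover `read_width` bases starting at its leftmost position.

```python
def count_reads(read_starts, ranges, read_width=1):
    """Count reads overlapping each (start, end) range (inclusive coordinates).

    Assumption: a read is the interval [start, start + read_width - 1],
    so read_width=1 reduces every read to its leftmost position.
    """
    counts = []
    for range_start, range_end in ranges:
        n = 0
        for start in read_starts:
            read_end = start + read_width - 1
            # Standard interval-overlap test.
            if start <= range_end and read_end >= range_start:
                n += 1
        counts.append(n)
    return counts


# Two adjacent windows and the leftmost positions of three reads (hypothetical).
windows = [(1000, 1199), (1200, 1399)]
read_starts = [950, 1150, 1250]

starts_only = count_reads(read_starts, windows, read_width=1)
full_overlap = count_reads(read_starts, windows, read_width=100)

print(starts_only)   # [1, 1]
print(full_overlap)  # [2, 2]
```

With `read_width=100` the read starting at 1150 spans the boundary and is counted in both windows, which is the dependency between adjacent windows that the description warns about.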

My questions/remarks are:

  1. There seems to be discordance as to whether the leftmost position or the middle of the read is considered when measuring the distance from a range.
  2. By using 300 as a default, if two adjacent targets are less than 600 bp apart, then reads might be counted twice, once for each adjacent target. Is this assumption correct? It seems quite important but is never mentioned anywhere else except the vignette.
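Both points can be sketched in a few lines of Python. This is only a toy reading of the reference manual's wording ("maximum distance between the side of the target region and the middle of the paired read"), not ExomeDepth's actual implementation; `targets_hit`, the target coordinates, and the fragment length are hypothetical.

```python
def targets_hit(anchor, targets, read_width=300):
    """Return the indices of targets whose sides lie within read_width of anchor.

    Assumption: a read pair is assigned to a target when its anchor position
    falls inside the target extended by read_width on each side.
    """
    hits = []
    for i, (start, end) in enumerate(targets):
        if start - read_width <= anchor <= end + read_width:
            hits.append(i)
    return hits


# Two targets separated by a 500 bp gap, i.e. less than 2 * 300 (hypothetical).
targets = [(1000, 1200), (1700, 1900)]

# Question 2: an anchor in the middle of the gap is within 300 bp of both targets.
hits_default = targets_hit(1450, targets, read_width=300)
hits_narrow = targets_hit(1450, targets, read_width=1)

# Question 1: the same read pair, anchored at its leftmost position or its middle.
leftmost = 1350
fragment_length = 400
middle = leftmost + fragment_length // 2

by_leftmost = targets_hit(leftmost, targets, read_width=300)
by_middle = targets_hit(middle, targets, read_width=300)

print(hits_default)  # [0, 1]
print(hits_narrow)   # []
print(by_leftmost)   # [0]
print(by_middle)     # [1]
```

Under this model, a read pair in the gap is counted for both targets with a width of 300, and the choice of anchor (leftmost position versus middle) can move a read pair from one target to the other.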

Thanks a lot in advance!

vplagnol commented 6 years ago

Thank you for the questions. I am in the process of cleaning up ExomeDepth, something long overdue, and your questions are much appreciated.

Regarding Q2, I do believe, indeed, that a read could be double counted, but then again this is what tests should demonstrate.

Re Q1, intuitively the middle of the read should be used. If I implemented it otherwise, this should probably be fixed.

More soon.

AndreaG5 commented 2 years ago

Hello @mpauper,

Did you get a satisfying answer to your question? I am still in doubt about how to use the read.width parameter correctly.

I appreciate any response!