vplagnol / ExomeDepth

ExomeDepth R package for the detection of copy number variants in exomes and gene panels using high throughput DNA sequencing data.

Deletions with more reads than expected #19

Open haraldgrove opened 5 years ago

haraldgrove commented 5 years ago

Hi, I just noticed an issue where regions with more reads than expected are being called as deletions (reads.ratio >> 1). In the example below, overlapping regions are also being called as both deletions and duplications. Does anyone have any idea what might be happening here?

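"start.p","end.p","type","nexons","start","end","chromosome","id","BF","reads.expected","reads.observed","reads.ratio"
124085,124093,"deletion",9,55142239,55156999,"chr7","chrchr7:55142239-55156999",26.5,4474,17460,3.9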
124085,124113,"duplication",20,55142239,55201432,"chr7","chrchr7:55142239-55201432",57.8,12671,45280,3.57
124085,124126,"deletion",13,55142239,55366179,"chr7","chrchr7:55142239-55366179",115,20719,81052,3.91
124085,124134,"duplication",8,55142239,55431346,"chr7","chrchr7:55142239-55431346",85.4,23742,90328,3.8
124085,124136,"deletion",2,55142239,55433799,"chr7","chrchr7:55142239-55433799",132,25303,97612,3.86
124085,124187,"duplication",51,55142239,55795890,"chr7","chrchr7:55142239-55795890",178,45707,146585,3.21

Best regards -Harald
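
The symptom in the rows above can be checked mechanically. Below is a minimal sketch in base R, assuming the pasted columns follow the usual ExomeDepth CNV.calls order (start.p, end.p, type, nexons, ..., reads.expected, reads.observed, reads.ratio). The data frame is retyped from the rows above, and the column name `contradictory` is made up for illustration.

```r
# Calls retyped from the rows posted above (only the columns needed here).
calls <- data.frame(
  start.p        = rep(124085L, 6),
  end.p          = c(124093L, 124113L, 124126L, 124134L, 124136L, 124187L),
  type           = c("deletion", "duplication", "deletion",
                     "duplication", "deletion", "duplication"),
  reads.expected = c(4474, 12671, 20719, 23742, 25303, 45707),
  reads.observed = c(17460, 45280, 81052, 90328, 97612, 146585),
  reads.ratio    = c(3.9, 3.57, 3.91, 3.8, 3.86, 3.21),
  stringsAsFactors = FALSE
)

# A deletion should have fewer reads than expected (ratio < 1),
# a duplication more (ratio > 1). Flag calls where type and ratio disagree.
calls$contradictory <-
  (calls$type == "deletion"    & calls$reads.ratio > 1) |
  (calls$type == "duplication" & calls$reads.ratio < 1)

print(calls[calls$contradictory, c("end.p", "type", "reads.ratio")])
```

All three deletion rows are flagged, while the three duplication rows are at least consistent with their ratio.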

vplagnol commented 5 years ago

That does not seem right! Do you have a plot perhaps showing the distribution of the read count over the chromosome region? Something seems to be confusing the HMM.

haraldgrove commented 5 years ago

[Image: BTN-062_EGFR_igv_snapshot]

Here's the read coverage in the region (max is 5230). I'm using 50 samples for reference: 49 have about 1/10th of the read depth and one sample has 1.5x the number of reads. The total number of reads is about the same for all samples. I'm using version 1.1.10 and the same settings as in the tutorial.

cfrane commented 5 years ago

I have observed this happening too, and from what I could determine it appears to occur in areas where there is known paralogy.

However, this also happened when testing a sample where there is a known duplication followed directly by a known deletion. ExomeDepth called the duplication correctly and also made a deletion call, except that the deletion call started at the same position as the duplication and continued on to cover the known deletion. The resulting deletion call ended up with a BF of -642 and a ratio of 1.22.

If I subtract the expected and observed reads of the correctly called duplication from those of the deletion call, which should leave just the expected/observed values for the additional intervals, the ratio is 0.52, which supports the deletion call that I was expecting.
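
The subtraction described above can be written out as a small helper. This is only a sketch: `residual_ratio` is a hypothetical function, not part of ExomeDepth, and the read counts are invented so that they reproduce the two ratios quoted in this comment (1.22 for the combined call, 0.52 for the remainder). The real counts were not posted.

```r
# Hypothetical helper: observed/expected ratio of the intervals that remain
# after removing a sub-call that shares its start with a larger call.
residual_ratio <- function(combined_expected, combined_observed,
                           sub_expected, sub_observed) {
  expected <- combined_expected - sub_expected
  observed <- combined_observed - sub_observed
  stopifnot(expected > 0, observed >= 0)
  observed / expected
}

# Invented counts, chosen only to match the reported ratios.
dup      <- list(expected = 1000, observed = 1500)  # correctly called duplication
combined <- list(expected = 1400, observed = 1708)  # "deletion" call starting at the same position

combined_ratio <- combined$observed / combined$expected
remainder      <- residual_ratio(combined$expected, combined$observed,
                                 dup$expected, dup$observed)

print(round(combined_ratio, 2))  # 1.22
print(round(remainder, 2))       # 0.52
```

The combined call looks like a gain only because the duplicated exons are counted in it. Once they are removed, the remaining intervals show the loss.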

spooner1 commented 4 years ago

This can be solved by changing line 294 of class_definition.R from

my.calls$calls$start.p <- my.calls$calls$start.p -1

to

my.calls$calls$start.p <- my.calls$calls$end.p - my.calls$calls$nexons
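
For reference, here is a sketch of why recomputing the start index from end.p and nexons helps, using only the index columns of the six calls posted at the top of this thread. It works on the final output table, where the first row suggests that end.p - start.p + 1 equals nexons, hence the `+ 1L` below. Whether the patched line in class_definition.R needs that offset depends on the index convention at that point in the code, which is not shown here. The column names `span`, `consistent` and `start.p.fixed` are made up for illustration.

```r
# Index columns retyped from the calls posted in the first comment.
calls <- data.frame(
  start.p = rep(124085L, 6),
  end.p   = c(124093L, 124113L, 124126L, 124134L, 124136L, 124187L),
  nexons  = c(9L, 20L, 13L, 8L, 2L, 51L)
)

# Number of exons implied by the reported start/end indices.
calls$span       <- calls$end.p - calls$start.p + 1L
calls$consistent <- calls$span == calls$nexons

# Start index recomputed from end.p and nexons (output-table convention).
calls$start.p.fixed <- calls$end.p - calls$nexons + 1L

# TRUE if every call starts right after the previous one ends.
contiguous <- all(calls$start.p.fixed[-1] == calls$end.p[-nrow(calls)] + 1L)

print(calls)
print(contiguous)
```

Only the first call has a start.p that agrees with its nexons. With the recomputed starts, the six calls no longer overlap: each begins one exon after the previous call ends.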