kishwarshafin / pepper

PEPPER-Margin-DeepVariant
MIT License

Depth differences between r0.5 and r0.8 #166

Closed VLeducq closed 1 year ago

VLeducq commented 2 years ago

Hello,

I ran the r0.8 version on data previously analyzed with the r0.5 version. I systematically get ~10x lower coverage across the entire genome for all my samples (from ~100X with r0.5 to ~10X with r0.8). Because of this, I have difficulty detecting variants in regions with lower coverage. Is there an explanation related to the r0.8 version? The data is the same, with the same BAM files.

Thank you!

kishwarshafin commented 1 year ago

@VLeducq ,

There have been many parameter updates between r0.5 and r0.8. As a result, many reads with lower mapping quality (MAPQ) or base quality no longer count toward the depth, so this difference is expected.
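
To illustrate why stricter quality filters lower the reported depth on the same BAM file, here is a minimal sketch in plain Python. It is not PEPPER's code: the `PileupBase` type, the `filtered_depth` helper, the toy pileup, and the threshold values are all hypothetical and chosen only for illustration. The actual r0.5 and r0.8 parameters are not shown in this thread.

```python
from typing import Iterable, NamedTuple


class PileupBase(NamedTuple):
    """One read's contribution at a single reference position (hypothetical type)."""
    mapq: int   # mapping quality of the read covering the position
    baseq: int  # base quality of that read's base at the position


def filtered_depth(pileup: Iterable[PileupBase], min_mapq: int, min_baseq: int) -> int:
    """Count the bases that pass both quality thresholds."""
    return sum(1 for b in pileup if b.mapq >= min_mapq and b.baseq >= min_baseq)


# Made-up pileup at one position: six reads of mixed quality.
pileup = [
    PileupBase(mapq=60, baseq=30),
    PileupBase(mapq=60, baseq=8),
    PileupBase(mapq=5, baseq=25),
    PileupBase(mapq=20, baseq=12),
    PileupBase(mapq=0, baseq=3),
    PileupBase(mapq=60, baseq=20),
]

# Illustrative thresholds only, not the values used by any PEPPER release.
lenient_depth = filtered_depth(pileup, min_mapq=0, min_baseq=0)
strict_depth = filtered_depth(pileup, min_mapq=10, min_baseq=10)
print(f"lenient depth: {lenient_depth}, strict depth: {strict_depth}")
```

The same six reads give a depth of 6 with no filtering and 3 once both thresholds are set to 10, so the depth can drop even though the input is unchanged.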

VLeducq commented 1 year ago

Thank you for your answer. In that case, is it still correct to call variants on a sample with an average coverage of 10X?

kishwarshafin commented 1 year ago

I would say yes; our analysis shows higher accuracy at 10x with the new version. One thing to keep in mind is that if you are working with a diploid genome, 10x total coverage means you can expect about 5x per haplotype. With that and the error rate of the sequencing device, it becomes difficult to call variants accurately. We generally suggest 15x-20x to get a set of variants that is better suited for high-quality downstream analysis.
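
The per-haplotype arithmetic above can be sketched as follows. This is a back-of-the-envelope illustration and not part of PEPPER. It assumes reads split evenly between haplotypes and that per-position depth follows a Poisson distribution, which is a common simplification. The helper names and the cutoff of 2 reads are hypothetical.

```python
import math


def per_haplotype_depth(total_depth: float, ploidy: int = 2) -> float:
    """Expected depth per haplotype, assuming reads split evenly across haplotypes."""
    return total_depth / ploidy


def prob_depth_at_most(k: int, mean_depth: float) -> float:
    """P(depth <= k) under a Poisson model with the given mean (an assumption)."""
    return sum(
        math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
        for i in range(k + 1)
    )


for total in (10, 15, 20):
    hap = per_haplotype_depth(total)
    # Chance that a haplotype is covered by 2 or fewer reads at a given position.
    p_low = prob_depth_at_most(2, hap)
    print(f"{total}x total -> {hap:.1f}x per haplotype, P(<=2 reads) = {p_low:.4f}")
```

Under these assumptions, a haplotype at 10x total coverage is left with two or fewer supporting reads far more often than at 20x, which is consistent with the 15x-20x suggestion.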