Closed RNieuwenhuis closed 4 years ago
Hi , I cannot check this problem without having a hand one subset of bam and one variant .
ah got it ! My bad: I re-counted the DP for each read group in the same BAM. That's why the DP was so high. Thank you for pointing this ! Tell me if that works now please.
Successful bug fix.
Thanks, and also thank you in general for your efforts in the bioinformatics community as a whole!
Subject of the issue
Hi,
I have 100 samples in my vcf named 01..100 and want to fix the genotypes using FixVcfMissingGenotypes . To do this I split up my vcf in parts of 50K lines and then run
java -jar fixvcfmissinggenotypes.jar --bams Bams.list -o "Filled/${NUMBER}.vcf" --filter "record.getMappingQuality()<20 || record.getDuplicateReadFlag() || record.getReadFailsVendorQualityCheckFlag() || record.isSecondaryOrSupplementary()" "Part_${NUMBER}.vcf"
The contents of Bams.list are absolute paths to symlinks. This is because the original bam filename does not directly match the sample name. So
Bams.list
contains:The output shows consistently much higher values for DP than expected. Alignments show depth between 20 - 40 consistently whereas for all fixed genotypes it is ranging from 100 - 2000.
Example_output.txt
I suspect I misunderstood how the program works with regards to linking the correct bam to the samples. I have for most samples multiple read groups in each bam because they come from multiple sequencing runs. My understanding was that if a sample is called
<sample>
if will look for the genotype in<sample>.bam
. The help of--partition
is unclear to me but seems to be of use for solving this problem.In 01.bam I have for instance multiple read groups that:
Your environment
20191222-64-gd9efbd3
1.8.0_202
${JAVA_HOME}
/cm/shared/apps/java/jdk1.8.0_202/jre
REDHAT_BUGZILLA_PRODUCT="Scientific Linux 7" REDHAT_BUGZILLA_PRODUCT_VERSION=7.7 REDHAT_SUPPORT_PRODUCT="Scientific Linux" REDHAT_SUPPORT_PRODUCT_VERSION="7.7"