Updates from 3/19 to present: (Had this in a Word doc, but will update on here from now on)
3/19/20
3/26/20
3/27/20

| File | Length | Notes |
| --- | --- | --- |
| D0_FullCohort.vcf | 1595 | Raw variants |
| D0_reducedGEM.vcf | 1042 | Removed low-mappability sites |
| D0_reducedGEM_DpGr10_Fil.vcf | 827 | Removed sites with depth <10 |
| D0_reducedGEM_DpGr10_Fil_AncCalls.vcf | 811 | Removed sites with ancestor no-calls (./.) |
| D0_reducedGEM_DpGr10_Fil_AncCalls_NoHets.vcf | 664 | Removed sites with ancestor genotype as heterozygous |
Alternative method:

| File | Length | Notes |
| --- | --- | --- |
| D0_FullCohort.vcf | 1595 | Raw variants |
| D0_FullCohort_DpGr10_Fil.vcf | 1128 | Remove sites with depth <10 |
| D0_FullCohort_DpGr10_Fil_AncCalls.vcf | 1094 | Remove sites with ancestor no-calls (./.) |
| D0_FullCohort_DpGr10_Fil_AncCalls_NoHets.vcf | 854 | Remove sites with ancestor genotype as heterozygous |
| D0_FullCohort_DpGr10_Fil_AncCalls_NoHets_GEM.vcf | 664 | Removed low-mappability sites |
Checked that both methods give the same thing – they do (exact same results).
Using the first method, because it seems to make more sense (remove low-mappability sites first instead of at the end).
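Both orderings ending at 664 is what you'd expect if every step is a per-site filter that looks only at that site, since such filters commute. A minimal sketch of that check, assuming sites are already parsed into dicts; the keys `depth`, `anc_gt`, and `mappable` and the example sites are made up for illustration and are not the actual commands or data used here:

```python
from itertools import permutations

# Hypothetical, already-parsed sites (illustrative keys, not real VCF fields).
sites = [
    {"pos": 101, "depth": 25, "anc_gt": "0/0", "mappable": True},
    {"pos": 202, "depth": 8,  "anc_gt": "0/0", "mappable": True},   # low depth
    {"pos": 303, "depth": 40, "anc_gt": "./.", "mappable": True},   # ancestor no-call
    {"pos": 404, "depth": 31, "anc_gt": "0/1", "mappable": True},   # ancestor het
    {"pos": 505, "depth": 19, "anc_gt": "1/1", "mappable": False},  # low mappability
    {"pos": 606, "depth": 12, "anc_gt": "1/1", "mappable": True},
]

def is_het(gt):
    """True when a called genotype carries two different alleles."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) > 1

# One predicate per filtering step; each returns True for sites to keep.
filters = {
    "mappable": lambda s: s["mappable"],
    "depth_ge_10": lambda s: s["depth"] >= 10,
    "anc_called": lambda s: "." not in s["anc_gt"],
    "anc_not_het": lambda s: not is_het(s["anc_gt"]),
}

def apply_in_order(records, order):
    """Apply the named filters one after another and return surviving positions."""
    for name in order:
        records = [r for r in records if filters[name](r)]
    return [r["pos"] for r in records]

# Try all 24 orderings and collect the distinct outcomes.
results = {tuple(apply_in_order(sites, order)) for order in permutations(filters)}
print(results)
```

Every ordering leaves the same sites, so the choice of order only affects the intermediate counts, not the final file.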
4/1/20
Depth distribution for each chromosome, I through XVI (x axis is location on the chromosome, y axis is depth).
Filtering of Variants

D0
Low depth: 82
High depth: 206

| File | Length |
| --- | --- |
| D0_FullCohort.vcf | 1595 |
| D0_noLow.vcf | 962 |
| D0_noLow_noHigh.vcf | 772 |
| D0_noLow_noHigh_redGEM.vcf | 580 |
| D0_noLow_noHigh_redGEM_AncCalls.vcf | 573 |
| D0_noLow_noHigh_redGEM_AncCalls_NoHets.vcf | 471 |
D1
Low depth: 70
High depth: 188

| File | Length |
| --- | --- |
| D1_FullCohort.vcf | 2695 |
| D1_noLow.vcf | 1586 |
| D1_noLow_noHigh.vcf | 1346 |
| D1_noLow_noHigh_redGEM.vcf | 971 |
| D1_noLow_noHigh_redGEM_AncCalls.vcf | 958 |
| D1_noLow_noHigh_redGEM_AncCalls_NoHets.vcf | 812 |
D20
Low depth: 82
High depth: 206

| File | Length |
| --- | --- |
| D20_FullCohort.vcf | 1835 |
| D20_noLow.vcf | 1251 |
| D20_noLow_noHigh.vcf | 963 |
| D20_noLow_noHigh_redGEM.vcf | 741 |
| D20_noLow_noHigh_redGEM_AncCalls.vcf | 721 |
| D20_noLow_noHigh_redGEM_AncCalls_NoHets.vcf | 526 |
Found variants in the ancestors:
D20 ancestor variants

| Chr | Pos | GT | Type |
| --- | --- | --- | --- |
| VII | 61782 | 1/1 | ALT – indel |
| XVI | 186933 | 1/1 | ALT – indel |
D1 ancestor variants

| Chr | Pos | GT | Type |
| --- | --- | --- | --- |
| IV | 641630 | 2/2 | ALT – indel (het in D20 & majority of progeny) |
| VII | 61782 | 1/1 | ALT – indel |
| VII | 986079 | 1/1 | ALT – SNP (not in D20) |
| XV | 299062 | 1/1 | ALT – SNP (not in D20) |
| XVI | 186933 | 2/2 | ALT – SNP |
D0 ancestor variants

| Chr | Pos | GT | Type |
| --- | --- | --- | --- |
| IV | 641630 | 2/2 | ALT – indel |
| VII | 61782 | 1/1 | ALT – indel |
| XIII | 111478 | 1/2 | ALT – indel |
| XV | 314234 | 1/2 | ALT – indel |
4/3/20
Heterozygous sites shared between the 3 diploid ancestors: 40 total
File: /Users/hollymcqueary/Dropbox/McQueary/Paradoxus_MA/VCFs/AncHets/D0_D1_D20_common.txt
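Finding the shared heterozygous sites amounts to a set intersection on (chromosome, position). A rough sketch of that idea, assuming each ancestor's calls are available as `(chrom, pos, gt)` tuples; the function name `het_sites` and the rows below are invented for illustration and are not the real D0/D1/D20 calls:

```python
def het_sites(rows):
    """Return the set of (chrom, pos) whose genotype is a called heterozygote."""
    sites = set()
    for chrom, pos, gt in rows:
        alleles = gt.replace("|", "/").split("/")
        if "." not in alleles and len(set(alleles)) > 1:
            sites.add((chrom, pos))
    return sites

# Made-up example calls for three ancestors.
d0 = [("IV", 1000, "0/1"), ("VII", 2000, "0/1"), ("XV", 3000, "1/1")]
d1 = [("IV", 1000, "0/1"), ("VII", 2000, "1/2"), ("XV", 3000, "0/1")]
d20 = [("IV", 1000, "0/1"), ("VII", 2000, "0/1"), ("XII", 4000, "0/1")]

# Sites that are heterozygous in all three ancestors.
common = het_sites(d0) & het_sites(d1) & het_sites(d20)
print(sorted(common))
```

Note that this treats a site as shared when all three ancestors are heterozygous there, even if the alleles differ (0/1 versus 1/2); requiring identical genotypes would be a stricter variant of the same intersection.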
4/7/20
H0 median depth = 153
Filtering is based on a depth of 150, though, so sites with depth less than 90 or greater than 218 are removed.
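The depth window can be sketched as a simple range check. This assumes the depth being filtered on is the site-level `DP` value in the VCF INFO column, which is a guess; the helper names `info_depth` and `within_window` and the example INFO strings are made up, and only the 90 and 218 cutoffs come from the note above:

```python
LOW, HIGH = 90, 218  # cutoffs from the note above

def info_depth(info):
    """Pull the integer DP value out of a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "DP" and value.isdigit():
            return int(value)
    return None

def within_window(depth, low=LOW, high=HIGH):
    """Keep sites whose depth is known and lies in [low, high]."""
    return depth is not None and low <= depth <= high

# Made-up INFO strings covering the boundary cases.
infos = ["AC=2;DP=153;MQ=60", "DP=89;AC=1", "AC=1;DP=218", "AC=1;DP=219", "AC=1"]
depths = [info_depth(i) for i in infos]
kept = [d for d in depths if within_window(d)]
print(kept)
```

Sites at exactly 90 or 218 are kept here, matching "less than 90" and "greater than 218" being the removal conditions; sites with no DP value are dropped, which is a choice of this sketch rather than something stated above.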
H0 Final Cohort Variant Info:
hcm14449@n204 H0$ wc -l H0_FullCohort.vcf
1729 H0_FullCohort.vcf
hcm14449@n204 H0$ pwd
/scratch/hcm14449/TE_MA_Paradoxus/Illumina_Data/Out/H0
1729 raw variants
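One caveat on using `wc -l` as the variant count: it counts every line, so if a VCF still carries its `#` header lines the number is higher than the number of variant records. Whether these files include headers isn't stated here. A small sketch of counting only data lines, using an in-memory example file; `count_records` is a hypothetical helper:

```python
import io

def count_records(handle):
    """Count VCF data lines, skipping '#' header lines and blank lines."""
    return sum(1 for line in handle if line.strip() and not line.startswith("#"))

# Tiny made-up VCF: two header lines followed by two variant records.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "I\t100\t.\tA\tG\t50\tPASS\tDP=153\n"
    "II\t200\t.\tAT\tA\t50\tPASS\tDP=120\n"
)

total_lines = len(example.getvalue().splitlines())  # what a plain line count sees
example.seek(0)
records = count_records(example)                    # variant records only
print(total_lines, records)
```

On a real file this would be called as `count_records(open("H0_FullCohort.vcf"))`, ideally inside a `with` block.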
Remove sites with depth less than 90:
hcm14449@n204 H0$ wc -l H0_noLow.vcf
1080 H0_noLow.vcf
Remove sites with depth greater than 218:
hcm14449@n204 H0$ wc -l H0_noLow_noHigh.vcf
889 H0_noLow_noHigh.vcf
Remove hard-to-map regions called by GEM:
hcm14449@n204 H0$ wc -l H0_noLow_noHigh_redGem.vcf
668 H0_noLow_noHigh_redGem.vcf
Removed sites with no calls in the ancestor:
hcm14449@n204 H0$ wc -l H0_noLow_noHigh_redGem_AncCalls.vcf
654 H0_noLow_noHigh_redGem_AncCalls.vcf
Removed sites with hets in the ancestor:
hcm14449@n204 H0$ wc -l H0_noLow_noHigh_redGem_AncCalls_NoHets.vcf
545 H0_noLow_noHigh_redGem_AncCalls_NoHets.vcf
SNPs:
Indels:
hcm14449@n204 H0$ wc -l H0_noLow_noHigh_redGem_AncCalls_NoHets_Indels.vcf
223 H0_noLow_noHigh_redGem_AncCalls_NoHets_Indels.vcf
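The SNP/indel split can be sketched from the REF and ALT columns alone. This is one common way to classify sites, not necessarily how the `_Indels.vcf` file was produced; `variant_type` is a hypothetical helper, and it assumes ALT holds literal sequences rather than symbolic alleles such as `<DEL>`:

```python
def variant_type(ref, alt):
    """Classify a site as 'snp', 'indel', or 'other' from its REF and ALT strings."""
    # ALT may list several alleles; ignore missing (.) and spanning-deletion (*) entries.
    alts = [a for a in alt.split(",") if a not in (".", "*")]
    if not alts:
        return "other"
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "snp"
    if any(len(a) != len(ref) for a in alts):
        return "indel"
    return "other"  # same-length multi-base change (MNP)

# Made-up (REF, ALT) pairs.
examples = [("A", "G"), ("AT", "A"), ("A", "G,AT"), ("AT", "GC")]
types = [variant_type(ref, alt) for ref, alt in examples]
print(types)
```

A multiallelic site with both a SNP and an indel allele is counted as an indel here, so the SNP and indel counts always sum to at most the total.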
_Originally posted by @hollygene in https://github.com/hollygene/TE_MA/issues/2#issuecomment-612691832_