hollygene / TE_MA

S. paradoxus TE MA experiment

Final Cohort Filtering #5

Closed by hollygene 4 years ago

hollygene commented 4 years ago

H0 Final Cohort Variant Info:

hcm14449@n204 H0$ wc -l H0_FullCohort.vcf
1729 H0_FullCohort.vcf
hcm14449@n204 H0$ pwd
/scratch/hcm14449/TE_MA_Paradoxus/Illumina_Data/Out/H0

1729 raw variants
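`wc -l` counts every line in a file, so if a VCF still carries its header (lines starting with `#`), the figure includes those lines as well. Whether the files in this thread keep their headers is not stated, so the following is only a minimal sketch for counting data records; `count_records` is a hypothetical helper, not part of the pipeline used here, and the example lines are made up.

```python
from typing import Iterable


def count_records(lines: Iterable[str]) -> int:
    """Count VCF data lines, skipping '#' header lines and blank lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


# Tiny made-up VCF for illustration (not data from this experiment).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "I\t100\t.\tA\tG\t50\tPASS\tDP=120\n",
    "I\t250\t.\tC\tCT\t60\tPASS\tDP=130\n",
]
print(count_records(example))
```

An open file handle can be passed directly, since iterating over it yields lines.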

Remove sites with depth less than 90:

hcm14449@n204 H0$ wc -l H0_noLow.vcf
1080 H0_noLow.vcf

Remove sites with depth greater than 218:

hcm14449@n204 H0$ wc -l H0_noLow_noHigh.vcf
889 H0_noLow_noHigh.vcf
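Taken together, the two depth cutoffs keep only sites whose depth falls inside a window. A minimal sketch, assuming site depth is read from the `DP` key of the INFO column and that both bounds are inclusive; the thread does not say which depth value was used or how the boundaries were treated, and `site_depth` and `passes_depth` are hypothetical names.

```python
from typing import Optional


def site_depth(vcf_line: str) -> Optional[int]:
    """Return the INFO DP value of a VCF data line, or None if it is absent."""
    info = vcf_line.rstrip("\n").split("\t")[7]
    for field in info.split(";"):
        if field.startswith("DP="):
            return int(field[3:])
    return None


def passes_depth(vcf_line: str, low: int = 90, high: int = 218) -> bool:
    """Keep a site only if low <= DP <= high (inclusive bounds are an assumption)."""
    depth = site_depth(vcf_line)
    return depth is not None and low <= depth <= high


# Made-up record for illustration.
record = "I\t100\t.\tA\tG\t50\tPASS\tAC=2;DP=153\n"
print(passes_depth(record))
```

The defaults of 90 and 218 are the H0 cutoffs quoted in this comment; other cohorts would pass their own values.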

Remove hard-to-map regions called by GEM:

hcm14449@n204 H0$ wc -l H0_noLow_noHigh_redGem.vcf
668 H0_noLow_noHigh_redGem.vcf
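The mappability step can be pictured as an interval lookup: a site is dropped when its position falls inside any hard-to-map region. The sketch below assumes the GEM output has already been converted to BED-like intervals (0-based, half-open); that format, and the names `build_index` and `in_regions`, are assumptions for illustration and not taken from the pipeline.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED
Index = Dict[str, List[Tuple[int, int]]]


def build_index(regions: Iterable[Interval]) -> Index:
    """Group intervals by chromosome, sort them, and merge overlapping ones."""
    by_chrom: Index = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged: Index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out: List[Tuple[int, int]] = []
        for start, end in spans:
            if out and start <= out[-1][1]:
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_regions(index: Index, chrom: str, pos: int) -> bool:
    """True if the 1-based VCF position lies inside any interval on chrom."""
    spans = index.get(chrom, [])
    zero_based = pos - 1
    # Find the last interval whose start is <= the position.
    i = bisect_right(spans, (zero_based, float("inf"))) - 1
    return i >= 0 and spans[i][0] <= zero_based < spans[i][1]


# Made-up regions for illustration.
index = build_index([("I", 100, 200), ("I", 150, 300), ("II", 0, 50)])
print(in_regions(index, "I", 101))
```

Merging overlaps up front keeps each lookup to a single binary search per site.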

Removed sites with no calls in the ancestor:

hcm14449@n204 H0$ wc -l H0_noLow_noHigh_redGem_AncCalls.vcf
654 H0_noLow_noHigh_redGem_AncCalls.vcf

Removed sites with hets in the ancestor:

hcm14449@n204 H0$ wc -l H0_noLow_noHigh_redGem_AncCalls_NoHets.vcf
545 H0_noLow_noHigh_redGem_AncCalls_NoHets.vcf
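Both ancestor filters depend only on the ancestor's genotype (GT) field: a no-call has a missing allele (`./.`), and a het has two different alleles. A minimal sketch, assuming the ancestor occupies a known sample column; the column index and all function names here are hypothetical.

```python
import re
from typing import List


def gt_alleles(sample_field: str) -> List[str]:
    """Split the GT part of a sample column such as '0/1:35,12:47' into alleles."""
    gt = sample_field.split(":")[0]
    return re.split(r"[/|]", gt)


def is_no_call(sample_field: str) -> bool:
    """True when any allele is missing, e.g. './.' or '.'."""
    return "." in gt_alleles(sample_field)


def is_het(sample_field: str) -> bool:
    """True for a fully called genotype with more than one distinct allele."""
    alleles = gt_alleles(sample_field)
    return "." not in alleles and len(set(alleles)) > 1


def keep_site(vcf_line: str, ancestor_col: int) -> bool:
    """Keep a site only if the ancestor is called and homozygous."""
    anc = vcf_line.rstrip("\n").split("\t")[ancestor_col]
    return not is_no_call(anc) and not is_het(anc)


# Made-up record; column 9 (0-based) is assumed to hold the ancestor.
record = "I\t100\t.\tA\tG\t50\tPASS\tDP=120\tGT:DP\t0/0:40\t0/1:38"
print(keep_site(record, ancestor_col=9))
```

Note that under this definition a `1/2` genotype counts as heterozygous, since its two alleles differ.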

SNPs:

hcm14449@n204 H0$ wc -l H0_noLow_noHigh_redGem_AncCalls_NoHets_SNPs.vcf
344 H0_noLow_noHigh_redGem_AncCalls_NoHets_SNPs.vcf

Indels:

hcm14449@n204 H0$ wc -l H0_noLow_noHigh_redGem_AncCalls_NoHets_Indels.vcf
223 H0_noLow_noHigh_redGem_AncCalls_NoHets_Indels.vcf
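The thread does not show how sites were split into SNPs and indels. One common rule compares the lengths of the REF and ALT alleles, sketched below with a hypothetical `variant_type` helper. Note that 344 + 223 = 567 rather than 545; the source does not explain the difference. Header lines counted by `wc -l` in each file, or multi-allelic sites written to both files, would each be consistent with it, but that is only a guess.

```python
def variant_type(ref: str, alt: str) -> str:
    """Classify a site from its REF and ALT columns.

    Returns 'SNP', 'indel', 'other' (same-length multi-base change),
    'mixed' (ALT alleles of different kinds), or 'none' (no usable ALT).
    Symbolic alleles such as <DEL> are not handled specially.
    """
    kinds = set()
    for allele in alt.split(","):
        if allele in (".", "*"):
            continue  # missing allele or spanning-deletion placeholder
        if len(ref) == 1 and len(allele) == 1:
            kinds.add("SNP")
        elif len(ref) != len(allele):
            kinds.add("indel")
        else:
            kinds.add("other")
    if not kinds:
        return "none"
    return kinds.pop() if len(kinds) == 1 else "mixed"


print(variant_type("A", "G"), variant_type("A", "AT"), variant_type("A", "G,AT"))
```

How a pipeline treats the `mixed` case determines whether the SNP and indel files can overlap.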

_Originally posted by @hollygene in https://github.com/hollygene/TE_MA/issues/2#issuecomment-612691832_

hollygene commented 4 years ago

Updates from 3/19 to present (these were in a Word doc, but I will post updates here from now on):

3/19/20

3/26/20

3/27/20

| File | Length | Step |
|---|---|---|
| D0_FullCohort.vcf | 1595 | Raw variants |
| D0_reducedGEM.vcf | 1042 | Removed low-mappability sites |
| D0_reducedGEM_DpGr10_Fil.vcf | 827 | Removed sites with depth <10 |
| D0_reducedGEM_DpGr10_Fil_AncCalls.vcf | 811 | Removed sites with ancestor no-calls (./.) |
| D0_reducedGEM_DpGr10_Fil_AncCalls_NoHets.vcf | 664 | Removed sites with heterozygous ancestor genotype |

Alternative method:

| File | Length | Step |
|---|---|---|
| D0_FullCohort.vcf | 1595 | Raw variants |
| D0_FullCohort_DpGr10_Fil.vcf | 1128 | Removed sites with depth <10 |
| D0_FullCohort_DpGr10_Fil_AncCalls.vcf | 1094 | Removed sites with ancestor no-calls (./.) |
| D0_FullCohort_DpGr10_Fil_AncCalls_NoHets.vcf | 854 | Removed sites with heterozygous ancestor genotype |
| D0_FullCohort_DpGr10_Fil_AncCalls_NoHets_GEM.vcf | 664 | Removed low-mappability sites |

Checked that both methods give the same thing: they did (exact same results, 664 lines either way).

Using the first method, because it makes more sense to remove low-mappability sites first instead of at the end.
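The agreement between the two orderings is what one would expect when every step judges a site only on that site's own properties: such filters commute, so the intermediate counts differ (1042 vs. 1128 after the first step) while the final set is the same. A minimal sketch with made-up sites; the dictionary keys and predicate names are hypothetical stand-ins for the four steps.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Site = Dict[str, object]  # one site as a plain dict; keys are illustrative
Predicate = Callable[[Site], bool]


def apply_filters(sites: Iterable[Site], predicates: Sequence[Predicate]) -> List[Site]:
    """Apply per-site filters in the given order, keeping sites that pass all."""
    kept = list(sites)
    for keep in predicates:
        kept = [s for s in kept if keep(s)]
    return kept


def mappable(s: Site) -> bool:
    return not s["low_mappability"]


def deep_enough(s: Site) -> bool:
    return s["depth"] >= 10


def anc_called(s: Site) -> bool:
    return s["anc_gt"] != "./."


def anc_hom(s: Site) -> bool:
    return len(set(str(s["anc_gt"]).split("/"))) == 1


# Made-up sites, one per way of failing plus one that passes everything.
sites = [
    {"pos": 1, "low_mappability": False, "depth": 40, "anc_gt": "0/0"},
    {"pos": 2, "low_mappability": True, "depth": 40, "anc_gt": "0/0"},
    {"pos": 3, "low_mappability": False, "depth": 5, "anc_gt": "0/0"},
    {"pos": 4, "low_mappability": False, "depth": 40, "anc_gt": "./."},
    {"pos": 5, "low_mappability": False, "depth": 40, "anc_gt": "0/1"},
    {"pos": 6, "low_mappability": True, "depth": 3, "anc_gt": "0/1"},
]

first = apply_filters(sites, [mappable, deep_enough, anc_called, anc_hom])
alternative = apply_filters(sites, [deep_enough, anc_called, anc_hom, mappable])
print(first == alternative)
```

The equivalence would not hold for a filter that depends on other sites, such as a cutoff recomputed from the median depth of whatever remains after the previous step.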

4/1/20

Depth distribution along each chromosome (x axis is location on the chromosome, y axis is depth):

[Screenshots, 2020-04-01: one depth plot per chromosome, I through XVI]

Filtering of Variants

D0
Low depth: 82
High depth: 206

| File | Length |
|---|---|
| D0_FullCohort.vcf | 1595 |
| D0_noLow.vcf | 962 |
| D0_noLow_noHigh.vcf | 772 |
| D0_noLow_noHigh_redGEM.vcf | 580 |
| D0_noLow_noHigh_redGEM_AncCalls.vcf | 573 |
| D0_noLow_noHigh_redGEM_AncCalls_NoHets.vcf | 471 |

D1
Low depth: 70
High depth: 188

| File | Length |
|---|---|
| D1_FullCohort.vcf | 2695 |
| D1_noLow.vcf | 1586 |
| D1_noLow_noHigh.vcf | 1346 |
| D1_noLow_noHigh_redGEM.vcf | 971 |
| D1_noLow_noHigh_redGEM_AncCalls.vcf | 958 |
| D1_noLow_noHigh_redGEM_AncCalls_NoHets.vcf | 812 |

D20
Low depth: 82
High depth: 206

| File | Length |
|---|---|
| D20_FullCohort.vcf | 1835 |
| D20_noLow.vcf | 1251 |
| D20_noLow_noHigh.vcf | 963 |
| D20_noLow_noHigh_redGEM.vcf | 741 |
| D20_noLow_noHigh_redGEM_AncCalls.vcf | 721 |
| D20_noLow_noHigh_redGEM_AncCalls_NoHets.vcf | 526 |
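To see how much each step contributes, the line counts listed above for D0, D1 and D20 can be turned into per-step losses. This is a small arithmetic sketch; `removed_per_step` is a hypothetical helper, and the numbers are copied from the counts above.

```python
from typing import Dict, List


def removed_per_step(counts: List[int]) -> List[int]:
    """Given line counts after each step, return how many lines each step removed."""
    return [before - after for before, after in zip(counts, counts[1:])]


# Order: FullCohort, noLow, noHigh, redGEM, AncCalls, NoHets.
counts: Dict[str, List[int]] = {
    "D0": [1595, 962, 772, 580, 573, 471],
    "D1": [2695, 1586, 1346, 971, 958, 812],
    "D20": [1835, 1251, 963, 741, 721, 526],
}

for name, series in counts.items():
    print(name, removed_per_step(series))
```

In all three cohorts the low-depth cutoff removes the most lines and the ancestor no-call filter the fewest.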

Found variants in the ancestors:

D20 ancestor variants

| Chr | Pos | GT | Type |
|---|---|---|---|
| VII | 61782 | 1/1 | ALT – indel |
| XVI | 186933 | 1/1 | ALT – indel |

D1 ancestor variants

| Chr | Pos | GT | Type |
|---|---|---|---|
| IV | 641630 | 2/2 | ALT – indel (het in D20 & majority of progeny) |
| VII | 61782 | 1/1 | ALT – indel |
| VII | 986079 | 1/1 | ALT – SNP (not in D20) |
| XV | 299062 | 1/1 | ALT – SNP (not in D20) |
| XVI | 186933 | 2/2 | ALT – SNP |

D0 ancestor variants

| Chr | Pos | GT | Type |
|---|---|---|---|
| IV | 641630 | 2/2 | ALT – indel |
| VII | 61782 | 1/1 | ALT – indel |
| XIII | 111478 | 1/2 | ALT – indel |
| XV | 314234 | 1/2 | ALT – indel |

4/3/20

Heterozygous sites shared between the 3 diploid ancestors: 40 total
File: /Users/hollymcqueary/Dropbox/McQueary/Paradoxus_MA/VCFs/AncHets/D0_D1_D20_common.txt
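How the shared sites were computed is not shown in this thread. One straightforward way is a set intersection on (chromosome, position) pairs, sketched here with made-up coordinates; `shared_sites` is a hypothetical helper.

```python
from typing import Iterable, Set, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def shared_sites(*site_lists: Iterable[Site]) -> Set[Site]:
    """Return the sites present in every input collection."""
    sets = [set(s) for s in site_lists]
    return set.intersection(*sets) if sets else set()


# Made-up het sites per ancestor, for illustration only.
d0_hets = [("I", 1200), ("IV", 5000), ("XII", 88)]
d1_hets = [("I", 1200), ("IV", 5000), ("VII", 42)]
d20_hets = [("IV", 5000), ("I", 1200), ("XVI", 7)]

common = shared_sites(d0_hets, d1_hets, d20_hets)
print(sorted(common))
```

Matching on position alone treats two ancestors as sharing a site even when their alleles differ; adding REF and ALT to the tuple would make the comparison stricter.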

4/7/20

H0 median depth = 153


Filtering is based on a depth of 150 rather than the median, though, so sites with depth less than 90 or greater than 218 are removed.