Open hollygene opened 4 years ago
Methods - Following GATK's Best Practices https://github.com/hollygene/TE_MA/blob/TE_MA/H0_script.sh
Create unmapped BAM from paired-end FASTQs using Picard
mark Illumina adapters
convert unmapped BAMs back to FASTQ
align to reference using bwa mem
merge bams using MergeBamAlignment
sort bam output
index bam output
Mark and remove duplicates using Picard
call mutations using HaplotypeCaller in gVCF mode
combine 8 random samples together and jointly genotype
use resulting .vcf file as the known-sites set to recalibrate base quality scores
re-call variants using HaplotypeCaller with recalibrated samples
combine gVCFs from each sample
jointly genotype gVCFs
filter final VCF
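The gVCF part of the list above (call per sample, combine, jointly genotype) could be sketched roughly like this. This is not the contents of H0_script.sh; it assumes GATK4-style command-line syntax, and every file name here is a hypothetical placeholder. The functions only build the command lines, so they can be inspected before anything is run.

```python
import shlex
from typing import List


def haplotype_caller_gvcf(reference: str, bam: str, out_gvcf: str) -> List[str]:
    """Per-sample variant calling in gVCF mode (GATK4 syntax assumed)."""
    return ["gatk", "HaplotypeCaller",
            "-R", reference, "-I", bam, "-O", out_gvcf,
            "-ERC", "GVCF"]


def combine_gvcfs(reference: str, gvcfs: List[str], out_gvcf: str) -> List[str]:
    """Combine per-sample gVCFs into one multi-sample gVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]  # one -V per input gVCF
    cmd += ["-O", out_gvcf]
    return cmd


def genotype_gvcfs(reference: str, combined_gvcf: str, out_vcf: str) -> List[str]:
    """Joint genotyping of the combined gVCF."""
    return ["gatk", "GenotypeGVCFs",
            "-R", reference, "-V", combined_gvcf, "-O", out_vcf]


if __name__ == "__main__":
    ref = "reference.fasta"          # hypothetical path
    samples = ["sample1", "sample2"]  # hypothetical sample names
    gvcfs = [f"{s}.g.vcf.gz" for s in samples]
    for s, g in zip(samples, gvcfs):
        print(shlex.join(haplotype_caller_gvcf(ref, f"{s}.dedup.bam", g)))
    print(shlex.join(combine_gvcfs(ref, gvcfs, "combined.g.vcf.gz")))
    print(shlex.join(genotype_gvcfs(ref, "combined.g.vcf.gz", "joint.vcf.gz")))
```

To actually execute a step, each list could be passed to `subprocess.run(cmd, check=True)`.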
Dave modified the calls in Excel using the following codes:
pull out ancestor calls: MID(M2,SEARCH(":",M2)+1,SEARCH(":",M2,SEARCH(":",M2)+1)-SEARCH(":",M2)-1)
verify number of ancestor alleles
isolate first allele depth
isolate second allele depth
get depth total
verify depth between 50 and 170?
find Ancestor allele freq
0.1<ancestor_freq<0.9
0.05<ancestor_freq<0.95
number of "./." in any line or ancestor
overall (0.1, 0.9)
overall (0.05, 0.95)
heterozygous ancestor
keep?
check (het anc are being removed)
Ancestor genotype
number of hets in MA lines
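For anyone who wants to reproduce the spreadsheet columns outside Excel, here is a minimal Python sketch of the same checks. The MID/SEARCH formula returns the text between the first and second ":" of the sample field, and `second_field` does the same. Several things are assumptions on my part: that the field is laid out like GT:AD:DP:... so the second entry is the allele depths, that "ancestor allele freq" means first allele depth divided by the total of the two depths, and that the 50 to 170 depth window is inclusive. All function names are made up for illustration.

```python
from typing import List, Optional


def second_field(sample_field: str) -> str:
    """Text between the first and second ':' (same span the MID/SEARCH formula extracts)."""
    first = sample_field.index(":")
    second = sample_field.index(":", first + 1)
    return sample_field[first + 1:second]


def allele_depths(sample_field: str) -> Optional[List[int]]:
    """Allele depths as integers, or None when the entry is missing ('.')."""
    parts = second_field(sample_field).split(",")
    if any(p == "." for p in parts):
        return None
    return [int(p) for p in parts]


def ancestor_checks(sample_field: str, min_depth: int = 50, max_depth: int = 170,
                    low: float = 0.1, high: float = 0.9) -> dict:
    """Depth and allele-frequency checks for one ancestor call."""
    depths = allele_depths(sample_field)
    if depths is None or len(depths) < 2:
        return {"n_alleles": 0 if depths is None else len(depths),
                "depth": None, "depth_ok": False, "freq": None, "het_like": False}
    total = depths[0] + depths[1]                     # first + second allele depth
    freq = depths[0] / total if total > 0 else None   # assumed definition of the frequency
    return {
        "n_alleles": len(depths),
        "depth": total,
        "depth_ok": min_depth <= total <= max_depth,
        # a frequency strictly inside (low, high) looks heterozygous
        "het_like": freq is not None and low < freq < high,
        "freq": freq,
    }


def count_missing(sample_fields: List[str]) -> int:
    """Number of samples on a row whose genotype is './.'."""
    return sum(1 for f in sample_fields if f.startswith("./."))
```

Calling `ancestor_checks(field, low=0.05, high=0.95)` gives the wider (0.05, 0.95) window from the list.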
Calculate mutation rate and find spectrum of mutations for each strain in dataset.
Begin with H0
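A rough sketch of the rate and spectrum step, under stated assumptions: single-nucleotide changes are collapsed into the usual six classes by reporting each change from the pyrimidine strand, and the rate is taken as mutations divided by (callable sites × generations). The number of callable sites, the number of generations, and any ploidy correction are not given above, so they are left as parameters; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def spectrum_class(ref: str, alt: str) -> str:
    """Collapse a single-base change into one of six classes (C>A, C>G, C>T, T>A, T>C, T>G)."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # purine reference: report the change on the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(snvs: Iterable[Tuple[str, str]]) -> Counter:
    """Count substitutions per class from (ref, alt) pairs."""
    return Counter(spectrum_class(ref, alt) for ref, alt in snvs)


def per_site_rate(n_mutations: int, callable_sites: int, generations: float) -> float:
    """Mutations per callable site per generation (no ploidy correction applied)."""
    return n_mutations / (callable_sites * generations)
```

For example, `mutation_spectrum([("G", "A"), ("C", "T")])` counts both changes under `C>T`, since G>A on one strand is C>T on the other.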