Open hollygene opened 4 years ago
H0 samples

Dave worked on the dataset further and found several samples with shared SNPs (highlighted in yellow in the attached spreadsheet): H0_vcf_all_NEWCALLSv2.xlsx
Removed samples 15, 22, 29, 36, 37, 40, 42, 46, 48, 5
Samples that shared SNPs (pairs):

- 32, 37
- 39, 40
- 14, 15
- 28, 29
- 41, 42
- 44, 48
- 45, 46
- 4, 5
- 7, 22
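A minimal sketch of how pairs like these could be found programmatically, assuming the calls have already been reduced to one set of non-ancestral SNP sites per sample. The function name `shared_snp_pairs`, the `(chrom, pos, alt)` site tuples, and the toy data are all hypothetical and are not taken from the spreadsheet.

```python
from collections import defaultdict
from itertools import combinations


def shared_snp_pairs(calls):
    """Map each pair of samples to the set of SNP sites they both carry.

    `calls` maps a sample ID to a set of (chrom, pos, alt) tuples
    (hypothetical input layout, assumed for illustration).
    """
    # Index the data by site: which samples carry each SNP?
    carriers = defaultdict(set)
    for sample, sites in calls.items():
        for site in sites:
            carriers[site].add(sample)

    # Every site carried by two or more samples contributes to each pair.
    pairs = defaultdict(set)
    for site, samples in carriers.items():
        for a, b in combinations(sorted(samples), 2):
            pairs[(a, b)].add(site)
    return dict(pairs)


# Toy data, made up for illustration only.
toy_calls = {
    "s1": {("chrI", 100, "T"), ("chrII", 250, "G")},
    "s2": {("chrI", 100, "T")},
    "s3": {("chrIII", 42, "A")},
}
print(shared_snp_pairs(toy_calls))
```

In a mutation accumulation design, independent lines should not share new mutations, so any pair returned here would be a candidate for cross-contamination or a miscalled ancestral variant.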
Looked at cross-contaminants in D0 samples
Summary:
Line 21 shares SNPs with many other lines, possibly because of a de-multiplexing problem. With line 21 removed, these are the results:
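One way to spot a line that shares SNPs with many others, and to see what remains after dropping it, is sketched below. It assumes a dictionary of sample pairs mapped to their shared sites; the helper names `sharing_partners` and `drop_line`, the line labels, and the toy data are hypothetical.

```python
from collections import defaultdict


def sharing_partners(pair_sites):
    """Count how many distinct other lines each line shares a SNP with."""
    partners = defaultdict(set)
    for a, b in pair_sites:
        partners[a].add(b)
        partners[b].add(a)
    return {line: len(others) for line, others in partners.items()}


def drop_line(pair_sites, line):
    """Return the shared-SNP pairs that do not involve `line`."""
    return {pair: sites for pair, sites in pair_sites.items() if line not in pair}


# Toy data, made up for illustration only.
toy_pairs = {
    ("L21", "L3"): {("chrI", 100)},
    ("L21", "L7"): {("chrII", 5)},
    ("L21", "L9"): {("chrIV", 77)},
    ("L3", "L7"): {("chrV", 12)},
}

counts = sharing_partners(toy_pairs)
# List lines from most to fewest sharing partners.
for line, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(line, n)

print(drop_line(toy_pairs, "L21"))
```

A line with far more partners than the rest, like `L21` in the toy data, is the pattern that would suggest a sample-level problem rather than isolated miscalls.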
Looking at lines with 1 shared SNP, found that GQ scores were low:
Checked the ancestor calls at these sites: is the GQ low? Does the site look heterozygous even though it was called homozygous?
Only one site has a very low GQ score in the ancestor, the rest all look like confident calls.
Going to talk to Dave and decide what to filter out based on GQ scores in the ancestor. Plotted the scores as a distribution to determine what cutoffs to use:
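As a rough illustration of choosing a cutoff from the distribution, the sketch below bins ancestor GQ scores into a text histogram and lists the sites that fall under a candidate cutoff. The helper names, the bin width, the toy scores, and the cutoff of 30 are all assumptions made for the example; the actual cutoff was still to be decided.

```python
from collections import Counter

BIN_WIDTH = 10  # assumed bin size, chosen only for illustration


def gq_histogram(gq_scores, bin_width=BIN_WIDTH):
    """Count GQ scores per bin; each key is the lower edge of its bin."""
    counts = Counter((gq // bin_width) * bin_width for gq in gq_scores)
    return dict(sorted(counts.items()))


def sites_below_cutoff(site_gq, cutoff):
    """Return the sites whose ancestor GQ is strictly below `cutoff`."""
    return [site for site, gq in site_gq.items() if gq < cutoff]


# Toy ancestor GQ scores keyed by (chrom, pos); made up for illustration.
toy_gq = {
    ("chrI", 100): 99,
    ("chrII", 250): 12,
    ("chrIII", 42): 87,
    ("chrIV", 7): 95,
}

hist = gq_histogram(toy_gq.values())
for edge, n in hist.items():
    print(f"{edge:>3}-{edge + BIN_WIDTH - 1:<3} {'#' * n}")

# Candidate cutoff of 30 is a placeholder, not a recommendation.
print(sites_below_cutoff(toy_gq, cutoff=30))
```

A gap between a small low-GQ group and the bulk of confident calls, as in the toy data, is the kind of feature that would suggest where to place the cutoff.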
_Originally posted by @hollygene in https://github.com/hollygene/TE_MA/issues/2#issuecomment-612959410_