Closed davidecarlson closed 4 years ago
Hi Dave,
> Does the above seem sensible?
Yes, I think the criteria you described seem sensible. You could also try comparing different confidence thresholds: for example, not allowing any presence reads for a TE called as absent in an individual in a "high confidence" set of calls, and then applying laxer requirements for intermediate- or lower-confidence sets. This might help you convince yourself of any potential patterns you find.
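As a rough illustration of the tiered idea, here is a minimal Python sketch. The function name, the tier labels, and the `max_presence_lax` cutoff are hypothetical choices for illustration only; the minimum of 3 absence reads is taken from the criteria in the original question. It takes plain read counts rather than parsing TEFLoN output.

```python
def absence_confidence(presence_reads, absence_reads,
                       min_absence=3, max_presence_lax=1):
    """Classify an "absent" call into a confidence tier.

    Returns "high", "intermediate", or None (no absent call).
    The cutoffs are illustrative assumptions, not TEFLoN defaults.
    """
    # Too few absence reads: do not call the TE absent at all.
    if absence_reads < min_absence:
        return None
    # High confidence: no read supports presence.
    if presence_reads == 0:
        return "high"
    # Intermediate confidence: tolerate a small number of presence reads.
    if presence_reads <= max_presence_lax:
        return "intermediate"
    return None


if __name__ == "__main__":
    for p, a in [(0, 5), (1, 5), (2, 5), (0, 2)]:
        print(p, a, absence_confidence(p, a))
```

Comparing results between the "high" set alone and the "high" plus "intermediate" sets is one way to check whether a pattern depends on the threshold.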
> Does an approach like this seem reasonable to you?
Again, I think this is a reasonable thing to do. I would hesitate to include samples with a -9 in column 13 for anything that you want to have high confidence in, but you could also simply sum up the presence and absence counts across all samples and take the fraction of presence over total counts. I imagine this would give you a reasonable approximation to actually having good genotypes called on the individuals, but I've never actually tested this.
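The pooled estimate could be sketched roughly as follows. This is only an illustration and, as noted, the approach itself is untested; the function name and the input format (one `(presence_reads, absence_reads)` pair per sample) are assumptions made for the example.

```python
def pooled_presence_fraction(counts):
    """Pool read counts across samples for one TE insertion.

    counts: iterable of (presence_reads, absence_reads) pairs,
            one pair per sample (hypothetical input format).
    Returns presence / (presence + absence), or None if there are no reads.
    """
    counts = list(counts)
    presence = sum(p for p, _ in counts)
    absence = sum(a for _, a in counts)
    total = presence + absence
    if total == 0:
        # No informative reads in any sample.
        return None
    return presence / total


if __name__ == "__main__":
    # Three samples: 5 presence reads out of 10 informative reads in total.
    print(pooled_presence_fraction([(3, 1), (0, 4), (2, 0)]))
```

Note that pooling weights each sample by its read depth, so a deeply sequenced sample contributes more to the estimate than a shallow one.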
Cheers, Jeff
Hi Jeff, Thanks for the advice. It's very helpful. I'll go ahead and close this. Best, Dave
Hi Jeffrey,
Thanks for making TEFLoN available. I've run the pipeline on a group of 12 individuals, and now I'm working on parsing and understanding the results. I'm trying to follow a set of principles that are relatively similar to what you and co-authors did in the 2017 GBE paper.
To call a "present" genotype for a TE insertion, I'm currently requiring:
To call an "absent" genotype for a TE insertion, I'm requiring:
- 3 or more "absence" reads (column 11) in that sample
Does the above seem sensible?
Also, ideally I would like to calculate the frequencies of particular TE insertion alleles across all 12 of my samples, but for most TE insertions I have an ambiguous genotype call (column 13 = -9) in one or more samples, which obviously makes calculating the frequency more complicated.
My current thought is to set a threshold for minimum # of samples with an unambiguous genotype call, and then estimate the allele frequency for each TE insertion using the samples that have a called genotype for that TE insertion.
Does an approach like this seem reasonable to you? Thanks for any advice! Dave
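For concreteness, the thresholded frequency estimate described above might look something like the sketch below. It assumes each sample's call has already been coded as 1 (present), 0 (absent), or -9 (ambiguous); the function name and the `min_called` value of 8 are hypothetical and only there for illustration.

```python
AMBIGUOUS = -9  # ambiguous genotype code, as reported in column 13


def allele_frequency(calls, min_called=8):
    """Estimate the insertion allele frequency from per-sample calls.

    calls: list with one entry per sample, coded 1 (present),
           0 (absent), or -9 (ambiguous). This coding is an assumption.
    min_called: minimum number of unambiguous samples required
                (hypothetical threshold).
    Returns the frequency among called samples, or None if too few.
    """
    called = [c for c in calls if c != AMBIGUOUS]
    if len(called) < min_called:
        return None
    return sum(called) / len(called)


if __name__ == "__main__":
    calls = [1, 0, 1, -9, 1, 0, 0, 1, -9, 1, 1, 0]  # 12 samples, 2 ambiguous
    print(allele_frequency(calls))                  # uses the 10 called samples
    print(allele_frequency(calls, min_called=11))   # too few called samples
```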