Open SelinaKlees opened 11 months ago
Hi,
The high coverage genotypes (the genfile) is not used for imputation, it is only used to provide a measure of accuracy during imputation. The reason to include it during imputation is to get an estimate of the accuracy at each iteration, to guide parameter choices, e.g. the number of iterations.
The other plot you mention should be a plot of of the "real" allele frequency (from the pileup) vs the estimated allele frequency (from the imputation). It makes the most sense for SNPs that pass QC.
Nonetheless if you want to plot all SNPs you can take a look at the function
plotEstimatedAgainstReal(
outputdir = outputdir,alleleCount=alleleCount,
estimatedAlleleFrequency=estimatedAlleleFrequency,
which=passQC,chr=chr,regionName=regionName
)
and the code for this function here https://github.com/rwdavies/STITCH/blob/ecba5f771a86313c74465a0c54fefbf96688fceb/STITCH/R/functions.R#L4994
you should be able to load the results (RData/EM.a.ll.* as appropriate) and by sourcing the STITCH/R/functions.R file then you can call something like
plotEstimatedAgainstReal(
outputdir = outputdir,alleleCount=alleleCount,
estimatedAlleleFrequency=estimatedAlleleFrequency,
which=rep(TRUE, length(estimatedAlleleFrequency),chr=chr,regionName=regionName
)
and it should work (though would override your original file)
Hope that helps, Robbie
Dear Robert,
I have a question regarding the genfile. Actually, it is not totally clear to me if the genotypes of high coverage seq data (i.e. the genfile) is actually used for the impuation itself (as some kind of reference panel) or if it is just used to obtain a quality coefficient (r^2). Further, in my results plots, I found the r^2 plot and the value but just for the "goodonly", as it is called. How can I obtain the r^2 value for all SNPs?
Thank you for your help, Selina