Perhaps this is a philosophical rather than a technical question.
I would like to use OutFLANK to detect possible outliers in several population pairs, each of which has up to 20% missing data (70k to 200k SNPs per population pair). OutFLANK handles missing data just fine, but the entire basis of the program relies on trimming SNPs, which is suggested to be performed in bigsnpr. However, bigsnpr does not accept missing data (it recommends imputing genotypes instead).
Do you have any reason to believe imputing data might reduce the power to detect outliers because it should tend to homogenize loci? Or, conversely, could imputation lead to false positives at loci that did not have missing data, if many other sites were imputed?
Alternatively, do you know of other R packages that could trim SNPs that do allow missing data?
Thank you for your thoughts, Loren