Closed alimanfoo closed 2 months ago
Looks great but failed test_pca_fit_exclude_samples[ag3_sim]
Thanks Lee. That failure is tricky: it's one of those that only pops up sometimes, and I don't fully understand why. I've just pushed a commit that tries to work around it, will see how it runs.
This PR adds two new parameters to the Anopheles `pca()` function to help with situations where you have PCA outliers that need to be excluded:

- `exclude_samples` - This parameter can be a list of samples that will be excluded completely from the PCA analysis.
- `fit_exclude_samples` - This parameter can be a list of samples that will be excluded during the fitting stage of the PCA analysis, but will be included in the projection stage and therefore in the resulting output dataframe.

The `fit_exclude_samples` parameter is particularly useful where you have samples that are outliers but you still want to see where they fall within real geographical or taxonomic structure.

Note that both of these parameters are only applied after the loading of the input data (biallelic diplotypes) has been computed. This input data will be cached if a `results_cache` has been set, meaning that making changes to either of these parameters then rerunning the function should be relatively quick.

Resolves #389.
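
For readers who want to see the difference between the two exclusion modes in isolation, here is a minimal NumPy sketch. It is not the code in this PR: the function name `pca_with_exclusions`, its signature, and the random toy data are all assumptions made for illustration, and it uses a plain SVD-based PCA rather than whatever the library does internally.

```python
import numpy as np


def pca_with_exclusions(data, samples, n_components=2,
                        exclude_samples=(), fit_exclude_samples=()):
    """Toy PCA illustrating the two exclusion modes (hypothetical helper).

    data: (n_samples, n_features) array; samples: list of sample IDs.
    Returns (kept sample IDs, coordinates of the kept samples).
    """
    data = np.asarray(data, dtype=float)
    excluded = set(exclude_samples)
    fit_excluded = set(fit_exclude_samples)

    # exclude_samples: dropped completely, before fitting or projecting.
    keep = np.array([s not in excluded for s in samples], dtype=bool)
    kept_samples = [s for s in samples if s not in excluded]
    data = data[keep]

    # fit_exclude_samples: left out of the fitting stage only.
    fit_mask = np.array([s not in fit_excluded for s in kept_samples], dtype=bool)

    # Fit: centre on the fitting samples, take the top right singular vectors.
    mean = data[fit_mask].mean(axis=0)
    _, _, vt = np.linalg.svd(data[fit_mask] - mean, full_matrices=False)
    components = vt[:n_components]

    # Project: every kept sample, including those excluded from the fit.
    coords = (data - mean) @ components.T
    return kept_samples, coords


# Toy data: six samples, ten features, with "s6" shifted to act as an outlier.
rng = np.random.default_rng(0)
ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
x = rng.normal(size=(6, 10))
x[5] += 20.0

kept, coords = pca_with_exclusions(
    x, ids, exclude_samples=["s5"], fit_exclude_samples=["s6"]
)
print(kept)          # "s5" is absent, "s6" is still present
print(coords.shape)  # one row per kept sample
```

In this sketch the axes are determined only by `s1` to `s4`, so the outlier `s6` cannot distort them, yet it still gets coordinates in the output.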