Add Ability to Remove Related Individuals from Background Dataset for PCA Analysis

Description:

This PR targets the preprocessing step where we prepare the background dataset. A common challenge in genetic studies, especially those involving PCA, is the influence of related individuals: their presence can skew the principal components and lead to inaccurate interpretations. To address this, we've implemented a feature that explicitly removes related individuals from the background dataset before running PCA, in addition to the option to remove related individuals from the dataset under analysis.

Key Changes:

Related Individuals Removal: Using a precomputed table of related individuals, relateds_to_drop_ht, we now filter these samples out of the background dataset, so that PCA is run on a dataset free of relatedness bias.

Configurable Removal List: The list of individuals to remove is configurable and depends on the background datasets used. Each background dataset must already have been run through the Relatedness stage.
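
The filtering logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: plain Python lists and dicts stand in for the Hail tables, and the names drop_related_samples, prepare_background, relateds_by_dataset and the drop_relateds flag are hypothetical. It assumes each background dataset has its own list of samples to drop, produced by an earlier Relatedness run.

```python
from __future__ import annotations

from typing import Iterable


def drop_related_samples(
    sample_ids: Iterable[str], relateds_to_drop: Iterable[str]
) -> list[str]:
    """Return sample_ids without any sample listed in relateds_to_drop."""
    to_drop = set(relateds_to_drop)
    # Preserve the original sample order; only membership is checked.
    return [s for s in sample_ids if s not in to_drop]


def prepare_background(
    background: dict[str, list[str]],
    relateds_by_dataset: dict[str, list[str]],
    drop_relateds: bool = True,  # hypothetical config switch
) -> dict[str, list[str]]:
    """Filter each background dataset by its own relateds-to-drop list."""
    prepared: dict[str, list[str]] = {}
    for name, samples in background.items():
        if not drop_relateds:
            prepared[name] = list(samples)
            continue
        if name not in relateds_by_dataset:
            # Models the requirement that every background dataset has
            # already been through the Relatedness stage.
            raise ValueError(
                f"No relateds-to-drop list for background dataset {name!r}"
            )
        prepared[name] = drop_related_samples(
            samples, relateds_by_dataset[name]
        )
    return prepared
```

With Hail tables the same idea would typically be expressed as an anti-join of the background samples against relateds_to_drop_ht; the exact call used in this PR is not shown here.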

populationgenomics/production-pipelines, pull request #807