This PR targets the preprocessing step that prepares the background dataset. A common challenge in genetic studies, especially those involving PCA, is the influence of related individuals: their presence can skew the principal components and lead to inaccurate interpretations of population structure. To address this, we've added an option to explicitly remove related individuals from the background dataset before running PCA, in addition to the existing option to remove related individuals from the dataset being analyzed.
Key Changes:
Related Individuals Removal:
Using a precomputed table of related individuals, relateds_to_drop_ht, we now filter these samples out of the background dataset. This keeps the background PCA from being biased by the related samples listed in that table.
Configurable Removal List:
The list of individuals to remove is configurable and depends on which background datasets are used. The background datasets MUST have been run through the Relatedness stage beforehand.
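The filtering and the configurable removal list described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the pipeline's actual code: the real change works on the precomputed table relateds_to_drop_ht, whereas here sample IDs are plain strings, and the names BACKGROUND_RELATEDS_TO_DROP and drop_background_relateds, the dataset names, and the sample IDs are all made up. A missing entry (None) stands in for a background dataset that has not been run through the Relatedness stage, while an empty set means the stage ran and found nothing to drop.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical per-background-dataset removal lists. In the real pipeline these
# would come from the Relatedness stage output (e.g. relateds_to_drop_ht);
# the dataset names and sample IDs below are invented for illustration.
BACKGROUND_RELATEDS_TO_DROP: Dict[str, Optional[Set[str]]] = {
    "background_a": {"S2", "S5"},
    "background_b": None,  # not yet run through the Relatedness stage
}


def drop_background_relateds(
    dataset_name: str,
    sample_ids: Iterable[str],
    relateds_config: Dict[str, Optional[Set[str]]],
) -> List[str]:
    """Return background sample IDs with listed relateds removed, preserving order.

    Raises ValueError if the background dataset has no removal list, which here
    models a dataset that has not been run through the Relatedness stage.
    """
    to_drop = relateds_config.get(dataset_name)
    if to_drop is None:
        raise ValueError(
            f"Background dataset {dataset_name!r} has no relateds list; "
            "run it through the Relatedness stage first."
        )
    # Keep only samples that are not flagged as relateds to drop.
    return [s for s in sample_ids if s not in to_drop]


if __name__ == "__main__":
    samples = ["S1", "S2", "S3", "S4", "S5"]
    kept = drop_background_relateds("background_a", samples, BACKGROUND_RELATEDS_TO_DROP)
    print(kept)  # ['S1', 'S3', 'S4']
```

Raising an error instead of silently skipping the filter mirrors the requirement that background datasets must already have gone through the Relatedness stage; the surviving sample IDs would then be the input to PCA.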
This looks great. The only thing I would suggest is renaming relateds_to_drop to background_relateds_to_drop, to distinguish the background related samples/HT from the dataset relateds.