Closed michael-harper closed 2 weeks ago
Successful batch run here: https://batch.hail.populationgenomics.org.au/batches/455689/jobs/2
INFO:root:Superpopulations before filtering ['Europe', 'Europe', 'Europe', 'Europe', 'Europe', 'Europe', 'Europe', 'Europe', 'Europe', 'Europe', 'Europe', 'Europe', 'Europe',...]
INFO:root:Filtering background samples by ['Africa', 'South Asia']
INFO:root:Finished filtering background, kept samples that are ['Africa', 'South Asia']
INFO:root:Superpopulations after filtering ['Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa', 'Africa',...]
The cramqc was an oversight by me! I was working on a different branch and that snuck its way in.
This PR introduces an enhancement to the ancestry stage of the large cohort pipeline. The update allows for a more granular PCA by enabling the subsetting of the background population to individuals of a specific ancestry.
Changes include:
- A new configuration option under `[large_cohort.pca_background]` named `superpopulation_to_filter`. This option allows users to specify the superpopulation (or, as in the batch run above, a list of superpopulations) to be used for the PCA. If no population name is specified, the new parameter defaults to `False`.
- Additional lines of code in `ancestry_pca.py` that filter the `background_mt` matrix table based on the superpopulation specified in the configuration.

This enhancement provides users with the flexibility to perform PCA on a specific superpopulation, allowing for more detailed and targeted analysis.