The QC part of the pipeline is similar to the UK Biobank (UKBB) QC process.
A brief overview of the steps:
LD pruning to remove correlated variants, leaving an approximately independent set (via SNPRelate)
Kinship estimation robust to pop. structure (via the SNPRelate implementation of KING-robust)
The kinship estimates above are passed to PC-AiR for a PCA that is robust to relatedness, so the PCs capture population structure rather than family structure (via GENESIS)
It is interesting that PLINK offers relatedness pruning as an explicit step, while PC-AiR embeds it within its preprocessing (i.e. it runs PCA on a subset of mutually unrelated samples and then projects the related samples onto the resulting PCs)
Kinship estimation robust to population structure, including admixture, using PC-Relate (via GENESIS)
This requires the PCs from the PC-AiR step, which PC-Relate uses to estimate individual-specific allele frequencies
The scaling strategy for this first operates on groups of samples (+ all variants) and then on pairs of results from sample blocks. That's a strategy I had thought about before but hadn't seen implemented anywhere until now
Association testing with a mixed model that uses the kinship matrix as the covariance of a random effect and the PCs as fixed-effect covariates (via GENESIS)
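To make the LD pruning step concrete, here is a minimal greedy sketch. It assumes a complete (no missing calls) samples × variants dosage matrix with variants sorted by position, and a window counted in variants rather than base pairs; the function name `ld_prune` and its defaults are hypothetical. SNPRelate's `snpgdsLDpruning` has its own window and correlation settings, so this is only the idea, not its algorithm: drop a variant when it is strongly correlated with one already kept nearby.

```python
import numpy as np


def ld_prune(genotypes: np.ndarray, window: int = 50, r2_threshold: float = 0.2) -> list[int]:
    """Greedy LD pruning (hypothetical helper, simplified).

    genotypes: (n_samples, n_variants) array of 0/1/2 dosages, no missing values,
    variants ordered by position. A variant is kept only if its squared Pearson
    correlation with every kept variant among the preceding `window` variants
    is below `r2_threshold`.
    """
    kept: list[int] = []
    for j in range(genotypes.shape[1]):
        x = genotypes[:, j].astype(float)
        if x.std() == 0:
            continue  # monomorphic variants carry no information
        keep = True
        for k in reversed(kept):
            if j - k > window:
                break  # kept is sorted, so all earlier entries are further away
            r = np.corrcoef(x, genotypes[:, k].astype(float))[0, 1]
            if r * r >= r2_threshold:
                keep = False  # too correlated with a variant we already kept
                break
        if keep:
            kept.append(j)
    return kept
```

With two identical variants followed by a weakly correlated one, only the first copy and the third variant survive.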
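The KING-robust kinship estimate for a single pair can be computed directly from genotype counts. Below is a sketch of the between-family form from the KING paper, which scales by the heterozygote count of the less heterozygous sample. `king_robust_kinship` is a hypothetical helper, missing genotypes are assumed absent, and which form of the estimator SNPRelate's `snpgdsIBDKING` applies for a given input is not asserted here. Duplicate samples score 0.5 and unrelated pairs score near 0.

```python
import numpy as np


def king_robust_kinship(gi, gj) -> float:
    """KING-robust between-family kinship for one pair (hypothetical helper).

    gi, gj: 1-D arrays of 0/1/2 genotypes over the same variants, no missing
    values, and each sample must have at least one heterozygous call.
    """
    gi = np.asarray(gi, dtype=int)  # cast so unsigned inputs don't wrap on subtraction
    gj = np.asarray(gj, dtype=int)
    het_i = np.sum(gi == 1)
    het_j = np.sum(gj == 1)
    both_het = np.sum((gi == 1) & (gj == 1))      # N_{Aa,Aa}
    opposite_hom = np.sum(np.abs(gi - gj) == 2)   # N_{AA,aa}, either order
    n_min = min(het_i, het_j)                     # less heterozygous sample
    return float(
        (both_het - 2 * opposite_hom) / (2 * n_min)
        + 0.5
        - 0.25 * (het_i + het_j) / n_min
    )
```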
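The core idea behind PC-AiR's preprocessing (PCA on an unrelated subset, then projection of relatives) can be sketched with numpy. This is a simplification: the partition below greedily drops the sample with the most relatives until no pair in the unrelated set exceeds the threshold, whereas PC-AiR also takes ancestry divergence into account when choosing the subset. The function names and the default threshold `2 ** (-11 / 2)` (roughly between third- and fourth-degree relatives) are assumptions.

```python
import numpy as np


def partition_unrelated(kinship: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask of samples kept in the 'unrelated' set (simplified greedy rule)."""
    n = kinship.shape[0]
    related = (kinship > threshold) & ~np.eye(n, dtype=bool)
    unrelated = np.ones(n, dtype=bool)
    while True:
        # count relatives among samples still in the unrelated set
        counts = (related & unrelated[:, None] & unrelated[None, :]).sum(axis=1)
        if counts.max() == 0:
            return unrelated
        unrelated[np.argmax(counts)] = False  # drop the sample with the most relatives


def pcair_sketch(genotypes, kinship, n_pcs=2, threshold=2 ** (-11 / 2)):
    """PCA on the unrelated subset, with all samples projected onto its axes."""
    unrelated = partition_unrelated(kinship, threshold)
    g = genotypes.astype(float)
    freq = g[unrelated].mean(axis=0) / 2          # allele frequencies from unrelated samples only
    keep = (freq > 0) & (freq < 1)                # drop variants monomorphic in that subset
    z = (g[:, keep] - 2 * freq[keep]) / np.sqrt(2 * freq[keep] * (1 - freq[keep]))
    _, _, vt = np.linalg.svd(z[unrelated], full_matrices=False)
    axes = vt[:n_pcs].T                           # PC loadings defined by unrelated samples
    return z @ axes, unrelated                    # related samples are projected, not fitted
```

Defining the axes only on unrelated samples keeps close relatives from pulling a PC toward a single family.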
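The PC-Relate kinship estimate and the sample-block scaling idea can be sketched together. Step 1 estimates individual-specific allele frequencies by regressing each variant's halved dosage on an intercept plus the PCs, using all samples and all variants. Step 2 computes the kinship ratio one pair of sample blocks at a time, so each unit of work only touches two blocks of rows. This omits PC-Relate's small-sample corrections, missing-data handling, and its separate inbreeding estimator (so only off-diagonal entries are meaningful here); the function names, `eps`, and `block_size` are hypothetical, and the block layout illustrates the idea rather than reproducing GENESIS's code.

```python
import numpy as np


def individual_allele_freqs(genotypes, pcs, eps=1e-3):
    """Fitted allele frequency per sample and variant from a regression on the PCs."""
    x = np.column_stack([np.ones(len(pcs)), pcs])
    beta, *_ = np.linalg.lstsq(x, genotypes / 2.0, rcond=None)  # one regression per variant column
    return np.clip(x @ beta, eps, 1 - eps)                       # keep frequencies inside (0, 1)


def pcrelate_kinship_blocked(genotypes, pcs, block_size=100):
    g = genotypes.astype(float)
    mu = individual_allele_freqs(g, pcs)   # step 1: all samples, all variants
    resid = g - 2 * mu
    scale = np.sqrt(mu * (1 - mu))
    n = g.shape[0]
    starts = range(0, n, block_size)
    kin = np.empty((n, n))
    # step 2: one unit of work per pair of sample blocks (a <= b), mirrored by symmetry
    for a in starts:
        ia = slice(a, min(a + block_size, n))
        for b in starts:
            if b < a:
                continue
            ib = slice(b, min(b + block_size, n))
            block = (resid[ia] @ resid[ib].T) / (4 * (scale[ia] @ scale[ib].T))
            kin[ia, ib] = block
            kin[ib, ia] = block.T
    return kin
```

Because each entry depends only on its two rows of `resid` and `scale`, the blocked result matches a single-block computation exactly, which is what makes the block pairs easy to distribute.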
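The association model can be sketched as generalized least squares for one variant: PCs (and an intercept) enter as fixed effects, and a random effect with covariance proportional to the kinship matrix enters through the phenotype covariance `V`. This assumes the variance components are already known; GENESIS estimates them from the data when fitting the null model. The helper name, the Wald statistic, and the `2 * kinship` scaling (a common convention for turning kinship into a relationship matrix) are assumptions of this sketch.

```python
import numpy as np


def gls_variant_test(y, genotype, pcs, kinship, sigma_g2, sigma_e2):
    """Wald test for one variant in y = X b + u + e (hypothetical helper).

    Cov(u) = sigma_g2 * 2 * kinship, Cov(e) = sigma_e2 * I, variance components known.
    Returns (effect estimate, Wald z statistic) for the variant.
    """
    n = len(y)
    x = np.column_stack([np.ones(n), pcs, genotype])   # PCs as fixed effects, variant last
    v = sigma_g2 * 2 * kinship + sigma_e2 * np.eye(n)  # phenotype covariance
    v_inv_x = np.linalg.solve(v, x)                    # V^{-1} X without forming V^{-1}
    xtvx = x.T @ v_inv_x                               # X' V^{-1} X
    beta = np.linalg.solve(xtvx, v_inv_x.T @ y)        # GLS estimate (V is symmetric)
    se = np.sqrt(np.linalg.inv(xtvx)[-1, -1])
    return beta[-1], beta[-1] / se
```

With an all-zero kinship matrix the model collapses to ordinary least squares, which is a handy sanity check.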
Stephanie Gogarten of UW has a few repos worth perusing, though she uses R rather than Python: https://github.com/UW-GAC/analysis_pipeline has goals similar to this repository's, but for TOPMed data; https://github.com/UW-GAC/GENESIS and https://github.com/smgogarten/GWASTools contain code supporting the TOPMed pipeline.