The issue: our data identifies samples with sex aneuploidies, but gCNV/GATK can only tolerate XX/XY data, and we currently have no way to remove the data that will cause failures.
A possible solution:
When we update the pedigree file with inferred sex, keep a list of all samples which are not XX/XY.
For each of the SGIDs identified, generate a version of the VCF (and its index) with chrX and chrY removed entirely.
When running joint-segmentation, use the trimmed VCF for any sample whose sex-chromosome data was stripped, and the default VCF for every other sample.
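The three steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: the function names, the `.no_xy.vcf.gz` suffix, the karyotype labels used in the usage note, and the choice of `bcftools`/`tabix` for the trimming are all hypothetical. The PR text does not say which tool strips the chromosomes, and the commands assume the VCF has already been localised inside the job.

```python
# Karyotypes that gCNV/GATK tolerates, per the description above.
SUPPORTED_KARYOTYPES = {"XX", "XY"}
VCF_SUFFIX = ".vcf.gz"


def find_aneuploid_samples(inferred_sex: dict) -> set:
    """Return the SGIDs whose inferred karyotype is not XX or XY."""
    return {
        sgid
        for sgid, karyotype in inferred_sex.items()
        if karyotype not in SUPPORTED_KARYOTYPES
    }


def trimmed_vcf_path(vcf_path: str) -> str:
    """Hypothetical naming scheme: sample.vcf.gz -> sample.no_xy.vcf.gz."""
    if not vcf_path.endswith(VCF_SUFFIX):
        raise ValueError(f"expected a {VCF_SUFFIX} path, got {vcf_path}")
    return vcf_path[: -len(VCF_SUFFIX)] + ".no_xy" + VCF_SUFFIX


def trimming_commands(vcf_in: str, vcf_out: str) -> list:
    """Commands that would drop chrX/chrY and re-index (assumed tooling)."""
    return [
        # '^' at the start of --targets tells bcftools to exclude those contigs.
        ["bcftools", "view", "--targets", "^chrX,chrY",
         "--output-type", "z", "--output", vcf_out, vcf_in],
        ["tabix", "-p", "vcf", vcf_out],
    ]


def vcfs_for_joint_segmentation(vcf_paths: dict, aneuploid: set) -> dict:
    """Pick the trimmed VCF for aneuploid samples, the default otherwise."""
    return {
        sgid: trimmed_vcf_path(path) if sgid in aneuploid else path
        for sgid, path in vcf_paths.items()
    }
```

For example, given inferred karyotypes `{"CPG1": "XX", "CPG2": "X0"}`, only `CPG2` is flagged, so joint-segmentation would receive the default VCF for `CPG1` and the `.no_xy.vcf.gz` version for `CPG2`.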
Note: because the workflow needs to be defined in advance in Hail Batch, this design means that a run containing sex-aneuploid samples will need to fail once:
Run for the first time, including generating the updated pedigree and finding any sex-aneuploid samples.
If the samples can't all be processed, the run will die.
Upon re-running, the sex-aneuploidy file will be read, new trimmed VCFs will be generated for the relevant samples, and the GCNVJointSegmentation stage will run with the trimmed VCFs as appropriate.
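The fail-once behaviour follows from the sex-aneuploidy file only existing after the first run. A minimal sketch of that planning logic is below. The helper names, the JSON list format, and the use of the local filesystem are assumptions made for illustration; the real workflow presumably reads from cloud storage and queues Hail Batch jobs instead of returning a list.

```python
import json
import os


def write_aneuploid_sgids(path: str, sgids: set) -> None:
    """First run: record the non-XX/XY SGIDs found while updating the pedigree."""
    with open(path, "w") as handle:
        json.dump(sorted(sgids), handle)


def plan_trimming_jobs(path: str) -> list:
    """Decide, at workflow-definition time, which samples need a trimmed VCF.

    On the first run the file does not exist yet, so no trimming jobs can be
    planned and joint-segmentation is defined against the default VCFs. On a
    re-run the file is present, so the trimming jobs can be planned up front.
    """
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        return sorted(json.load(handle))
```

On the first pass `plan_trimming_jobs` returns an empty list, which is why a cohort containing aneuploid samples cannot succeed until the second pass.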
This PR needs a new name.