odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License

ligate won't treat offspring as scaffolded unless parents are still in data #108

Open kkellysci opened 3 weeks ago

kkellysci commented 3 weeks ago

I am phasing a large (n=172k) sample of parents and offspring (some duos, some trios), but I only need the phased genotypes for the offspring.

I start by running the phasing jobs (with the chromosomes split into chunks) using shapeit5 with --pedigree on an HPC cluster. The initial whole-sample output files are written to a local scratch filesystem, where they are deleted as soon as the job finishes. I then use bcftools view -S to subset these results to just the offspring, and save the smaller offspring-only chunk on the cluster's network filesystem, where files persist past the end of the job.
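As a rough illustration of the subsetting step, the sample list passed to bcftools view -S could be derived from the pedigree file. This is only a sketch: it assumes the pedigree is a whitespace-delimited text file with three columns (offspring, father, mother) and NA for a missing parent, and the function name `offspring_ids` and the sample IDs are hypothetical.

```python
def offspring_ids(pedigree_lines):
    """Return offspring IDs (first column) that have at least one parent listed.

    Assumes each non-empty line is: offspring father mother, with "NA"
    marking a missing parent (an assumption about the pedigree layout).
    """
    ids = []
    for line in pedigree_lines:
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        child, father, mother = fields[:3]
        if father != "NA" or mother != "NA":
            ids.append(child)
    return ids


# Hypothetical pedigree: one trio and one duo
example = ["kid1 dad1 mom1", "kid2 NA mom2"]
print("\n".join(offspring_ids(example)))
```

Writing the returned IDs one per line gives a file in the form that bcftools view -S expects.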

After all jobs have finished running, I try to use ligate with the --pedigree flag on the offspring-only results chunks. Despite using --pedigree, it detects the offspring samples as non-scaffolded, haplotype order gets swapped, and sometimes chunks from the maternal and paternal haplotypes are incorrectly combined as if they were in phase.

Is the behavior of ligate for a file where 100% of the samples are scaffolded the same as bcftools concat -a -d all, or would there still be a reason to prefer ligate? If there is, is there a way to get it to treat offspring as scaffolded (e.g., refrain from swapping haplotypes) even when the parents are no longer in the data?
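To make the question concrete, here is a toy sketch of how overlap-based ligation can decide whether to swap a sample's haplotypes between two consecutive chunks. It illustrates the general idea only and is not SHAPEIT5's implementation; the function name `needs_swap` and the allele lists are hypothetical.

```python
def needs_swap(prev_h0, prev_h1, next_h0, next_h1):
    """Decide whether the next chunk's haplotypes should be swapped.

    Inputs are allele lists (0/1) for one sample at the sites shared by the
    overlap of two consecutive chunks. Only sites that are heterozygous in
    both chunks are informative about haplotype order.
    """
    same = flipped = 0
    for a0, a1, b0, b1 in zip(prev_h0, prev_h1, next_h0, next_h1):
        if a0 == a1 or b0 == b1:
            continue  # homozygous in either chunk: no phase information
        if a0 == b0:
            same += 1
        else:
            flipped += 1
    return flipped > same


# Same haplotype order in both chunks: no swap
print(needs_swap([0, 1, 0], [1, 0, 1], [0, 1, 0], [1, 0, 1]))
# Opposite order in the next chunk: swap
print(needs_swap([0, 1, 0], [1, 0, 1], [1, 0, 1], [0, 1, 0]))
```

Under the assumption that every offspring already has a consistent haplotype order across chunks, a check like this would never request a swap, and joining the chunks would amount to concatenating them and dropping the duplicated overlap sites.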