odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License

Phase set information #3

Closed JosephLalli closed 1 year ago

JosephLalli commented 1 year ago

I would like to phase and impute my dataset with shapeit5. I originally planned to use shapeit4, incorporating read-supported phase sets as calculated by WhatsHap (see #13 from https://odelaneau.github.io/shapeit4/#documentation). I have a diverse WGS dataset, and being able to phase singletons/rare variants would be beneficial.

Does shapeit5 use phase sets, as shapeit4 does? If so, is there anything I need to do to ensure it uses that information?

odelaneau commented 1 year ago

Hi, this has not yet been implemented in shapeit5, as we plan to change how it is done so that it scales to many thousands of individuals. The shapeit4 implementation, which relies on WhatsHap, is far too slow when you have to process thousands of samples. In the meantime, you can use shapeit4 + WhatsHap for this.

JosephLalli commented 1 year ago

Sounds good @odelaneau. I'm working on using shapeit5 in several projects, including phasing the 1000 Genomes Project samples. Speed is less of a concern for these projects than for yours!

Related question: I have noticed that shapeit4 and shapeit5 have different default settings for several common parameters. (I am away from my computer at the moment, but I can provide a list of these parameters when I get back.) For a dataset of either several hundred or several thousand samples, would you recommend using shapeit4's default settings? Or even simply using shapeit4 to perform the initial common variant phasing?

-Joe