slowkoni / rfmix

RFMIX - Local Ancestry and Admixture Inference Version 2

phase correction? #7

Open vacanfield opened 6 years ago

vacanfield commented 6 years ago

RFMix v1.5.4 has the option of correcting phasing errors in the admixed population. The rfmix v2 manual does not document this feature, if it exists, aside from a cryptic mention under Limitations:

The phase-correction features of RFMIX, if enabled ...

Any suggestions?

mocksu commented 4 years ago

Hi Vacanfield,

Any update on this? I am fighting with a phasing error as well.

I am trying to use rfmix2 to replicate Alicia Martin's HG02481 results, but the European segments clearly have too many switch errors, judging by the colors of the segments. The puzzling part is that I can replicate the HG02481 results with rfmix1. I used "bcftools convert --hapsample2vcf" to convert the .haps/.sample files from the rfmix1 process to a .vcf.gz file, assuming phasing would not be a problem, but the switch errors are nonetheless obvious.

Thanks for any help!

pmonnahan commented 4 years ago

I am also wondering about this. I have a fairly simple scenario that I would expect RFMix to sort out easily, but there are numerous regions where the average ancestry flips relative to expectation. I looked through the source code as well as the manual, but don't see any mention of phase correction. Perhaps this feature has yet to be implemented?
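One rough way to flag the kind of flips discussed in this thread is to look for positions where the two haplotypes of a sample swap ancestry labels at the same window. A true recombination breakpoint usually changes the ancestry of only one haplotype, so a simultaneous swap is more suggestive of a phase switch. The sketch below is a heuristic only and is not part of RFMix; the function name, the "AFR"/"EUR" labels, and the assumption that the per-window calls have already been parsed into two lists are all hypothetical.

```python
from typing import List, Sequence


def candidate_phase_switches(hap0: Sequence[str], hap1: Sequence[str]) -> List[int]:
    """Return window indices at which the two haplotypes swap ancestry labels.

    Index i is reported when the haplotypes carry different labels at
    window i-1 and exactly exchange those labels at window i.
    """
    if len(hap0) != len(hap1):
        raise ValueError("both haplotypes must cover the same windows")

    switches = []
    for i in range(1, len(hap0)):
        prev = (hap0[i - 1], hap1[i - 1])
        curr = (hap0[i], hap1[i])
        # Heterozygous ancestry before, and the labels trade places after.
        if prev[0] != prev[1] and curr == (prev[1], prev[0]):
            switches.append(i)
    return switches


# Hypothetical per-window calls for one sample (labels are placeholders).
hap0 = ["AFR", "AFR", "EUR", "EUR", "AFR", "AFR"]
hap1 = ["EUR", "EUR", "AFR", "AFR", "EUR", "EUR"]
print(candidate_phase_switches(hap0, hap1))  # [2, 4]
```

A sample with many such indices clustered along a chromosome is a candidate for re-phasing; a single swap on its own is not conclusive.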

mocksu commented 4 years ago

Hi pmonnahan, did you try Alicia Martin's pipeline using rfmix1? Her pipeline seems to correct the flips with the script "shapeit2rfmix.py". But rfmix1 is very hard to work with and is 2 orders of magnitude slower. "shapeit2rfmix.py" does not take VCF input and does not write VCF or .haps/.sample output, so it cannot easily be used with rfmix2.

pmonnahan commented 4 years ago

Hi mocksu,

I've used some of the utilities in that pipeline, but I have not actually used the part that implements rfmix1. I briefly looked over the documentation for rfmix1, but was put off by the significant differences in input/output, particularly after investing a good amount of time writing scripts for rfmix2. I am trying to see if I can do a better job at phasing prior to running rfmix, in hopes that this at least partly alleviates the problem. I appreciate the suggestion, however, and will consider rfmix1 if my current route doesn't bear fruit.

mocksu commented 4 years ago

Hi Pmonnahan,

Just want to tell you something positive about rfmix2.

  1. We used Eagle cohort-based phasing (because we have a large sample size: half a million samples in our cohort), and the ancestry painting results do not have the choppy "switch errors".

  2. We used family trios in our cohort and confirmed the rfmix2 ancestry painting results look alright.

That being said, it's hard to see why rfmix2 painting with the shapeit2 reference-based phasing result (sample HG02481 in Alicia's example) has so many "switch errors". Phasing error seems an unlikely cause, since shapeit2 is the "gold standard" in phasing, and even if we use the phasing result directly from G1k for HG02481, the problem is still there. So it does not seem to be a phasing issue. But if it's not a phasing issue, what else could it be? This is quite puzzling.

pmonnahan commented 4 years ago

Hi mocksu,

Thanks again for this information. My attempts at improving phasing seem to have done the trick with regard to resolving the multiple switch-errors. I am using Shapeit4, and the trick was to include a reference phased VCF during phasing. Previously, I was just running Shapeit4 on the query data with no external reference. I decided to use the same reference VCF for Shapeit as I was using for RFmix, and this seems to have produced mostly sensible results.
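For anyone wanting to try the same thing, the reference-based run described above might be assembled roughly as follows. This is a sketch under assumptions, not the exact command used in this thread: the flag names are as I recall them from the SHAPEIT4 documentation and should be checked against the installed version, and every file name, the region, and the helper function are hypothetical placeholders. The code only builds and prints the command; it does not run it.

```python
import shlex
from typing import List


def shapeit4_reference_cmd(
    query_vcf: str,
    reference_vcf: str,
    genetic_map: str,
    region: str,
    output_vcf: str,
    threads: int = 4,
) -> List[str]:
    """Build a SHAPEIT4 argument list that phases query data against a phased reference panel."""
    return [
        "shapeit4",
        "--input", query_vcf,          # unphased query genotypes
        "--map", genetic_map,          # genetic map for the region
        "--region", region,            # chromosome or sub-region to phase
        "--reference", reference_vcf,  # phased reference panel (same one later given to RFMix)
        "--output", output_vcf,        # phased query output
        "--thread", str(threads),
    ]


# Hypothetical file names, for illustration only.
cmd = shapeit4_reference_cmd(
    query_vcf="query.chr20.vcf.gz",
    reference_vcf="reference.chr20.phased.vcf.gz",
    genetic_map="chr20.gmap.gz",
    region="20",
    output_vcf="query.chr20.phased.vcf.gz",
)
print(shlex.join(cmd))
```

Returning a list rather than a single string lets the command be passed to `subprocess.run(cmd, check=True)` without shell quoting problems once the paths are real.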

mocksu commented 4 years ago

Hi Patrick,

This is good news!

A few quick questions:

Thanks so much,

Mousheng Xu

pmonnahan commented 4 years ago

Hi Mousheng,

Ahh, I had not realized that her scripts did reference-based phasing. What I've done is essentially the same then, I believe. I have a reference VCF (taken from 1000 Genomes), which was already phased. I then used this phased reference VCF when phasing my query data. RFMix2 will take the phased BCF as input, so no conversion is necessary for me after running shapeit. We also remove rare and duplicated SNPs, as well as ones that violate Hardy-Weinberg equilibrium (HWE) expectations.
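The SNP filtering mentioned above could look something like the sketch below, which works on per-variant genotype counts. It is an illustration under assumptions rather than the actual pipeline: the thresholds (minor allele frequency below 0.01, chi-square above 3.84) are placeholders, dropping every record at a duplicated position is one possible reading of "duplicated SNPs", and the function names and example counts are hypothetical. Real pipelines typically do this with dedicated tools and often use a much stricter HWE cutoff.

```python
from collections import Counter
from typing import List, Tuple

# (chrom, pos, n_hom_ref, n_het, n_hom_alt)
Variant = Tuple[str, int, int, int, int]


def hwe_chi2(n_rr: int, n_ra: int, n_aa: int) -> float:
    """One-degree-of-freedom chi-square statistic for deviation from HWE."""
    n = n_rr + n_ra + n_aa
    if n == 0:
        return 0.0
    p = (2 * n_rr + n_ra) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_rr, n_ra, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def filter_variants(
    variants: List[Variant], min_maf: float = 0.01, max_chi2: float = 3.84
) -> List[Variant]:
    """Drop rare variants, duplicated positions, and HWE outliers (assumed thresholds)."""
    position_counts = Counter((chrom, pos) for chrom, pos, *_ in variants)
    kept = []
    for chrom, pos, n_rr, n_ra, n_aa in variants:
        n = n_rr + n_ra + n_aa
        if n == 0 or position_counts[(chrom, pos)] > 1:
            continue  # no genotypes, or position seen more than once
        alt_freq = (2 * n_aa + n_ra) / (2 * n)
        if min(alt_freq, 1 - alt_freq) < min_maf:
            continue  # rare or monomorphic
        if hwe_chi2(n_rr, n_ra, n_aa) > max_chi2:
            continue  # genotype counts deviate from HWE
        kept.append((chrom, pos, n_rr, n_ra, n_aa))
    return kept


# Hypothetical genotype counts, for illustration only.
variants = [
    ("chr1", 100, 50, 40, 10),  # common, close to HWE
    ("chr1", 200, 99, 1, 0),    # rare
    ("chr1", 300, 60, 0, 40),   # no heterozygotes, strong HWE deviation
    ("chr1", 400, 50, 40, 10),  # duplicated position
    ("chr1", 400, 50, 40, 10),
]
print(filter_variants(variants))  # [('chr1', 100, 50, 40, 10)]
```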

Best, Patrick

guidebortoli commented 3 years ago

Hi,

Any progress on this matter? I'm also looking for switch-error correction, but I could not find anything in the source code that mentions this option.

Thanks

mocksu commented 3 years ago

Hi Guilherme,

As I said, we used eagle cohort phasing and it worked great. We did follow Alicia's online instructions to build the reference panel, though. A quite complicated process IMHO.

Best,

Mousheng Xu

kscott-1 commented 6 months ago

This issue is a bit ancient now, but if anyone happens to be looking for this answer now, RFMix2 does not implement any form of phase correction. I spoke with the developer and he confirmed this.