Open vacanfield opened 6 years ago
Hi vacanfield,
Any update on this? I am fighting with a phasing error as well.
I am trying to use rfmix2 to replicate Alicia Martin's HG02481 example, but the European segments obviously have too many switch errors, as the colors of the segments suggest. The puzzling part is that I can replicate the HG02481 results with rfmix1. I used "bcftools convert --hapsample2vcf" to convert the .haps/.sample files from the rfmix1 process into a .vcf.gz file, assuming phasing should then not be a problem, but the switch errors are nonetheless obvious.
Thanks for any help!
I am also wondering about this. I have a pretty simple scenario that I would expect RFmix to sort out easily, but there are numerous regions where average ancestry flips relative to expectation. I looked through the source code as well as the manual, but I don't see any mention of phase correction. Perhaps this feature has yet to be implemented?
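One rough way to quantify the "flips" described above is to scan the per-window ancestry calls of a sample's two haplotypes and mark the places where the labels trade places at the same boundary. This is only a heuristic sketch: the function name, the label strings, and the plain-list input are assumptions for illustration, not part of RFMix or its output format, and a simultaneous swap is merely consistent with a phase switch rather than proof of one.

```python
from typing import List, Sequence


def candidate_phase_flips(hap0: Sequence[str], hap1: Sequence[str]) -> List[int]:
    """Return window indices i where the two haplotypes swap ancestry labels
    between window i-1 and window i (a pattern consistent with a phase switch).

    hap0 and hap1 are hypothetical per-window ancestry labels for the two
    haplotypes of one individual, in genomic order.
    """
    if len(hap0) != len(hap1):
        raise ValueError("haplotype label sequences must have equal length")
    flips = []
    for i in range(1, len(hap0)):
        prev = (hap0[i - 1], hap1[i - 1])
        curr = (hap0[i], hap1[i])
        # A swap requires the two labels to differ and to trade places exactly.
        if prev[0] != prev[1] and curr == (prev[1], prev[0]):
            flips.append(i)
    return flips


if __name__ == "__main__":
    # Toy labels, invented for this example.
    hap0 = ["AFR", "AFR", "EUR", "EUR", "AFR"]
    hap1 = ["EUR", "EUR", "AFR", "AFR", "EUR"]
    print(candidate_phase_flips(hap0, hap1))  # [2, 4]
```

Counting such boundaries before and after re-phasing gives a crude number for comparing runs, instead of judging the painted segments by eye.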
Hi pmonnahan, did you try Alicia Martin's pipeline using rfmix1? Her pipeline seems to correct the flips with the script "shapeit2rfmix.py". But rfmix1 is very hard to work with and is 2 orders of magnitude slower. "shapeit2rfmix.py" does not take VCF input and does not write VCF or .haps/.sample output, so it cannot easily be used with rfmix2.
Hi mocksu,
I've used some of the utilities in that pipeline, but I have not actually used the part that implements rfmix1. I briefly looked over the documentation for rfmix1, but was turned off due to the significant differences in output/input, particularly after investing a good amount of time writing scripts for rfmix2. I am trying to see if I can do a better job at phasing prior to running rfmix, in hopes that this, at least partly, alleviates the problem. I appreciate the suggestion however, and will consider rfmix1 if my current route doesn't bear fruit.
Hi pmonnahan,
Just want to tell you something positive about rfmix2.
We used Eagle cohort-based phasing (because we have a large sample size, half a million samples in our cohort), and the ancestry painting results do not have the choppy "switch errors".
We used family trios in our cohort and confirmed the rfmix2 ancestry painting results look alright.
That said, it is hard to see why rfmix2 painting of the shapeit2 reference-based phasing result (sample HG02481 in Alicia's example) has so many "switch errors". If phasing error were the cause, shapeit2 is the "gold standard" in phasing; and even if we use the phasing result directly from G1k for HG02481, the problem is still there. So it does not seem to be a phasing issue. But if it is not a phasing issue, what else could it be? This is quite puzzling.
Hi mocksu,
Thanks again for this information. My attempts at improving phasing seem to have done the trick with regard to resolving the multiple switch-errors. I am using Shapeit4, and the trick was to include a reference phased VCF during phasing. Previously, I was just running Shapeit4 on the query data with no external reference. I decided to use the same reference VCF for Shapeit as I was using for RFmix, and this seems to have produced mostly sensible results.
Hi Patrick,
This is good news!
A few quick questions:
What do you mean by a reference phased VCF? Alicia's example uses shapeit reference-based phasing. Did you do it differently? Or do you mean that you phased the reference samples and the query samples together instead of separately? Alicia used her script "shapeit2rfmix.py" to merge the separately phased populations and convert them into rfmix1 format files. This is not transferable to rfmix2. So if I understand correctly, you phased all samples (ref panel samples + query panel samples) together? This might avoid the "shapeit2rfmix" step. More details on how you did this would be very helpful.
Did you filter out rare SNPs?
Did you remove duplicated SNPs?
Did you exclude mendelian inconsistent SNPs?
Thanks so much,
Mousheng Xu
Hi Mousheng,
Ahh, I had not realized that her scripts did reference based phasing. What I've done is essentially the same then, I believe. I have a reference VCF (taken from 1000Genomes), which was already phased. I then used this phased reference VCF when phasing my query data. RFMix2 will take the phased BCF as input, so no conversion is necessary for me after running shapeit. We do remove rare and duplicated SNPs as well as ones that violate HWE expectation.
Best, Patrick
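For anyone wanting to try the same route, the two steps described above (phase the query against an already-phased reference, then pass the phased file straight to RFMix2) could be scripted roughly as follows. This is a sketch under assumptions, not the exact commands used in this thread: all file names are hypothetical placeholders, the helper function names are made up, and the flags should be checked against the SHAPEIT4 and RFMix v2 documentation for your installed versions. The two tools may expect their genetic maps in different formats, so the maps are kept as separate arguments.

```python
import shlex
from typing import List


def shapeit4_cmd(query_vcf: str, ref_vcf: str, genetic_map: str,
                 region: str, out_vcf: str, threads: int = 4) -> List[str]:
    """Build a SHAPEIT4 command that phases the query using a phased reference panel."""
    return [
        "shapeit4",
        "--input", query_vcf,
        "--map", genetic_map,
        "--region", region,
        "--reference", ref_vcf,   # the same phased panel later given to RFMix2
        "--output", out_vcf,
        "--thread", str(threads),
    ]


def rfmix2_cmd(phased_query: str, ref_vcf: str, sample_map: str,
               genetic_map: str, out_prefix: str, chromosome: str) -> List[str]:
    """Build an RFMix2 command that takes the phased query file directly."""
    return [
        "rfmix",
        "-f", phased_query,
        "-r", ref_vcf,
        "-m", sample_map,
        "-g", genetic_map,
        "-o", out_prefix,
        f"--chromosome={chromosome}",
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the commands rather than running them.
    phase = shapeit4_cmd("query.chr20.vcf.gz", "ref.chr20.phased.vcf.gz",
                         "chr20.shapeit.gmap.gz", "20", "query.chr20.phased.vcf.gz")
    paint = rfmix2_cmd("query.chr20.phased.vcf.gz", "ref.chr20.phased.vcf.gz",
                       "ref_sample_map.tsv", "rfmix_genetic_map.txt",
                       "query.chr20.rfmix", "20")
    print(shlex.join(phase))
    print(shlex.join(paint))
```

Building the argument lists separately makes it easy to inspect the commands first and then hand them to `subprocess.run(cmd, check=True)` once the paths are right.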
Hi,
Any progress on this matter? I'm also looking for switch-error correction, but I could not find anything in the source code that mentions this option.
Thanks
Hi Guilherme,
As I said, we used Eagle cohort phasing and it worked great. We did follow Alicia's online instructions to build the reference panel, though, which is quite a complicated process IMHO.
Best,
Mousheng Xu
This issue is a bit ancient now, but if anyone happens to be looking for this answer now, RFMix2 does not implement any form of phase correction. I spoke with the developer and he confirmed this.
RFMix v1.5.4 has the option of correcting phasing errors in the admixed population. The rfmix v2 manual does not document this feature, if it exists, aside from a cryptic mention under Limitations:
Any suggestions?