poruloh / Eagle

Haplotype phasing software
63 stars 15 forks source link

Phasing with reference panel results into segmentation fault #25

Closed guosongwang closed 5 years ago

guosongwang commented 5 years ago

Dear author,

Thanks for your fantastic work.

I am wondering if you could give any suggestions about an error I encountered when I tried to phase with reference panel.

I am working with cattle and my data had 38 founders that were genotyped at HD density (~770K SNPs) and more than 1500 animals genotyped at 50K density. My ultimate goal is to impute chromosome 1 to HD resolution. I am following a workflow using Eagle and Minimac for such purpose.

I phased the 38 founders without reference panel to make it as a reference panel for the 1500 animals who are their offsprings. I made bcf files for the founders and unphased bcf file for the 1500 animals and I called:

eagle --geneticMapFile=genetic_map_1cMperMb.txt --noImpMissing --chrom=1 --vcfRef=Founders_HD_test_chr1.phase.bcf --vcfTarget=McG_Weatherbys_chr1.bcf --outPrefix=McG_Weatherbys_test_chr1.phase

The program seems to be running at beginning, but ended up with a Segmentation fault (core dump) without telling me more details, the output looks like this:

Setting number of threads to 1

Reference samples: Nref = 38 Target samples: Ntarget = 1511 Segmentation fault (core dumped)

Can you give me any suggestions about what went wrong?

Thanks for your time! Truly appreciate in advance!

Thanks, Guosong Wang

poruloh commented 5 years ago

Hi Guosong,

Could you please send the full output printed before the segmentation fault?

Po-Ru

guosongwang commented 5 years ago

Hi Po-Ru,

Thanks for your help again! I have attached the script I used, including some preparation I made from Plink format to the input of eagle, hope you might spot some mistakes I didn't realize, and also the full output I got.

Thanks again for your precious time!

Eagle_gw.zip Guosong

poruloh commented 5 years ago

Hi Guosong,

Apologies for the long delay. Unfortunately the output doesn't tell me much. Might you be able to share the input files? I'll be happy to track down the bug if so.

Po-Ru

On Mon, Jul 15, 2019 at 2:51 PM Guosong Wang notifications@github.com wrote:

Hi Po-Ru,

Thanks for your help again! I have attached the script I used, including some preparation I made from Plink format to the input of eagle, hope you might spot some mistakes I didn't realize, and also the full output I got.

Thanks again for your precious time!

Eagle_gw.zip https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_poruloh_Eagle_files_3393747_Eagle-5Fgw.zip&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=pBlP9cuNT3BUDsRDLkVbgWxzX8j_R7x0nL_Q4UWFMPo&m=hRV0RaWvB3b5fSfgsrgziXd7gWjO1XY9VTukcWk6yvI&s=yjKmBIE6DvtDgrnRPv8YbGT9d7kBzl8RRdTGg9D0wPk&e= Guosong

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_poruloh_Eagle_issues_25-3Femail-5Fsource-3Dnotifications-26email-5Ftoken-3DACDGGQNAAKWJVPIFVQJFGOTP7TBKJA5CNFSM4IB7P7WKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODZ6T7ZA-23issuecomment-2D511524836&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=pBlP9cuNT3BUDsRDLkVbgWxzX8j_R7x0nL_Q4UWFMPo&m=hRV0RaWvB3b5fSfgsrgziXd7gWjO1XY9VTukcWk6yvI&s=CKgB1Hsn879HnqHFwVhkAJJdpyA1M2A0uZj5SCr12CA&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_ACDGGQN2YPMP43XYVNRHTH3P7TBKJANCNFSM4IB7P7WA&d=DwMCaQ&c=WO-RGvefibhHBZq3fL85hQ&r=pBlP9cuNT3BUDsRDLkVbgWxzX8j_R7x0nL_Q4UWFMPo&m=hRV0RaWvB3b5fSfgsrgziXd7gWjO1XY9VTukcWk6yvI&s=fZ7U_YeR151UGleir8M6dApI5kHpL-tnlhF6j0VGZII&e= .

guosongwang commented 5 years ago

Hi Po-Ru,

Thanks for offering the help. I did eventually resolve the problem after I switched to use bgzip instead of the bcftools to convert the VCF file to BCF file. Maybe you can put this into the manual as a suggestion.

Again, I appreciate your time on this.

Guosong

rborder commented 5 years ago

I can confirm that this appears to be an issue with index files created by bcftools specifically. Reindexing a bcf.gz file (even one created by bcftools) using tabix (rather than bcftools index -t) cures the segfault problem. It seems this is most likely a bug with bcftools, but it could be a problem with eagle.

pd3 commented 5 years ago

@guosongwang The Eagle_test.sh script suggests you tried to use bcftools index --tbi to index a BCF file. This cannot work, only .csi indexes are supported.

What version of bcftools are you using by the way? Recent versions refuse to do that.

rborder commented 5 years ago

@pd3 bcftools 1.8, which is the second most recent release, supports both csi and tbi and there's nothing in the changelog to suggest that the index command was changed in the most recent version. So I think you're mistaken, but I could be misunderstanding

guosongwang commented 5 years ago

I am using bcftools-1.9. It's weird to see that after I switched to bgzip to convert the vcf file, I kept using bcftools to index it and it worked fine right now. My commands go like this: bgzip -c /data/guosong.wang/Founders_HD_chr1.phase.vcf > /data/guosong.wang/Founders_HD_chr1.phase.bcf bcftools index --tbi -f /data/guosong.wang/Fouders_HD_chr1.phase.bcf

And I have done it the same way for the target vcf file needs to be phased.

rborder commented 5 years ago

@guosongwang that is strange, I suppose I was wrong about it being the index function specifically. So for both of us, we had problems when using bcftools to both convert to bcf and index, but no problems when using a different program to do one of those two tasks... I think it's safe to assume this probably isn't a bug in eagle, though what's going on exactly is a mystery

pd3 commented 5 years ago

Although bcftools 1.8 may not complain about indexing a BCF using a TBI, it does not mean it's supported. For BCF you have to use CSI, for bgzipped VCF you can use TBI or CSI.

I am sure I am not mistaken, I am the main bcftools developer :-)

guosongwang commented 5 years ago

@rborder I agree. I don't recall Eagle has specifically recommended which software to use in the manual, but since it is recommending converting it to BCF for reference-based phasing, most people will think of going to bcftools immediately. I tried bgzip only because when I tried to phase it in VCF, the program notified me that my files aren't bgziped. So at least before we get to the reason why it is causing the problem, it is safe to suggest this possible situation. Maybe this is a problem @pd3 suggested, I did try to index BCF with TBI. I can have a try and post my update later.

pd3 commented 5 years ago

@guosongwang your command is incorrect. The difference between VCF and BCF is not the bgzip compression, but that the former is text and the latter a binary represenation. You can have plain text (VCF), compressed text (VCF.gz), uncompressed binary and compressed binary (BCF). If you want to convert between these, use bcftools view -O, for example as

bcftools view -Ob -o out.bcf in.vcf          # output is compressed BCF
bcftools view -Oz -o out.vcf.gz in.bcf      # compressed VCF
bcftools view -Ov -o out.vcf in.bcf          # uncompressed VCF
bcftools view -Ou -o out.vcf in.bcf          # uncompressed BCF

The documentation can be found here: short tutorials http://samtools.github.io/bcftools/howtos/index.html and the man page http://samtools.github.io/bcftools/bcftools.html

EDIT: above I see you are working with bgzip compressed VCF (identical to bcftools view -Oz) and name it BCF, that's why the TBI indexing works.

rborder commented 5 years ago

@pd3 I ended up with the same problem as @guosongwang when converting an input vcf.gz file to compressed bcf using precisely bcftools view -Ob, followed by bcftools index -t applied to the output. using tabix to reindex resolved the seg fault problem. So where the problem lies is a total mystery to me!

rborder commented 5 years ago

Okay, I think that the problem is that bcftools index is willing to produce a tbi index for bcf file. So it's a user error on our parts, but would perhaps be easier to diagnose if bcftools produced a warning when generating an unusable index? @pd3 let me know if I'm still missing something :)

guosongwang commented 5 years ago

@pd3 Thanks for pointing it out. I gotta admit I was confused by the compressed/uncompressed vcf/bcf file format ;).

pd3 commented 5 years ago

@rborder The latest version doesn't let you index BCF with TBI

bcftools index --tbi file.bcf
[E::bcf_index_build3] TBI indices for BCF files are not supported