samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Issue with vcf2sex when dealing with trio VCF vs. single VCF #390

Closed jayaramanp closed 6 years ago

jayaramanp commented 8 years ago

Hello, we've been using the vcf2sex plugin from bcftools version 1.3 successfully for single-sample VCF files so far. We recently started using it to predict gender for a trio VCF file, and we've been getting wrong gender predictions, specifically for the males (the vcf2sex plugin incorrectly predicts them as F).

when we use the same command we're using for the Trio gender prediction on the individual GVCF sample, it predicts them correctly.

assuming that the individual GVCF file and the individual VCF file will be the same, we did not convert the GVCF file into a VCF file (although we have had a nearly 100% success rate in predicting gender when running it on different individual (single-sample) VCF files).

family/trio VCF file gender prediction:
CLINICALWES-10_CWES-0069-B-A-DGD-15-75 F
CLINICALWES-10_CWES-0069-F-U-DGD-15-74 M
CLINICALWES-10_CWES-0069-M-U-DGD-15-73 F
CLINICALWES-10_CWES-0069-P-A-DGD-15-72 F

individual GVCF file prediction:
CLINICALWES-10_CWES-0069-M-U-DGD-15-73 F
CLINICALWES-10_CWES-0069-P-A-DGD-15-72 F
CLINICALWES-10_CWES-0069-F-U-DGD-15-74 M
CLINICALWES-10_CWES-0069-B-A-DGD-15-75 M

is there a difference in how it calculates gender for trio/family VCF files vs. single-sample VCF/GVCF files?
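The two listings above can be compared mechanically. The following is a minimal sketch, assuming only the two-column "SAMPLE SEX" layout shown in this comment; the function names `parse_predictions` and `discordant` are made up for illustration and are not part of bcftools.

```python
# Predictions copied verbatim from the two listings in this comment.
TRIO = """\
CLINICALWES-10_CWES-0069-B-A-DGD-15-75 F
CLINICALWES-10_CWES-0069-F-U-DGD-15-74 M
CLINICALWES-10_CWES-0069-M-U-DGD-15-73 F
CLINICALWES-10_CWES-0069-P-A-DGD-15-72 F
"""

SINGLE = """\
CLINICALWES-10_CWES-0069-M-U-DGD-15-73 F
CLINICALWES-10_CWES-0069-P-A-DGD-15-72 F
CLINICALWES-10_CWES-0069-F-U-DGD-15-74 M
CLINICALWES-10_CWES-0069-B-A-DGD-15-75 M
"""


def parse_predictions(text):
    """Parse whitespace-separated 'SAMPLE SEX' lines into a dict."""
    calls = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            calls[fields[0]] = fields[1]
    return calls


def discordant(a, b):
    """Return {sample: (call_in_a, call_in_b)} for shared samples whose calls differ."""
    return {s: (a[s], b[s]) for s in sorted(a.keys() & b.keys()) if a[s] != b[s]}


trio = parse_predictions(TRIO)
single = parse_predictions(SINGLE)
for sample, (trio_call, single_call) in discordant(trio, single).items():
    print(f"{sample}: trio={trio_call} single={single_call}")
```

On the data above, the only sample whose call differs between the two runs is CLINICALWES-10_CWES-0069-B-A-DGD-15-75 (F in the trio file, M in the individual file).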

mcshane commented 8 years ago

@pd3 has changed tack with this plugin and has a new plugin in his fork that should supersede the vcf2sex plugin. Can you try using this new guess-ploidy plugin and see if it works better for what you need?

https://github.com/pd3/bcftools/blob/develop/plugins/guess-ploidy.c

jayaramanp commented 8 years ago

How would I run it? Would I still use bcftools plugin vcf2sex, or bcftools plugin guess-ploidy?

pd3 commented 8 years ago

Assuming the VCF has FORMAT/GT (as opposed to FORMAT/PL, genotype likelihoods), run as

bcftools +guess-ploidy file.bcf -v -t GT

jayaramanp commented 8 years ago

having issues with make. Didn't see this issue with the 'develop'/1.3 installation:

[jayaramanp@dgdrhr-01 bcftools]$ make prefix=~/bcftools_pd3_fork
gcc -g -Wall -Wc++-compat -O2 -I. -Ihtslib -DPLUGINPATH=\"/home/jayaramanp/bcftools_pd3_fork/libexec/bcftools\" -c -o vcfmerge.o vcfmerge.c
vcfmerge.c: In function ‘main_vcfmerge’:
vcfmerge.c:2401: error: ‘regidx_parse_reg’ undeclared (first use in this function)
vcfmerge.c:2401: error: (Each undeclared identifier is reported only once
vcfmerge.c:2401: error: for each function it appears in.)
vcfmerge.c:2402: warning: implicit declaration of function ‘regidx_insert_list’
make: *** [vcfmerge.o] Error 1

pd3 commented 8 years ago

Ah, indeed. Can you clone my repo as described here http://pd3.github.io/bcftools? It can take some time before the regidx changes appear in the main repo.

Alternatively, you can simply copy guess-ploidy.c to bcftools-1.3, that should work.

jayaramanp commented 8 years ago

How are you setting the environment variable BCFTOOLS_PLUGINS? This wasn't there before.

[jayaramanp@dgdrhr-01 bin]$ ~/bcftools_pd3_fork/bin/bcftools +guess-ploidy /nfs/DGD/Research/dropbox/CLINICALWES-10/CWES-0006/cwes-2.0_01292016_102521/CWES-0006.vcf.gz -v -t GT

No functional bcftools plugins were found. The environment variable BCFTOOLS_PLUGINS is not set.

Could not load "guess-ploidy".

here is the folder structure:

[jayaramanp@dgdrhr-01 bcftools_pd3_fork]$ ls -R
.:
bin include lib libexec share

./bin:
bcftools bgzip htsfile plot-vcfstats tabix vcfutils.pl

./include:
htslib

./include/htslib:
bgzf.h faidx.h hts_defs.h kbitset.h khash.h klist.h kseq.h kstring.h sam.h tbx.h vcf_sweep.h cram.h hfile.h hts.h kfunc.h khash_str2int.h knetfile.h ksort.h regidx.h synced_bcf_reader.h vcf.h vcfutils.h

./lib:
libhts.a libhts.so libhts.so.1 libhts.so.1.3-35-g26b3085-dirty pkgconfig

./lib/pkgconfig:
htslib.pc

./libexec:
bcftools

./libexec/bcftools:
aggregate.so counts.so fill-AN-AC.so fill-tags.so frameshifts.so mendelian.so tag2tag.so validate.so color-chrs.so dosage.so fill-pop-tags.so fixploidy.so guess-ploidy.so setGT.so trio-switch-rate.so

./share:
man

./share/man:
man1 man5

./share/man/man1:
bcftools.1 htsfile.1 tabix.1

./share/man/man5:
faidx.5 sam.5 vcf.5
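To rule out a simple path problem, one can check whether guess-ploidy.so is visible in the directories named by BCFTOOLS_PLUGINS. The sketch below is only a file-presence check, assuming the variable holds a colon-separated list of directories; `find_plugin` is a hypothetical helper, not a bcftools function, and it does not try to load the plugin, so a version mismatch between the plugin and the bcftools binary would not be detected.

```python
import os


def find_plugin(name, plugin_path):
    """Return the first '<name>.so' found in a path-separator-delimited
    list of directories, or None if no directory contains it."""
    for directory in plugin_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, name + ".so")
        if os.path.isfile(candidate):
            return candidate
    return None


# Report what the current environment would offer for guess-ploidy.
plugin_path = os.environ.get("BCFTOOLS_PLUGINS", "")
found = find_plugin("guess-ploidy", plugin_path)
if found:
    print("found:", found)
else:
    print("guess-ploidy.so not found; BCFTOOLS_PLUGINS=%r" % plugin_path)
```

With the tree listed above, the directory to point the variable at would be the libexec/bcftools one, since that is where guess-ploidy.so was installed.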

jayaramanp commented 8 years ago

I've managed to get this installed. This is what I get so far when I run it.

[jayaramanp@dgdrhr-01 cwes-2.0_01292016_102521]$ /nfs/Public/bcftools/1.3/bin/bcftools +guess-ploidy CWES-0016.vcf.gz -vt GT
plugin directory /nfs/Public/bcftools/1.3/libexec/bcftools .. ok
/nfs/Public/bcftools/1.3/libexec/bcftools/guess-ploidy.so:
    dlopen .. ok
    run .. ok

About: Determine sample sex by checking genotype likelihoods in non-PAR regions of sex chromosomes

Usage: bcftools +guess-ploidy [Plugin Options]
Plugin options:
    -e, --err-prob       probability of GT being wrong (with -t GT) [1e-3]
    -r, --regions        chr:beg-end [X:2699521-154931043]
    -R, --regions-file   regions listed in a file
    -t, --tag            genotype or genotype likelihoods: GT, PL, GL [PL]
    -v, --verbose        verbose output

Examples:
    bcftools +guess-ploidy in.vcf.gz
    bcftools +guess-ploidy in.vcf.gz -t GL -r chrX:2699521-154931043
    bcftools view file.vcf.gz -r chrX:2699521-154931043 | bcftools +guess-ploidy
    bcftools +guess-ploidy in.bcf -v > ploidy.txt && guess-ploidy.py ploidy.txt img

[jayaramanp@dgdrhr-01 cwes-2.0_01292016_102521]$ /nfs/Public/bcftools/1.3/bin/bcftools +guess-ploidy CWES-0016.vcf.gz -vt GL
plugin directory /nfs/Public/bcftools/1.3/libexec/bcftools .. ok
/nfs/Public/bcftools/1.3/libexec/bcftools/guess-ploidy.so:
    dlopen .. ok
    run .. ok

About: Determine sample sex by checking genotype likelihoods in non-PAR regions of sex chromosomes

Usage: bcftools +guess-ploidy [Plugin Options]
Plugin options:
    -e, --err-prob       probability of GT being wrong (with -t GT) [1e-3]
    -r, --regions        chr:beg-end [X:2699521-154931043]
    -R, --regions-file   regions listed in a file
    -t, --tag            genotype or genotype likelihoods: GT, PL, GL [PL]
    -v, --verbose        verbose output

Examples:
    bcftools +guess-ploidy in.vcf.gz
    bcftools +guess-ploidy in.vcf.gz -t GL -r chrX:2699521-154931043
    bcftools view file.vcf.gz -r chrX:2699521-154931043 | bcftools +guess-ploidy
    bcftools +guess-ploidy in.bcf -v > ploidy.txt && guess-ploidy.py ploidy.txt img

[jayaramanp@dgdrhr-01 cwes-2.0_01292016_102521]$ /nfs/Public/bcftools/1.3/bin/bcftools +guess-ploidy CWES-0016.vcf.gz -vt PL
plugin directory /nfs/Public/bcftools/1.3/libexec/bcftools .. ok
/nfs/Public/bcftools/1.3/libexec/bcftools/guess-ploidy.so:
    dlopen .. ok
    run .. ok

About: Determine sample sex by checking genotype likelihoods in non-PAR regions of sex chromosomes

Usage: bcftools +guess-ploidy [Plugin Options]
Plugin options:
    -e, --err-prob       probability of GT being wrong (with -t GT) [1e-3]
    -r, --regions        chr:beg-end [X:2699521-154931043]
    -R, --regions-file   regions listed in a file
    -t, --tag            genotype or genotype likelihoods: GT, PL, GL [PL]
    -v, --verbose        verbose output

Examples:
    bcftools +guess-ploidy in.vcf.gz
    bcftools +guess-ploidy in.vcf.gz -t GL -r chrX:2699521-154931043
    bcftools view file.vcf.gz -r chrX:2699521-154931043 | bcftools +guess-ploidy
    bcftools +guess-ploidy in.bcf -v > ploidy.txt && guess-ploidy.py ploidy.txt img

jayaramanp commented 8 years ago

Okay, so when I ran it with the -v -tag GT arguments right after the plugin name, followed by my input VCF file, I got really wrong results, whereas running it the way you mentioned does not give me any results.

[jayaramanp@dgdrhr-01 cwes-2.0_01292016_102521]$ /nfs/Public/bcftools/1.3/bin/bcftools +guess-ploidy -v -tag GT CWES-0027.vcf.gz

plugin directory /nfs/Public/bcftools/1.3/libexec/bcftools .. ok
/nfs/Public/bcftools/1.3/libexec/bcftools/guess-ploidy.so:
    dlopen .. ok
    run .. ok
CLINICALWES-10_CWES-0027-F-U-DGD-14-95 F
CLINICALWES-10_CWES-0027-M-U-DGD-14-96 F
CLINICALWES-10_CWES-0027-P-A-DGD-14-94 F

pd3 commented 8 years ago

The option should be --tag or -t rather than -tag.
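To see why -tag is not the same as -t or --tag, here is a small sketch using Python's getopt module, which follows the same short-option rules as C getopt. It assumes the plugin parses its options getopt-style with the option letters shown in the usage text; the `SHORT_OPTS` and `LONG_OPTS` definitions are reconstructed from that usage text, not taken from the plugin source.

```python
import getopt

# Option table reconstructed from the usage text (an assumption, not the plugin source):
# -e, -r, -R and -t take a value; -v is a flag.
SHORT_OPTS = "e:r:R:t:v"
LONG_OPTS = ["err-prob=", "regions=", "regions-file=", "tag=", "verbose"]


def parse(argv):
    """Return (options, remaining_arguments) under getopt rules."""
    return getopt.getopt(argv, SHORT_OPTS, LONG_OPTS)


for argv in (
    ["-v", "-tag", "GT", "CWES-0027.vcf.gz"],   # single dash: read as -t with value "ag"
    ["-v", "-t", "GT", "CWES-0027.vcf.gz"],     # short form
    ["-v", "--tag", "GT", "CWES-0027.vcf.gz"],  # long form
):
    opts, rest = parse(argv)
    print(argv, "->", opts, rest)
```

Under these rules, -tag GT sets the tag to the string "ag" and leaves GT behind as a positional argument, so the requested tag is never GT. What the plugin then does with an unknown tag is not something this sketch can show.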

Otherwise it looks like a version incompatibility. Rather than trying to figure out how this happened and which version is interfering, can you try this:

 git clone --branch=develop --recursive git://github.com/pd3/bcftools.git bcftools-pd3
 cd bcftools-pd3
 make
 export BCFTOOLS_PLUGINS=./plugins/
 ./bcftools +guess-ploidy test.vcf -v -t GT
 [correct output]

jayaramanp commented 8 years ago

So I removed all previous installations and installed it again in my home directory. Now I get a segmentation fault:

[jayaramanp@dgdrhr-01 bcftools]$ ~/bcftools_pd3_fork/bin/bcftools +guess-ploidy /nfs/DGD/Research/dropbox/CLINICALWES-10/CWES-0027/cwes-2.0_01292016_102521/CWES-0027.vcf.gz -v -t GT

This file was produced by: bcftools +guess-ploidy(1.2-249-g253a131+htslib-b611659)

The command line was: bcftools +guess-ploidy -v -t GT /nfs/DGD/Research/dropbox/CLINICALWES-10/CWES-0027/cwes-2.0_01292016_102521/CWES-0027.vcf.gz

[1]SEX [2]Sample [3]Predicted sex [4]log P(Haploid)/nSites [5]log P(Diploid)/nSites [6]nSites [7]Score: F < 0 < M ($4-$5)

Segmentation fault (core dumped)

kulvait commented 8 years ago

I also have issues with my mouse samples mapped to GRCm38.82 (http://ftp.ensembl.org/pub/release-82/gtf/mus_musculus/README). All male samples seem to be classified as female.

jayaramanp commented 8 years ago

I corresponded with the author on this. Are you using the guess-ploidy or the vcf2sex plugin?

kulvait commented 8 years ago

Yes, my question is about guess-ploidy, which I used with the -t GT switch. I have no clue how to set up autosomal regions, ploidy info, or a restriction on the X chromosome coordinates. Another thing is that it works when the X chromosome reference sequence is named X in the VCF file, but not when it is named chrX.
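The usage text lists X:2699521-154931043 as the default for -r, which would explain why a file using chrX finds no sites unless the region is given explicitly. Below is a rough sketch of picking the region string that matches a file's contig naming. The helpers `contig_names` and `region_for` are hypothetical, and the default coordinates are the human ones from the usage text; for another genome such as mouse, a suitable non-PAR span would have to be supplied by the user and is not given here.

```python
import re

# Default region copied from the plugin usage text (human X, non-PAR).
DEFAULT_REGION = "X:2699521-154931043"


def contig_names(header_text):
    """Collect contig IDs from '##contig=<ID=...>' lines of a VCF header."""
    return set(re.findall(r"^##contig=<ID=([^,>]+)", header_text, flags=re.MULTILINE))


def region_for(contigs, region=DEFAULT_REGION):
    """Return `region` with its chromosome name switched between 'X' and 'chrX'
    style so that it matches a name present in `contigs`."""
    chrom, sep, span = region.partition(":")
    bare = chrom[3:] if chrom.startswith("chr") else chrom
    prefixed = "chr" + bare
    for candidate in (bare, prefixed):
        if candidate in contigs:
            return candidate + sep + span
    raise ValueError(f"neither {bare!r} nor {prefixed!r} is a contig in this VCF")


# Example with a made-up header that uses the 'chr' prefix.
header = "##fileformat=VCFv4.2\n##contig=<ID=chr1>\n##contig=<ID=chrX>\n#CHROM\tPOS\n"
print(region_for(contig_names(header)))
```

The resulting string can be passed to -r, as in the second example of the usage text (bcftools +guess-ploidy in.vcf.gz -t GL -r chrX:2699521-154931043).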

jayaramanp commented 8 years ago

bcftools +guess-ploidy -e 0.1 -v -t GT .vcf.gz

An error rate of 0.1 was recommended because the number of hets in the supposedly male sample was quite high, which could have been caused by contamination or mapping/calling artefacts. The author mentioned that in good data the diploid/haploid likelihoods are usually quite reliable. Assuming the hets are just errors, increasing -e helps counter them.

I also found that any value in the range 0.1-0.3 still gives the right guess. Anything below or above that range gives wrong values or the complete opposite gender prediction.
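To get a feel for why raising -e can flip a call, here is a toy model. It is not the plugin's actual likelihood: the per-site probabilities, the `het_rate` parameter, and the site counts are all assumptions made up for illustration. Only the final score (mean haploid log-likelihood minus mean diploid log-likelihood, with F < 0 < M) follows the column description in the plugin's output header.

```python
import math


def mean_log_likelihoods(n_hom, n_het, err, het_rate=0.5):
    """Toy per-site model (an assumption, not the plugin's formula).

    Haploid: a het call can only be an error, so P(het) = err.
    Diploid: a fraction `het_rate` of sites is truly het, and each call
    is flipped between het and hom with probability `err`.
    Returns (mean log P(haploid), mean log P(diploid)) per site.
    """
    n = n_hom + n_het
    p_hap_het = err
    p_hap_hom = 1.0 - err
    p_dip_het = het_rate * (1.0 - err) + (1.0 - het_rate) * err
    p_dip_hom = 1.0 - p_dip_het
    log_hap = (n_hom * math.log(p_hap_hom) + n_het * math.log(p_hap_het)) / n
    log_dip = (n_hom * math.log(p_dip_hom) + n_het * math.log(p_dip_het)) / n
    return log_hap, log_dip


def predict_sex(n_hom, n_het, err, het_rate=0.5):
    """Return ('M' or 'F', score) where score = log_hap - log_dip and F < 0 < M."""
    log_hap, log_dip = mean_log_likelihoods(n_hom, n_het, err, het_rate)
    score = log_hap - log_dip
    return ("M" if score > 0 else "F"), score


# Invented counts: a male sample whose X calls are 20% het because of artefacts.
for err in (1e-3, 0.1):
    sex, score = predict_sex(n_hom=800, n_het=200, err=err)
    print(f"err={err}: predicted {sex} (score {score:+.3f})")
```

In this toy setting, a small error probability makes every spurious het very costly for the haploid hypothesis, so the sample is called F; with err=0.1 the same counts are called M. The exact range that works on real data depends on the plugin's real model and on the data.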

pd3 commented 6 years ago

The problem seems resolved, closing the issue.

toyanji commented 3 years ago

I tried guess-ploidy on a VCF file that I had. I knew the gender of some samples a priori. My aim was to predict gender for all samples and cross-check against the known samples to see whether the predictions are correct. Unfortunately, all the samples were predicted as female, which is not the case. Would you please help me predict gender correctly?

pd3 commented 3 years ago

@toyanji Please open a new issue giving more details, ideally with a test case. The program prints a usage page and some examples; I'm not sure how to help more.