Closed jayaramanp closed 6 years ago
@pd3 has changed tack with this plugin and has a new plugin in his fork that should supersede the vcf2sex plugin. Can you try the new guess-ploidy
plugin and see if it works better for what you need?
https://github.com/pd3/bcftools/blob/develop/plugins/guess-ploidy.c
How would I run it? Would I still use bcftools plugin vcf2sex, or bcftools plugin guess-ploidy?
Assuming the VCF has FORMAT/GT (as opposed to FORMAT/PL, genotype likelihoods), run as
bcftools +guess-ploidy file.bcf -v -t GT
having issues with make. Didn't see this issue with the 'develop'/1.3 installation:
[jayaramanp@dgdrhr-01 bcftools]$ make prefix=~/bcftools_pd3_fork
gcc -g -Wall -Wc++-compat -O2 -I. -Ihtslib -DPLUGINPATH=\"/home/jayaramanp/bcftools_pd3_fork/libexec/bcftools\" -c -o vcfmerge.o vcfmerge.c
vcfmerge.c: In function ‘main_vcfmerge’:
vcfmerge.c:2401: error: ‘regidx_parse_reg’ undeclared (first use in this function)
vcfmerge.c:2401: error: (Each undeclared identifier is reported only once
vcfmerge.c:2401: error: for each function it appears in.)
vcfmerge.c:2402: warning: implicit declaration of function ‘regidx_insert_list’
make: *** [vcfmerge.o] Error 1
Ah, indeed. Can you clone my repo as described here http://pd3.github.io/bcftools? It can take some time before the regidx changes appear in the main repo.
Alternatively, you can simply copy guess-ploidy.c to bcftools-1.3, that should work.
How are you setting the environment variable BCFTOOLS_PLUGINS? This wasn't there before.
[jayaramanp@dgdrhr-01 bin]$ ~/bcftools_pd3_fork/bin/bcftools +guess-ploidy /nfs/DGD/Research/dropbox/CLINICALWES-10/CWES-0006/cwes-2.0_01292016_102521/CWES-0006.vcf.gz -v -t GT
No functional bcftools plugins were found. The environment variable BCFTOOLS_PLUGINS is not set.
Could not load "guess-ploidy".
Here is the folder structure:
[jayaramanp@dgdrhr-01 bcftools_pd3_fork]$ ls -R
.:
bin include lib libexec share

./bin:
bcftools bgzip htsfile plot-vcfstats tabix vcfutils.pl

./include:
htslib

./include/htslib:
bgzf.h faidx.h hts_defs.h kbitset.h khash.h klist.h kseq.h kstring.h sam.h tbx.h vcf_sweep.h cram.h hfile.h hts.h kfunc.h khash_str2int.h knetfile.h ksort.h regidx.h synced_bcf_reader.h vcf.h vcfutils.h

./lib:
libhts.a libhts.so libhts.so.1 libhts.so.1.3-35-g26b3085-dirty pkgconfig

./lib/pkgconfig:
htslib.pc

./libexec:
bcftools

./libexec/bcftools:
aggregate.so counts.so fill-AN-AC.so fill-tags.so frameshifts.so mendelian.so tag2tag.so validate.so color-chrs.so dosage.so fill-pop-tags.so fixploidy.so guess-ploidy.so setGT.so trio-switch-rate.so

./share:
man

./share/man:
man1 man5

./share/man/man1:
bcftools.1 htsfile.1 tabix.1

./share/man/man5:
faidx.5 sam.5 vcf.5
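The "No functional bcftools plugins were found" message above points at BCFTOOLS_PLUGINS being unset. A minimal sketch of setting it, assuming the plugins live in the libexec/bcftools directory shown in the listing; the prefix is the one used in this thread and will differ on other machines:

```shell
# Assumption: this is the prefix passed to `make prefix=...` earlier in the thread.
PREFIX="$HOME/bcftools_pd3_fork"

# Point bcftools at the directory that holds the compiled plugin .so files.
export BCFTOOLS_PLUGINS="$PREFIX/libexec/bcftools"

# Check that the plugin is actually there before trying to load it.
if [ -e "$BCFTOOLS_PLUGINS/guess-ploidy.so" ]; then
    echo "guess-ploidy.so found in $BCFTOOLS_PLUGINS"
else
    echo "guess-ploidy.so not found in $BCFTOOLS_PLUGINS" >&2
fi
```

With the variable exported in the same shell session, running `bcftools plugin -lv` should report which plugins can be loaded from that directory.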
I've managed to get this installed. This is what I get so far when I run it.
[jayaramanp@dgdrhr-01 cwes-2.0_01292016_102521]$ /nfs/Public/bcftools/1.3/bin/bcftools +guess-ploidy CWES-0016.vcf.gz -vt GT
plugin directory /nfs/Public/bcftools/1.3/libexec/bcftools .. ok
/nfs/Public/bcftools/1.3/libexec/bcftools/guess-ploidy.so:
dlopen .. ok
run .. ok
About: Determine sample sex by checking genotype likelihoods in non-PAR regions of sex chromosomes
Usage: bcftools +guess-ploidy
Examples:
bcftools +guess-ploidy in.vcf.gz
bcftools +guess-ploidy in.vcf.gz -t GL -r chrX:2699521-154931043
bcftools view file.vcf.gz -r chrX:2699521-154931043 | bcftools +guess-ploidy
bcftools +guess-ploidy in.bcf -v > ploidy.txt && guess-ploidy.py ploidy.txt img

[jayaramanp@dgdrhr-01 cwes-2.0_01292016_102521]$ /nfs/Public/bcftools/1.3/bin/bcftools +guess-ploidy CWES-0016.vcf.gz -vt GL
plugin directory /nfs/Public/bcftools/1.3/libexec/bcftools .. ok
/nfs/Public/bcftools/1.3/libexec/bcftools/guess-ploidy.so:
dlopen .. ok
run .. ok
About: Determine sample sex by checking genotype likelihoods in non-PAR regions of sex chromosomes
Usage: bcftools +guess-ploidy
Examples:
bcftools +guess-ploidy in.vcf.gz
bcftools +guess-ploidy in.vcf.gz -t GL -r chrX:2699521-154931043
bcftools view file.vcf.gz -r chrX:2699521-154931043 | bcftools +guess-ploidy
bcftools +guess-ploidy in.bcf -v > ploidy.txt && guess-ploidy.py ploidy.txt img

[jayaramanp@dgdrhr-01 cwes-2.0_01292016_102521]$ /nfs/Public/bcftools/1.3/bin/bcftools +guess-ploidy CWES-0016.vcf.gz -vt PL
plugin directory /nfs/Public/bcftools/1.3/libexec/bcftools .. ok
/nfs/Public/bcftools/1.3/libexec/bcftools/guess-ploidy.so:
dlopen .. ok
run .. ok
About: Determine sample sex by checking genotype likelihoods in non-PAR regions of sex chromosomes
Usage: bcftools +guess-ploidy
Examples:
bcftools +guess-ploidy in.vcf.gz
bcftools +guess-ploidy in.vcf.gz -t GL -r chrX:2699521-154931043
bcftools view file.vcf.gz -r chrX:2699521-154931043 | bcftools +guess-ploidy
bcftools +guess-ploidy in.bcf -v > ploidy.txt && guess-ploidy.py ploidy.txt img
Okay, so when I run it with the -v -tag GT arguments right after the plugin name, followed by my input VCF file, I get really wrong results, whereas running it the way you mentioned does not give me any results.
[jayaramanp@dgdrhr-01 cwes-2.0_01292016_102521]$ /nfs/Public/bcftools/1.3/bin/bcftools +guess-ploidy -v -tag GT CWES-0027.vcf.gz
plugin directory /nfs/Public/bcftools/1.3/libexec/bcftools .. ok
/nfs/Public/bcftools/1.3/libexec/bcftools/guess-ploidy.so:
dlopen .. ok
run .. ok
CLINICALWES-10_CWES-0027-F-U-DGD-14-95 F
CLINICALWES-10_CWES-0027-M-U-DGD-14-96 F
CLINICALWES-10_CWES-0027-P-A-DGD-14-94 F
The option should be --tag or -t rather than -tag.
Otherwise it looks like a version incompatibility. Rather than trying to figure out how this happened and which version is interfering, can you try this:
git clone --branch=develop --recursive git://github.com/pd3/bcftools.git bcftools-pd3
cd bcftools-pd3
make
export BCFTOOLS_PLUGINS=./plugins/
./bcftools +guess-ploidy test.vcf -v -t GT
[correct output]
So I removed all previous installations and installed it again in my home directory. Now I get a segmentation fault:
[jayaramanp@dgdrhr-01 bcftools]$ ~/bcftools_pd3_fork/bin/bcftools +guess-ploidy /nfs/DGD/Research/dropbox/CLINICALWES-10/CWES-0027/cwes-2.0_01292016_102521/CWES-0027.vcf.gz -v -t GT
Segmentation fault (core dumped)
I also have issues with my mouse samples mapped to GRCm38.82 (http://ftp.ensembl.org/pub/release-82/gtf/mus_musculus/README). All male samples seem to be classified as female.
I corresponded with the author on this. Are you using the guess-ploidy or the vcf2sex plugin?
Yes, my question is about guess-ploidy, which I have run with the -t GT switch. I have no clue how to set up autosomal regions, ploidy info, or a restriction on X chromosome coordinates. Another thing: it works when the X chromosome is named X in the VCF file, but not when it is named chrX.
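If the plugin only picks up the X chromosome when it is named X, one workaround is to rename the contigs before running it. The sketch below strips a leading "chr" from a plain-text VCF with awk. The file names and the two-record VCF are made up for illustration, and whether the naming is really the root cause is not established in this thread. For compressed or indexed files, `bcftools annotate --rename-chrs` is the built-in route.

```shell
# Hypothetical toy VCF that uses the "chr" naming style (illustration only).
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chrX>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n'
  printf 'chrX\t2700000\t.\tA\tG\t.\t.\t.\tGT\t0/1\n'
  printf 'chr1\t1000\t.\tC\tT\t.\t.\t.\tGT\t1/1\n'
} > chr_named.vcf

# Strip the "chr" prefix from ##contig header lines and from the CHROM column.
awk 'BEGIN { FS = OFS = "\t" }
     /^##contig=<ID=chr/ { sub(/ID=chr/, "ID="); print; next }
     /^#/                { print; next }   # other header lines pass through
     { sub(/^chr/, "", $1); print }' chr_named.vcf > plain_named.vcf

# Show the contig names that remain in the data lines.
grep -v '^#' plain_named.vcf | cut -f1
```

The renamed file can then be passed to the plugin in place of the original.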
bcftools +guess-ploidy -e 0.1 -v -t GT
An error rate of 0.1 was recommended because the number of hets in the supposedly male sample was quite high, which could have been caused by contamination or mapping/calling artefacts. The author mentioned that in good data the diploid/haploid likelihoods are usually pretty good. Assuming these are just errors, increasing -e helps counter them.
I also found that any value in the range 0.1-0.3 still gives the right guess. Anything lower or higher gives wrong values or the completely opposite gender prediction.
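Since the discussion turns on how many hets a supposedly male sample shows on X, a quick independent check is to count heterozygous GT calls per sample. The sketch below is not the plugin's algorithm, only a rough tally. The toy VCF and file names are hypothetical, and the coordinate bounds are the human chrX range from the plugin's usage examples, so mouse data would need different values.

```shell
# Hypothetical toy VCF: S1 is homozygous everywhere on X, S2 carries hets.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf 'X\t3000000\t.\tA\tG\t.\t.\t.\tGT\t1/1\t0/1\n'
  printf 'X\t3000100\t.\tC\tT\t.\t.\t.\tGT\t0/0\t0/1\n'
  printf 'X\t3000200\t.\tG\tA\t.\t.\t.\tGT\t1/1\t1/1\n'
  printf '1\t5000\t.\tT\tC\t.\t.\t.\tGT\t0/1\t0/1\n'
} > toy.vcf

# Per sample: number of het calls and number of diploid calls on X within [lo, hi].
awk -v lo=2699521 -v hi=154931043 '
     BEGIN { FS = "\t" }
     /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; n = NF; next }
     /^#/      { next }
     ($1 == "X" || $1 == "chrX") && $2 >= lo && $2 <= hi {
         for (i = 10; i <= NF; i++) {
             split($i, f, ":")                          # GT is the first FORMAT field
             if (split(f[1], a, "[/|]") != 2) continue  # skip haploid or missing calls
             if (a[1] == "." || a[2] == ".") continue   # skip partially missing calls
             called[i]++
             if (a[1] != a[2]) het[i]++
         }
     }
     END { for (i = 10; i <= n; i++)
               printf "%s\t%d\t%d\n", name[i], het[i], called[i] }' toy.vcf > x_het_counts.txt

# Columns: sample, het calls, diploid calls.
cat x_het_counts.txt
```

A male sample with clean data would be expected to show few or no hets in this region, so a high het count for a known male hints at the contamination or calling artefacts mentioned above.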
The problem seems resolved, closing the issue.
I tried guess-ploidy on a VCF file that I had. I knew the gender of some samples a priori. My aim was to predict gender for all samples and cross-check against the known samples to see whether the prediction is correct. Unfortunately, all the samples were predicted as female, which is not the case. Would you please help me predict gender correctly?
@toyanji Please open a new issue giving more details, ideally with a test case. The program prints a usage page and some examples, so I'm not sure how to help more.
Hello, we've been using the vcf2sex plugin from version 1.3 of bcftools successfully for single-sample VCF files thus far. We recently started using it to predict gender from a trio VCF file, and we've been getting wrong predictions specifically for the males (the vcf2sex plugin incorrectly predicts them as F).
When we use the same command on the individual GVCF sample, it predicts them correctly.
Assuming that the individual GVCF file and the individual VCF file would be the same, we did not convert the GVCF file into a VCF file (although we have had a pretty much 100% success rate in predicting gender when running it on different individual (single-sample) VCF files).
family/trio VCF file gender prediction:
CLINICALWES-10_CWES-0069-B-A-DGD-15-75 F
CLINICALWES-10_CWES-0069-F-U-DGD-15-74 M
CLINICALWES-10_CWES-0069-M-U-DGD-15-73 F
CLINICALWES-10_CWES-0069-P-A-DGD-15-72 F
individual GVCF file prediction:
CLINICALWES-10_CWES-0069-M-U-DGD-15-73 F
CLINICALWES-10_CWES-0069-P-A-DGD-15-72 F
CLINICALWES-10_CWES-0069-F-U-DGD-15-74 M
CLINICALWES-10_CWES-0069-B-A-DGD-15-75 M
Is there a difference in how it calculates gender for trio/family versus single-sample VCF/GVCF files?