eastbourne closed this issue 2 years ago.
The output is correct. Homozygous reference genotypes are correctly written as 0/0, and since no alternate allele was observed at those sites, there is nothing to put in the ALT column. What do you think is the problem here?
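A quick way to confirm this on your own output is to count the records with ALT=. and check that all of them are homozygous reference. This is a minimal sketch, assuming a single-sample VCF (sample in column 10) with GT as the first FORMAT field, which the VCF specification requires whenever GT is present; `summarise_missing_alt` is a hypothetical helper name.

```shell
# Hypothetical helper: read an uncompressed VCF on stdin and report how many
# records have ALT="." and how many of those carry a 0/0 (or 0|0) genotype.
# Assumes one sample (column 10) and GT as the first FORMAT sub-field.
summarise_missing_alt() {
  awk -F'\t' '
    /^#/ { next }                        # skip ## meta lines and the #CHROM header
    $5 == "." {
      missing++
      split($10, s, ":")                 # s[1] is the GT value of the sample
      if (s[1] == "0/0" || s[1] == "0|0") homref++
    }
    END { printf "ALT=. records: %d, of which 0/0: %d\n", missing + 0, homref + 0 }
  '
}
```

Usage: `bcftools view out.vcf.gz | summarise_missing_alt`. If both numbers are equal, every site without an ALT allele is a homozygous reference call.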
Thank you for your reply. I was expecting both alleles to appear in the REF and ALT columns, as in the following output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SampleName
1 13380 rs571093408 C G . PASS RefPanelAF=7.69941e-05;AN=2;AC=0;INFO=1.24993e-09 GT:ADS:DS:GP 0|0:0.05,0.05:0.1:0.9025,0.095,0.0025
1 16071 rs541172944 G A . PASS RefPanelAF=0.000123191;AN=2;AC=0;INFO=1.24993e-09 GT:ADS:DS:GP 0|0:0.05,0.05:0.1:0.9025,0.095,0.0025
1 16141 rs529651976 C T . PASS RefPanelAF=0.000138589;AN=2;AC=0;INFO=1.24993e-09 GT:ADS:DS:GP 0|0:0.05,0.05:0.1:0.9025,0.095,0.0025
1 16280 . T C . PASS RefPanelAF=0.00066215;AN=2;AC=0;INFO=1.24993e-09 GT:ADS:DS:GP 0|0:0.05,0.05:0.1:0.9025,0.095,0.0025
1 49298 rs200943160 T C . PASS RefPanelAF=0.640145;AN=2;AC=2;INFO=9.82494e-10 GT:ADS:DS:GP 1|1:0.65,0.65:1.3:0.1225,0.455,0.4225
My goal is to produce VCF files that I can submit to the Michigan Imputation Server. Is there a way to get files in the format I expect? Thank you.
The file is a correctly formatted VCF. Are you saying it is not accepted by the imputation server?
As for filling in the ALT allele, that is not possible at the moment, although there was an earlier attempt to add this functionality: https://github.com/samtools/bcftools/commit/0792ae8b91e1efb7f0c904e90a4771944a9cc7c8.
I ran the job again and inspected the results. The file is accepted by the Imputation Server, but the server is discarding many sites. I am not sure whether this is caused by the file format or by the server parameters.
Excluded sites in total: 451,287
Remaining sites in total: 147,340
See snps-excluded.txt for details
Typed only sites: 5,091
See typed-only.txt for details
Why don't you check which sites were excluded and see whether they all have ALT=.?
Indeed, the excluded sites have ALT=.:
#Position FilterType Info
1:69869:T:. Invalid Alleles
1:565508:G:. Invalid Alleles
1:727841:G:. Invalid Alleles
1:754105:C:. Invalid Alleles
1:759036:G:. Invalid Alleles
1:776546:A:. Invalid Alleles
1:794332:G:. Invalid Alleles
1:801536:T:. Invalid Alleles
1:824398:A:. Invalid Alleles
1:830181:A:. Invalid Alleles
1:834830:G:. Invalid Alleles
1:835092:T:. Invalid Alleles
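To check the whole file rather than the first few lines, a small awk sketch like the one below counts how many excluded sites have ALT=. . It assumes the first column of snps-excluded.txt is CHROM:POS:REF:ALT, as in the lines above, and that header lines start with #; `count_excluded_missing_alt` is a hypothetical name.

```shell
# Hypothetical helper: count excluded sites whose ALT allele is "." in
# snps-excluded.txt (file argument, or stdin if none is given).
# Assumes column 1 is CHROM:POS:REF:ALT and header lines start with "#".
count_excluded_missing_alt() {
  awk '
    /^#/ { next }                        # skip the header line
    NF == 0 { next }                     # ignore blank lines
    {
      total++
      n = split($1, f, ":")              # f[1]=CHROM f[2]=POS f[3]=REF f[4]=ALT
      if (n >= 4 && f[4] == ".") missing++
    }
    END { printf "%d of %d excluded sites have ALT=.\n", missing + 0, total + 0 }
  ' "$@"
}
```

Usage: `count_excluded_missing_alt snps-excluded.txt`. If the two numbers are close, the missing ALT alleles account for almost all of the exclusions.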
It is now possible to transfer ALT from one VCF into another (https://github.com/samtools/bcftools/commit/f6047f8eb4bf9a74cda6adafff5bf5c32f723483), for example:
bcftools annotate -a annots.vcf.gz -c +ALT file.vcf.gz
I hope this will help to resolve the issue.
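A sketch of the full round trip, assuming a bcftools build that includes the commit above and an annotation VCF that carries the ALT alleles (which file to use, e.g. a sites VCF matching the reference panel, is an assumption here). `bcftools annotate -a` expects a bgzip-compressed, indexed annotation file, so the sketch indexes it first if no index exists. As I read the bcftools annotate documentation, the `+` prefix in `-c +ALT` keeps existing values and fills only missing ones. `fill_alt` is a hypothetical helper name.

```shell
# Hypothetical helper: fill missing ALT alleles in INPUT from ANNOTS, write
# OUTPUT as bgzipped VCF, then report how many records still have ALT=".".
# Assumes bcftools is recent enough to support "-c +ALT".
fill_alt() {
  annots="$1" input="$2" output="$3"
  # -a needs an indexed annotation file; create a .csi index if none exists.
  if [ ! -e "$annots.csi" ] && [ ! -e "$annots.tbi" ]; then
    bcftools index "$annots" || return 1
  fi
  # +ALT: fill ALT only where it is missing; existing ALT alleles are kept.
  bcftools annotate -a "$annots" -c +ALT "$input" -Oz -o "$output" || return 1
  # Records the annotation file could not fill still have ALT="." in column 5.
  bcftools view -H "$output" |
    awk -F'\t' '$5 == "." { n++ } END { print n + 0 " records still have ALT=." }'
}
```

Usage: `fill_alt annots.vcf.gz file.vcf.gz filled.vcf.gz`. Sites reported as still having ALT=. are not present in the annotation file and would presumably still be excluded by the server as "Invalid Alleles".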
I believe this issue can be marked as resolved now.
What server are you using? I tried using the Michigan Imputation Server to impute my 23andMe genome, processed the same way as yours, and got a message saying that imputation requires a minimum of 20 genomes. Thanks!
I am running the following command to convert a 23andMe file to a VCF file:
bcftools convert -c ID,CHROM,POS,AA -s SampleName -f 23andme-ref.fa --tsv2vcf 23andme.txt -Oz -o out.vcf.gz
I have made sure that the 23andMe file is tab-separated. bcftools does produce some output, but I am unsure whether the program is working correctly. The head of my original file is: and I am getting the output as:
Are the REF and ALT columns correct? They do not show the same information as the original file. Am I missing something? Thank you in advance for any help.
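To verify the tab-separated layout on every line, not just the head, a sketch like the following reports data lines that do not have exactly four tab-separated fields. It assumes the usual 23andMe raw-data layout (rsid, chromosome, position, genotype, with comment lines starting with #), which is what `-c ID,CHROM,POS,AA` maps onto; `check_23andme_tsv` is a hypothetical name.

```shell
# Hypothetical helper: list (up to 5) data lines of a 23andMe raw file that do
# not have exactly 4 tab-separated fields, then print the total count.
# Assumes comment lines start with "#" and columns are rsid, chromosome,
# position, genotype.
check_23andme_tsv() {
  awk -F'\t' '
    /^#/ { next }                        # skip 23andMe comment/header lines
    NF != 4 {
      bad++
      if (bad <= 5) printf "line %d has %d fields\n", NR, NF
    }
    END { printf "%d malformed data lines\n", bad + 0 }
  ' "$@"
}
```

Usage: `check_23andme_tsv 23andme.txt`. Note that the raw file has no REF column at all: with `--tsv2vcf`, REF is taken from the FASTA given with `-f`, so the REF/ALT columns are not expected to mirror the genotype column of the original file.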