grenaud / glactools

command-line tools for the management of genotype likelihoods and allele counts
http://grenaud.github.io/glactools/
GNU General Public License v3.0

Can not deal with vcf file #4

Closed yzongzjnu closed 4 years ago

yzongzjnu commented 5 years ago

Hi, I used glactools to convert a multi-sample VCF file to ACF format and, ultimately, to treemix format:

./glactools vcfm2acf --fai ../data/Camarosa_CMN_genomes.fasta.fai ../DoB_rand_samples_CallVariants.vcf

Unfortunately, it failed with this error:

SimpleVCF: unable to determine genotype for field=#0/0/0/0/0/0/0/0# at position: Fvb1-1:7753

The VCF file was produced by callvariants, a script in BBMap. I also tried to generate an ACF file from a BAM file, which failed as well, with the following output:

./glactools bam2acf ../data/Camarosa_CMN_genomes.fasta ../virg_NA-UT-2-6-3.realigned.bam virg_NA-UT-2-6-3

[binary output printed to the terminal]
[E::bgzf_flush] File write failed (wrong size)
Write error BAM2ACF: error writing header

May I have some help from anybody? Thanks!

grenaud commented 5 years ago

Hello! For vcfm2acf, I am not sure how "0/0/0/0/0/0/0/0" should be interpreted as a genotype. Is this a polyploid genome? I am not familiar with that genotyper. Not to be dogmatic, but I would tend to stick with published methods, at least for testing.
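For anyone checking whether a VCF contains non-diploid calls like the one above, here is a minimal Python sketch that counts how many alleles each sample's GT lists. It is not part of glactools; the function names `gt_ploidy` and `ploidy_counts` are made up for illustration, and it assumes a tab-delimited VCF with GT as the first FORMAT key.

```python
import re
from collections import Counter


def gt_ploidy(sample_field):
    """Return the number of alleles in the GT subfield, or 0 if GT is missing.

    Assumes GT is the first colon-separated subfield of the sample column.
    """
    gt = sample_field.split(":", 1)[0]
    alleles = re.split(r"[/|]", gt)  # "/" = unphased, "|" = phased
    if all(a in (".", "") for a in alleles):
        return 0  # ".", "./." and similar are treated as missing
    return len(alleles)


def ploidy_counts(vcf_lines):
    """Tally the ploidy of every sample genotype across the given VCF lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        for field in cols[9:]:  # sample columns start at the 10th column
            counts[gt_ploidy(field)] += 1
    return counts


# Illustrative record (made up, not taken from the reporter's file).
example = [
    "##fileformat=VCFv4.2\n",
    "chrA\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0/0/0/0/0/0/0\t0/1\t.\n",
]
print(ploidy_counts(example))
```

A tally dominated by a key such as 8 would suggest the caller emitted polyploid genotypes, which is the kind of field reported in the error above.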

grenaud commented 5 years ago

@yzongzjnu have you any update regarding this?

grenaud commented 4 years ago

Is this fine? Can I close the issue?

coopergrace commented 4 years ago

Hi, I'm having pretty much the same issue - I'm happy to open a new issue if you prefer.

My command: glactools vcfm2acf --fai reference.fasta.fai biallelic.snps.vcf.gz > biallelic.snps.acf

and the error:

SimpleVCF: unable to determine genotype for field=#.# at position: 1:33
Error: GlacParser tried to read 4 bytes but got 0

My VCF was generated with Freebayes, filtered with BCFtools and pruned for linkage with PLINK.

Any suggestions you have would be greatly appreciated.

Cheers

grenaud commented 4 years ago

Hello! Could you paste the offending record or line? I have seen GTs written as "./."

coopergrace commented 4 years ago

Thank you for responding. Here's the line:

1 33 1_33 C T 131342 PASS AB=0.484013;ABP=9.05068;AC=783;AF=0.628606;AN=1298;AO=6505;CIGAR=1X;DP=13284;DPB=13284;DPRA=0.819753;EPP=2317.25;EPPR=72.2722;GTI=117;LEN=1;MEANALT=1;MQM=41.5646;MQMR=33.472;NS=853;NUMALT=1;ODDS=0.00103555;PAIRED=0.79339;PAIREDR=0.528101;PAO=0;PQA=0;PQR=0;PRO=0;QA=236494;QR=189135;RO=6779;RPL=243;RPP=12096.6;RPPR=3587.12;RPR=6262;RUN=1;SAF=4714;SAP=2855.11;SAR=1791;SRF=3494;SRP=17.0023;SRR=3285;TYPE=snp;technology.illumina=1 GT:DP:AD:RO:QR:AO:QA:GL 1/1:1:0,1:0:0:1:38:-3.79727,-0.30103,0 1/1:11:0,11:0:0:11:362:-32.645,-3.31133,0 1/1:7:1,6:1:16:6:186:-15.4907,-0.557485,0

grenaud commented 4 years ago

This is strange, as none of the genotypes shown have "." as GT. Could you please send me the first few lines of the file, including the header, by email?
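One way to pin down which sample column triggers the `field=#.#` message is to scan the file for genotypes given as "." (or "./."). The sketch below is an assumption-laden helper, not glactools code: `open_vcf`, `missing_gt_samples` and `scan` are hypothetical names, and it assumes a tab-delimited VCF, optionally gzip/bgzip-compressed, with GT as the first FORMAT key.

```python
import gzip
import re


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def missing_gt_samples(line):
    """Return (0-based sample index, raw field) for samples whose GT is missing."""
    cols = line.rstrip("\n").split("\t")
    hits = []
    for index, field in enumerate(cols[9:]):  # samples start at column 10
        gt = field.split(":", 1)[0]
        if all(a in (".", "") for a in re.split(r"[/|]", gt)):
            hits.append((index, field))
    return hits


def scan(lines, limit=10):
    """Collect up to `limit` records that contain at least one missing GT."""
    found = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        hits = missing_gt_samples(line)
        if hits:
            chrom, pos = line.split("\t", 2)[:2]
            found.append((chrom, int(pos), hits))
            if len(found) >= limit:
                break
    return found


# Usage on a real file (path is whatever VCF is being converted):
#   with open_vcf("biallelic.snps.vcf.gz") as handle:
#       for chrom, pos, hits in scan(handle):
#           print(chrom, pos, hits)
```

Because a record pasted into a comment may show only a few of its sample columns, a scan like this can reveal a "." in a column that was not visible in the excerpt.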

grenaud commented 4 years ago

I have added a check for GT and GL fields given as "."; those are now marked as no data.

I took the file that you sent by email and generated an ACF file using the following command:

~/Software/glactools/glactools vcfm2acf --onlyGT --fai TriTrypDB-46_LdonovaniBPK282A1_Genome.fasta.fai check.vcf.gz > check.acf.gz

Let me know if that works for you.

coopergrace commented 4 years ago

Seems to be working now, thank you Gabriel! Much appreciated.