Closed npklein closed 8 years ago
Hi Niek, Can you upload/paste a small example vcf file where pasteFiles fails so we can replicate the issue on our side? Dan
Hi Dan,
Thanks for the fast reply. Example vcf: http://pastebin.com/jkmQJm1Y
Dear Niek,
I understand your point. However, it is not realistic for "pasteFiles" to parse and check every possible FORMAT combination for each SNP and each individual, because of the computational cost. In addition, RASQUAL does not accept unphased or missing genotypes in the VCF. You have to impute and phase those genotypes first, and I expect this FORMAT problem will be resolved during that process.
Best regards, Natsuhiko
Hi Niek, As Natsuhiko has said, the best solution to this is to impute and phase your genotypes. You will need to do this anyway for RASQUAL to work, and I think it will also solve the formatting issue. Best, Dan
Thanks for your quick help, I will do that.
We have VCF files created by the GATK genotyping pipeline. These include several fields besides GT in the FORMAT column, such as AD, DP, GQ, and PL (GT:AD:DP:GQ:PGT:PID:PL). When we use pasteFiles to add AS counts, the AS key is appended to the end of FORMAT (GT:AD:DP:GQ:PGT:PID:PL:AS). Missing fields can be set to ., but the VCF format also allows trailing missing fields to be dropped.
E.g. a sample has no data for DP, GQ, PGT, PID, and PL, so the trailing fields are dropped and the value is ./.:0,0:0. The AS counts are then appended to give ./.:0,0:0:0,0, which no longer matches the FORMAT field: the AS value ends up in the GQ position.
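
For anyone hitting the same mismatch, a minimal Python sketch of one possible workaround is to pad the dropped trailing subfields with . before the AS value is appended. This is not part of pasteFiles or RASQUAL; the function names `pad_sample_field` and `append_as` are hypothetical, and it assumes FORMAT and sample values are plain colon-separated strings. It only fixes the column alignment. Unphased or missing genotypes still have to be imputed and phased before RASQUAL will accept the file.

```python
def pad_sample_field(format_keys, sample_value):
    """Pad dropped trailing subfields with '.' so the value has one entry per FORMAT key."""
    keys = format_keys.split(":")
    parts = sample_value.split(":")
    if len(parts) > len(keys):
        raise ValueError("sample value has more subfields than FORMAT keys")
    # Add one '.' for every trailing subfield that was dropped.
    parts += ["."] * (len(keys) - len(parts))
    return ":".join(parts)


def append_as(format_keys, sample_value, as_counts):
    """Return the new (FORMAT, sample value) pair with AS appended in the last position."""
    padded = pad_sample_field(format_keys, sample_value)
    return format_keys + ":AS", padded + ":" + as_counts


# Example from this issue: trailing subfields were dropped from the sample value.
new_format, new_value = append_as("GT:AD:DP:GQ:PGT:PID:PL", "./.:0,0:0", "0,0")
print(new_format)  # GT:AD:DP:GQ:PGT:PID:PL:AS
print(new_value)   # ./.:0,0:0:.:.:.:.:0,0
```

With the padding in place, the number of subfields in the sample value equals the number of FORMAT keys, so the AS counts land in the AS position.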