lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/

Processing the output .vcf file from msa2vcf #107

Closed MostafaYA closed 6 years ago

MostafaYA commented 6 years ago

Hi, I am using msa2vcf to get a .vcf file from a multiple alignment file. The tool works perfectly for me, but I want to create a SNP table from the resulting VCF file. My question: is it possible to replace the allele calls with the actual nucleotide in each of the samples?

Here is my example:

$ cat alignment_file.aln

>sample1
ACGAGGCTAGATGA
>sample2
ACGTGGCTAGATCA
>sample3
ACGTGCCTAGATCA

$ msa2vcf alignment_file.aln
[INFO][MsaToVcf]Reading from alignment_file.aln
[INFO][MsaToVcf]format : Fasta

##fileformat=VCFv4.2
##FORMAT=
##FORMAT=
##INFO=
##contig=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3
chrUn 4 . T A . . DP=3 GT:DP 1/1:1 0/0:1 0/0:1
chrUn 6 . G C . . DP=3 GT:DP 0/0:1 0/0:1 1/1:1
chrUn 13 . C G . . DP=3 GT:DP 1/1:1 0/0:1 0/0:1
[INFO][MsaToVcf]Done

My desired format is

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3
chrUn 4 . T A . . DP=3 GT:DP A T T
chrUn 6 . G C . . DP=3 GT:DP G G C
chrUn 13 . C G . . DP=3 GT:DP G C C

so that I can then proceed, maybe with awk commands, to get a SNP table like this:

POS REF sample1 sample2 sample3
4 T A T T
6 G G G C
13 C G C C

I would also appreciate it if you could point me to any other tool that could help with this.

lindenb commented 6 years ago

Hi, sorry for the late response; the message was in the spam folder.

is that possible to replace the allele calls with the actual nucleotide in each of the samples?

see something like https://www.biostars.org/p/246796/
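For illustration only, here is a minimal Python sketch of the conversion asked about: it looks up each GT index in the REF/ALT columns and prints a POS/REF/sample table. It is not the method from the linked post and not part of jvarkit. The function name `vcf_to_snp_table` and the `EXAMPLE_VCF` string are hypothetical, and the sketch assumes the input holds only VCF lines (no `[INFO]` log lines), that the variants are simple SNVs, and that a missing allele (`.`) can be written as `N`.

```python
def vcf_to_snp_table(vcf_lines):
    """Yield rows of a SNP table (POS, REF, one base per sample) from VCF lines."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        # The VCF format is tab-delimited; split() also accepts plain spaces.
        fields = line.split()
        if line.startswith("#CHROM"):
            # Sample names start at the 10th column of the header line.
            yield ["POS", "REF"] + fields[9:]
            continue
        pos, ref, alt = fields[1], fields[3], fields[4]
        alleles = [ref] + alt.split(",")  # index 0 = REF, 1 and up = ALT alleles
        gt_index = fields[8].split(":").index("GT")
        row = [pos, ref]
        for call in fields[9:]:
            gt = call.split(":")[gt_index]
            indices = gt.replace("|", "/").split("/")
            # Assumption: a missing allele "." is reported as "N".
            bases = {alleles[int(i)] if i != "." else "N" for i in indices}
            # A homozygous call collapses to one base; otherwise list all bases.
            row.append(bases.pop() if len(bases) == 1 else "/".join(sorted(bases)))
        yield row


# Records copied from the example in this thread; meta lines are reduced to one.
EXAMPLE_VCF = """\
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3
chrUn 4 . T A . . DP=3 GT:DP 1/1:1 0/0:1 0/0:1
chrUn 6 . G C . . DP=3 GT:DP 0/0:1 0/0:1 1/1:1
chrUn 13 . C G . . DP=3 GT:DP 1/1:1 0/0:1 0/0:1
"""

if __name__ == "__main__":
    for row in vcf_to_snp_table(EXAMPLE_VCF.splitlines()):
        print("\t".join(row))
```

On the three records from the example, this yields the rows `4 T A T T`, `6 G G G C` and `13 C G C C`, which matches the requested table. Indels or symbolic ALT alleles would need extra handling that this sketch does not attempt.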

closing this one.