iqbal-lab-org / simutator

Simulate mutations in genomes
MIT License

VCF file has deletion as insertions? #2

Open tseemann opened 2 weeks ago

tseemann commented 2 weeks ago

When adding deletions, the VCF shows them as insertions. Are REF and ALT back to front? CC: @kristyhoran

martinghunt commented 2 weeks ago

I guess it depends on which genome you treat as the ref: the original or the mutated one?

It makes 3 files:

  1. a fasta that has the mutations added
  2. a VCF where the ref is the input genome
  3. a VCF where the ref is the mutated fasta (file 1)

Applying the mutations in file 2 to the input fasta gives the mutated fasta file 1.

Applying the mutations in file 3 to the mutated fasta file 1 gives the original input fasta.
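To make the round trip concrete, here is a minimal Python sketch of what "applying" a VCF record to a sequence means. It is not simutator's code: the function name `apply_variants`, the `(pos, ref, alt)` tuple layout, and the short sequence are all made up for illustration, and it assumes the records do not overlap.

```python
def apply_variants(seq, variants):
    """Apply VCF-style (pos, ref, alt) records to seq; pos is 1-based.

    Hypothetical helper for illustration, not part of simutator.
    """
    # Work from the rightmost record to the leftmost, so a length-changing
    # edit never shifts the coordinates of records still to be applied.
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if seq[start:start + len(ref)] != ref:
            raise ValueError(f"REF {ref!r} does not match sequence at position {pos}")
        seq = seq[:start] + alt + seq[start + len(ref):]
    return seq


original = "ACGTACGTAC"            # made-up input genome
to_mutated = [(4, "TAC", "T")]     # plays the role of file 2: a deletion
to_original = [(4, "T", "TAC")]    # plays the role of file 3: an insertion

mutated = apply_variants(original, to_mutated)
restored = apply_variants(mutated, to_original)
print(mutated)
print(restored == original)
```

The same event is a deletion in one direction and an insertion in the other, which is why the two VCFs look like mirror images.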

Toy example ...

$ cat test.fa
>test_contig
CAAACCCTGCTTTTCCCATGCCCTTTTAACCTACCAGCTATCTGCTCTTAGGCTTCGGAT
CCAACGCCTTGAGCTCCGGTCTTCCGACGTCGAACACTCG
$ simutator mutate_fasta --del 40:3 test.fa out
[2024-08-23T09:30:06 simutator INFO] Simulating mutations of type 'deletion' with parameters {'dist': 40, 'len': 3}

The VCF w.r.t. input fasta has a deletion:

$ cat out.deletion.dist-40.len-3.original.vcf
##fileformat=VCFv4.2
##source=simutator, ref in this file is original genome. Mutations added: DEL_length_3_every_40
##contig=<ID=test_contig,length=100>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
test_contig 39  .   TATC    T   .   PASS    .   GT  1/1

The VCF w.r.t. the mutated ref has an insertion:

$ cat out.deletion.dist-40.len-3.mutated.vcf
##fileformat=VCFv4.2
##source=simutator, ref in this file is mutated genome. Mutations added: DEL_length_3_every_40
##contig=<ID=test_contig__simutator__DEL_length_3_every_40,length=97>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
test_contig__simutator__DEL_length_3_every_40   39  .   T   TATC    .   PASS    .   GT  1/1
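The relationship between the two VCFs above can be sketched in Python: swap REF and ALT, and shift POS by the net length change of any variants to the left. This is an assumption about how the two files relate, written as a hypothetical helper `invert_records`; it is not taken from simutator's source, and it assumes non-overlapping records.

```python
def invert_records(records):
    """Convert (pos, ref, alt) records against the original genome into
    records against the mutated genome. Hypothetical helper, not simutator code.
    """
    inverted = []
    shift = 0  # net length change contributed by records further left
    for pos, ref, alt in sorted(records):
        # In the mutated genome the roles of REF and ALT are swapped.
        inverted.append((pos + shift, alt, ref))
        shift += len(alt) - len(ref)
    return inverted


# The single record from out.deletion.dist-40.len-3.original.vcf
original_vcf = [(39, "TATC", "T")]
mutated_vcf = invert_records(original_vcf)
print(mutated_vcf)  # [(39, 'T', 'TATC')]
```

With only one record there is nothing to the left, so POS stays at 39. With several deletions, each later record would move left by the number of bases already removed.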