Open tseemann opened 2 weeks ago
I guess it depends on which way round you think of ref and mutated genome?
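It makes 3 files:
1. the mutated fasta
2. a VCF whose reference is the input fasta
3. a VCF whose reference is the mutated fasta
Applying the mutations in file 2 to the input fasta gives the mutated fasta (file 1).
Applying the mutations in file 3 to the mutated fasta (file 1) gives the original input fasta.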
Toy example ...
$ cat test.fa
>test_contig
CAAACCCTGCTTTTCCCATGCCCTTTTAACCTACCAGCTATCTGCTCTTAGGCTTCGGAT
CCAACGCCTTGAGCTCCGGTCTTCCGACGTCGAACACTCG
$ simutator mutate_fasta --del 40:3 test.fa out
[2024-08-23T09:30:06 simutator INFO] Simulating mutations of type 'deletion' with parameters {'dist': 40, 'len': 3}
The VCF w.r.t. input fasta has a deletion:
$ cat out.deletion.dist-40.len-3.original.vcf
##fileformat=VCFv4.2
##source=simutator, ref in this file is original genome. Mutations added: DEL_length_3_every_40
##contig=<ID=test_contig,length=100>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
test_contig 39 . TATC T . PASS . GT 1/1
The VCF w.r.t. the mutated ref has an insertion:
$ cat out.deletion.dist-40.len-3.mutated.vcf
##fileformat=VCFv4.2
##source=simutator, ref in this file is mutated genome. Mutations added: DEL_length_3_every_40
##contig=<ID=test_contig__simutator__DEL_length_3_every_40,length=97>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
test_contig__simutator__DEL_length_3_every_40 39 . T TATC . PASS . GT 1/1
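To make the two directions concrete, here is a minimal Python sketch that applies each VCF record from the toy example to the matching sequence. `apply_variant` is a hypothetical helper written for this illustration, not part of simutator, and it handles only a single record with a 1-based POS.

```python
def apply_variant(seq: str, pos: int, ref: str, alt: str) -> str:
    """Apply one VCF-style record to seq: replace REF with ALT at 1-based POS.

    Hypothetical helper for illustration only; not simutator code.
    """
    start = pos - 1  # VCF positions are 1-based
    found = seq[start:start + len(ref)]
    if found != ref:
        raise ValueError(f"REF mismatch at {pos}: expected {ref!r}, found {found!r}")
    return seq[:start] + alt + seq[start + len(ref):]


# The 100 bp contig from test.fa above
original = (
    "CAAACCCTGCTTTTCCCATGCCCTTTTAACCTACCAGCTATCTGCTCTTAGGCTTCGGAT"
    "CCAACGCCTTGAGCTCCGGTCTTCCGACGTCGAACACTCG"
)

# Record from the *.original.vcf: a deletion relative to the input fasta
mutated = apply_variant(original, 39, "TATC", "T")

# Record from the *.mutated.vcf: an insertion relative to the mutated fasta
restored = apply_variant(mutated, 39, "T", "TATC")

print(len(original), len(mutated), restored == original)  # 100 97 True
```

The same 3 bp change reads as a deletion when the input fasta is the reference and as an insertion when the mutated fasta is the reference, which matches the contig lengths of 100 and 97 in the two VCF headers.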
When adding deletions, the VCF shows them as insertions. Are REF and ALT back to front? CC: @kristyhoran