sjspielman / pyvolve

Python library to simulate evolutionary sequence data
Other
78 stars 23 forks source link

Root sequence with missing characters #15

Closed ksw9 closed 5 years ago

ksw9 commented 5 years ago

Hi, Thanks for a wonderful program. I am hoping to simulate evolution from a specified root sequence, a genome that has some missing information, encoded as 'N'. When I try to evolve partitions, I get the error:

Provided root sequence does not have the same code (alphabet) as model. Remove all noncanonical and/or wrong letters from provided root sequences. Further, if you are specifying codons, ensure that the length of your root sequence is divisible by 3.

I'm wondering if there is a way around this - how can I add 'N' to my code/alphabet and ensure that these sites do not evolve?

Thank you in advance!

sjspielman commented 5 years ago

Hi @ksw9,

This seems like a very particular use case where you would benefit from using a custom model and custom alphabet entirely. An example of this is here. Yours might look something like this (where I use <nd> for negative diagonal):

import pyvolve
import numpy as np

# Define a phylogeny, from a file containing a newick tree
my_tree = pyvolve.read_tree(file = "file_with_tree.tre")

# Define a custom model with custom matrix and custom code (states). The matrix must be square and have the same dimension (in 1D) as the provided code. Note that code is a list because, in theory, you can specify multi-character (as in letters) states.
code = ["N", "A", "C", "G", "T"] ## same order as custom matrix defined below. When defining matrix we set all rates to/from N to "0".
matrix = np.array([ 
                               [<nd>  , 0, 0, 0, 0],
                               [0, <nd>, rate_ac, rate_ag, rate_at],
                               [0, rate_ca, <nd>, rate_cg, rate_ct],
                               [0, rate_ga, rate_gc, <nd>, rate_gt],
                               [0, rate_ta, rate_tc, rate_tg, <nd>]
              ])

my_model = pyvolve.Model("custom", {"matrix":matrix, "code":code} ) ## use "custom" as first argument and supply these key/values to specify the above defined code and matrix
my_partition = pyvolve.Partition(models = my_model, size = <length of your alignment>)
my_evolver = pyvolve.Evolver(partitions = my_partition, tree = my_tree)
my_evolver()

Let me know how this works for you.

Best, Stephanie

ksw9 commented 5 years ago

Stephanie,

This is great and works well! Thank you so much for getting back to me so quickly. Re: defining the substitution matrix - is there a numpy trick to fill in the diagonal elements so rows sum to 0 as you showed above?

Thanks again for a wonderful program!

katie