tskit-dev / tstrait

Quantitative trait simulation for ARGs
https://tskit.dev/software/tstrait
MIT License

New model for trait simulation #148

Closed hanbin973 closed 6 months ago

hanbin973 commented 7 months ago

Hi all, thank you for the wonderful software. I'd like to ask whether a new model (which I'll describe shortly) could be implemented in tstrait.

The model I'm considering assigns a fixed effect size to each position on the genome. In tskit terms, there is a vector b of length ts.sequence_length, where b[p] is the effect size of position p, for p in [0, ts.sequence_length - 1].

The trait of an individual is determined by the mutational process. msprime.sim_mutations(ts) places mutations with probability proportional to the area of the branches (branch length times genomic span). The effect size of each mutation is then given by the vector b defined above: if an individual carries a mutation at position p, we add b[p] to that individual's trait value.
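To make the model concrete, here is a minimal NumPy sketch of the trait computation described above. The function name `trait_from_position_effects` and the toy arrays are hypothetical, chosen only for illustration. With a real tree sequence, the site positions and genotypes would presumably come from `ts.sites_position` and `ts.genotype_matrix()`, and positions are assumed here to be integers.

```python
import numpy as np


def trait_from_position_effects(genotypes, site_positions, b):
    """Sum per-position effect sizes over the mutations each individual carries.

    genotypes: array of shape (num_sites, num_individuals); entry [s, i] is the
        number of copies of the mutation at site s carried by individual i.
    site_positions: integer genome position of each site.
    b: vector of length sequence_length; b[p] is the effect size of position p.
    """
    site_effects = b[site_positions]   # effect size of each mutated site
    return genotypes.T @ site_effects  # one trait value per individual


# Toy data (not from a real simulation): sequence length 6, mutations at 1 and 4.
b = np.array([0.0, 0.5, -1.0, 0.0, 2.0, 0.25])
site_positions = np.array([1, 4])
genotypes = np.array([
    [1, 0, 1],  # mutation at position 1: carried by individuals 0 and 2
    [0, 0, 1],  # mutation at position 4: carried by individual 2
])

traits = trait_from_position_effects(genotypes, site_positions, b)
print(traits)
```

Individual 2 carries both mutations, so its trait value is b[1] + b[4]; individual 1 carries none and gets 0.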

My understanding of the current implementation is that effect sizes are drawn randomly according to the trait model (e.g., from a normal distribution). I think the new model could be implemented by simply replacing the random draw with the pre-specified b vector.

Would this be plausible?

daikitag commented 7 months ago

@hanbin973 Thank you for your comment. To do that, you can define your own trait dataframe, in which you specify the site ID and the effect size (https://tskit.dev/tstrait/docs/stable/genetic.html#user-defined-trait-dataframe). This lets you simulate traits with the defined effect sizes instead of a random draw from a distribution. Could you let me know if there is anything else you would like to do?
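As a rough sketch of how the per-position vector b could be turned into the columns of such a trait dataframe: the helper `trait_columns_from_b` is hypothetical, and the column names (`site_id`, `effect_size`, `causal_allele`, `trait_id`) are my reading of the linked documentation, so please check them there. The causal allele "1" assumes a binary mutation model, and positions are assumed to be integer-valued.

```python
import numpy as np


def trait_columns_from_b(site_positions, b, causal_allele="1", trait_id=0):
    """Look up b at each site's position and lay out trait-dataframe columns.

    site_positions: genome position of each site, in site-ID order
        (e.g. the array returned by ts.sites_position on a tree sequence).
    b: vector of per-position effect sizes.
    causal_allele, trait_id: assumed constant across sites in this sketch.
    """
    idx = np.asarray(site_positions).astype(int)
    num_sites = len(idx)
    return {
        "site_id": list(range(num_sites)),
        "effect_size": [float(b[p]) for p in idx],
        "causal_allele": [causal_allele] * num_sites,
        "trait_id": [trait_id] * num_sites,
    }


# Toy example: two sites at positions 1 and 4 on a sequence of length 6.
b = np.array([0.0, 0.5, -1.0, 0.0, 2.0, 0.25])
columns = trait_columns_from_b(np.array([1.0, 4.0]), b)
print(columns)
```

The resulting dictionary can be wrapped in `pandas.DataFrame(columns)` and then, if the documentation above is read correctly, passed to tstrait together with the tree sequence to compute genetic values.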