tskit-dev / tstrait

Quantitative trait simulation for ARGs
https://tskit.dev/software/tstrait
MIT License

Specifying mean and variance of simulated phenotypes #128

Closed daikitag closed 9 months ago

daikitag commented 9 months ago

We should add a new function to tstrait that takes the simulated phenotype dataframe as input and returns a scaled phenotype dataframe.

daikitag commented 9 months ago

I have a question about the normalizing step. I remember us discussing the possibility of adding a new function, tstrait.normalize(phenotype_df), but I'm unsure what the function should output. There are two possibilities:

  1. Output only a dataframe with 3 columns: simulated phenotypes, trait ID, and individual ID. I looked at ARG-GRM's simulation framework, and its output contained only the simulated phenotypes after normalization.
  2. Also obtain normalized versions of the genetic values and environmental noise, using the ratio (genetic_value/phenotype) from the original dataframe.

Users can already do these steps manually in a tstrait simulation, since we are taking a modular approach, so I'm wondering whether we should really implement this function. Do you have any suggestions, @jeromekelleher?

jeromekelleher commented 9 months ago

If it's just normalising the phenotypes, we could make a function

def normalise_phenotypes(phenotype_df):
    """
    Returns the phenotypes in the specified dataframe normalised by [XXX] as a numpy array
    """

So, return just the phenotype array. The user could then do whatever they want with it. Would this work?
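A minimal sketch of what such a function might look like is given below. It is not the tstrait implementation: the `mean` and `var` parameters, the sample standard deviation (`ddof=1`), and the `phenotype` column name are all assumptions made for illustration, and the normalisation method left open as [XXX] above is filled here with a plain rescaling to a target mean and variance.

```python
import numpy as np
import pandas as pd


def normalise_phenotypes(phenotype_df, mean=0.0, var=1.0):
    """
    Return the "phenotype" column rescaled to the given mean and variance,
    as a numpy array in the same row order as the input dataframe.

    Assumptions (not taken from tstrait): the column is named "phenotype"
    and the sample standard deviation (ddof=1) is used for scaling.
    """
    if var < 0:
        raise ValueError("var must be non-negative")
    phenotype = phenotype_df["phenotype"].to_numpy(dtype=float)
    # Centre on zero, scale to unit variance, then move to the target
    # mean and variance. A constant phenotype column would give NaN here.
    centred = phenotype - phenotype.mean()
    std = phenotype.std(ddof=1)
    return centred / std * np.sqrt(var) + mean
```

Because only an array comes back, the caller has to keep track of which row belongs to which individual and trait.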

daikitag commented 9 months ago

I think we can do that. I will make the function return a dataframe that also includes the individual ID and trait ID, as the ordering of the phenotypes can be confusing without any labels.
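A labelled variant could look roughly like the following. This is only a sketch under stated assumptions, not the function that was eventually written: the name `normalise_phenotype_df`, the column names `trait_id`, `individual_id` and `phenotype`, the `mean` and `var` parameters, and the choice to normalise each trait separately are all hypothetical.

```python
import numpy as np
import pandas as pd


def normalise_phenotype_df(phenotype_df, mean=0.0, var=1.0):
    """
    Return a dataframe with trait_id, individual_id and the phenotype
    rescaled to the given mean and variance within each trait.

    Hypothetical helper: the name, column names and per-trait grouping
    are assumptions made for illustration.
    """
    grouped = phenotype_df.groupby("trait_id")["phenotype"]
    # transform() returns values aligned with the original rows, so each
    # phenotype is compared against the statistics of its own trait.
    trait_mean = grouped.transform("mean")
    trait_std = grouped.transform("std")  # pandas std uses ddof=1 by default
    scaled = (phenotype_df["phenotype"] - trait_mean) / trait_std
    return pd.DataFrame(
        {
            "trait_id": phenotype_df["trait_id"],
            "individual_id": phenotype_df["individual_id"],
            "phenotype": scaled * np.sqrt(var) + mean,
        }
    )
```

Keeping the ID columns next to the values means the result can be sorted, filtered, or merged without relying on row order.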

jeromekelleher commented 9 months ago

Sounds good.