Perform p-value calculation for Pearson correlation rho and add to flattened output #9

kordk commented 1 year ago

We can obtain a p-value from a t distribution with n-2 degrees of freedom (e.g., https://support.minitab.com/en-us/minitab/21/help-and-how-to/statistics/basic-statistics/how-to/correlation/methods-and-formulas/methods-and-formulas/#p-value).
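For reference, the statistic that gets compared against that t distribution is t = r * sqrt((n - 2) / (1 - r^2)), where r is the sample correlation and n the number of samples. Below is a minimal sketch in plain Python. The function name `correlation_t_statistic` is illustrative only and is not part of any library or of this repository.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing rho = 0, given sample correlation r and n samples.

    Hypothetical helper for illustration; not an existing library function.
    """
    if n <= 2:
        raise ValueError("need n > 2 so that n - 2 degrees of freedom is positive")
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if abs(r) == 1.0:
        # Perfect correlation: the statistic diverges.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Example: r = 0.5 observed across n = 300 samples.
print(correlation_t_statistic(0.5, 300))
```

With r = 0.5 and n = 300 this gives a t statistic of roughly 9.97, which would then be looked up in a t distribution with 298 degrees of freedom.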

We can obtain the distribution using the TensorFlow Probability class "StudentT" (https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/StudentT), where n is the number of patients:


This code should work:

import tensorflow_probability as tfp
tfd = tfp.distributions

# Example sample size; n is the number of patients.
n = 300

# Define a single scalar Student t distribution with n-2 degrees of freedom.
# Pearson's correlation coefficient (rho) ranges from -1 to 1 and has a mean of zero under the null hypothesis.
single_dist = tfd.StudentT(df=n-2, loc=0, scale=1)

# Evaluate the pdf at 1, returning a scalar Tensor.
single_dist.prob(1.)

# Evaluate the pdf at 3, returning a scalar Tensor.
single_dist.prob(3.)

kordk commented 1 year ago

Here is an example of the code and output on pnldev:

>>> import tensorflow_probability as tfp
>>> tfd = tfp.distributions
>>> n=300
>>> single_dist = tfd.StudentT(df=n-2, loc=0, scale=1)
>>> single_dist.prob(1.)
<tf.Tensor: shape=(), dtype=float32, numpy=0.24156524>
>>> single_dist.prob(3.)
<tf.Tensor: shape=(), dtype=float32, numpy=0.0046632667>
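
Note that prob() returns the density of the distribution at a point, whereas a p-value is a tail area. The sketch below, in plain Python with no TensorFlow dependency, reproduces the two density values above and then integrates the density to get a two-sided tail area. The function names are illustrative, and the Simpson's-rule integration is a stand-in chosen for self-containment, not the method used in this repository.

```python
import math

def student_t_pdf(x, df):
    """Density of a standard Student t distribution (loc=0, scale=1)."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t, df, steps=2000):
    """Two-sided tail area P(|T| >= |t|), using Simpson's rule on [0, |t|].

    Hypothetical helper; steps must be even for Simpson's rule.
    """
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = student_t_pdf(0.0, df) + student_t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * student_t_pdf(i * h, df)
    central_area = total * h / 3  # area under the density between 0 and |t|
    # By symmetry, the two tails hold whatever lies outside [-|t|, |t|].
    return max(0.0, 1.0 - 2.0 * central_area)

n = 300
print(student_t_pdf(1.0, n - 2))      # density at 1, compare with prob(1.)
print(student_t_pdf(3.0, n - 2))      # density at 3, compare with prob(3.)
print(two_sided_p_value(3.0, n - 2))  # tail area, not a density
```

Subtracting the central area from 1 loses precision for very large |t|, so a production implementation would more likely rely on a library CDF or survival function.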

liamgd commented 1 year ago


P-values should be calculated using torch.distributions.studentT.StudentT. Currently, the degrees of freedom are set to the number of columns (the number of covariates plus one for the gene expression column) minus two, as in the code examples above. Let me know if this needs to change.