kordk / torch-ecpg

(GPU accelerated) eCpG mapper
BSD 3-Clause "New" or "Revised" License

Perform p-value calculation for Pearson correlation rho and add to flattened output #9

Closed kordk closed 1 year ago

kordk commented 1 year ago

We can obtain a p-value from a t distribution with n-2 degrees of freedom (e.g., https://support.minitab.com/en-us/minitab/21/help-and-how-to/statistics/basic-statistics/how-to/correlation/methods-and-formulas/methods-and-formulas/#p-value).
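As a rough illustration of that calculation, here is a minimal standard-library sketch. It converts a sample correlation r from n observations into the statistic t = r * sqrt((n-2) / (1-r^2)), then takes the two-sided tail area of a t distribution with n-2 degrees of freedom. The function names are hypothetical and are not part of torch-ecpg. The tail area is computed with the regularized incomplete beta function, which is one standard way to do it and not necessarily the approach this project will use.

```python
import math


def _clamp(value, tiny=1e-300):
    """Keep a continued-fraction term away from zero before dividing by it."""
    return value if abs(value) > tiny else tiny


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 / _clamp(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        # Even-numbered term of the continued fraction.
        num = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        d = 1.0 / _clamp(1.0 + num * d)
        c = _clamp(1.0 + num / c)
        h *= d * c
        # Odd-numbered term of the continued fraction.
        num = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        d = 1.0 / _clamp(1.0 + num * d)
        c = _clamp(1.0 + num / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def pearson_t_statistic(r, n):
    """t statistic for a sample correlation r from n observations (|r| < 1)."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r))


def pearson_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, from a t distribution with n-2 df."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t = pearson_t_statistic(r, n)
    # P(|T| > |t|) = I_{df / (df + t^2)}(df / 2, 1 / 2)
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))
```

For example, pearson_p_value(0.2, 300) would give the two-sided p-value for r = 0.2 observed across 300 patients.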

We can obtain the distribution using the TensorFlow Probability class "StudentT" (https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/StudentT), where n is the number of patients:

tfp.distributions.StudentT(
    df,
    loc,
    scale,
    validate_args=False,
    allow_nan_stats=True,
    name='StudentT'
)

This code should work:

import tensorflow_probability as tfp
tfd = tfp.distributions

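# Define a single scalar Student t distribution with n-2 degrees of freedom.
# Pearson's correlation coefficient (rho) has a value from -1 to 1. Under the null
# hypothesis (rho = 0), the statistic t = r * sqrt((n-2) / (1-r^2)) follows this
# distribution, which is centered at zero.
n=300
single_dist = tfd.StudentT(df=n-2, loc=0, scale=1)

# Evaluate the pdf at t=1, returning a scalar Tensor.
# Note: the argument is a value of the t statistic, not rho itself.
single_dist.prob(1.)

# Evaluate the pdf at t=3, returning a scalar Tensor.
# Note: prob() returns a density; a p-value is a tail area (e.g., from cdf()).
single_dist.prob(3.)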

kordk commented 1 year ago

Here is an example of the code and output on pnldev:

# kord@pnldev [09:06:08] ~ $
python
Python 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow_probability as tfp
>>> tfd = tfp.distributions
>>> n=300
>>> single_dist = tfd.StudentT(df=n-2, loc=0, scale=1)
2022-10-12 09:06:35.337386: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-12 09:06:36.150507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 719 MB memory:  -> device: 0, name: NVIDIA A2, pci bus id: 0000:81:00.0, compute capability: 8.6
>>> single_dist.prob(1.)
<tf.Tensor: shape=(), dtype=float32, numpy=0.24156524>
>>> single_dist.prob(3.)
<tf.Tensor: shape=(), dtype=float32, numpy=0.0046632667>
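
For reference, the two values above are densities of the t distribution with 298 degrees of freedom, evaluated at t = 1 and t = 3. They are not tail probabilities, so they are not p-values yet. The following standard-library sketch evaluates the same density formula by hand. The function name student_t_pdf is a hypothetical helper, not part of TensorFlow Probability or torch-ecpg. The results should agree closely with the float32 output above, though not to every digit.

```python
import math


def student_t_pdf(t, df):
    """Density of a standard Student t distribution (loc=0, scale=1) at t."""
    # Work in log space so the gamma functions do not overflow for large df.
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2.0 * math.log1p(t * t / df))


n = 300
df = n - 2

# Compare with 0.24156524 and 0.0046632667 from the session above.
density_at_1 = student_t_pdf(1.0, df)
density_at_3 = student_t_pdf(3.0, df)
print(density_at_1, density_at_3)
```

A p-value would instead be the area under this density beyond the observed |t|, doubled for a two-sided test.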

liamgd commented 1 year ago

936715b

P-values should be calculated using torch.distributions.studentT.StudentT. Currently, the degrees of freedom are set to the number of columns (the number of covariates plus one for the gene expression column) minus two, as in the code examples above. Let me know if this needs to change.