kordk / torch-ecpg

(GPU accelerated) eCpG mapper
BSD 3-Clause "New" or "Revised" License

MLR output data and label mismatch #15

Closed kordk closed 1 year ago

kordk commented 1 year ago

The column labels in the output file appear to be mismatched with the data columns. For example, if gt_p is the p-value column, its values do not fall in the expected range (0 to 1).

```
# kord@pnldev [08:56:51] ~/proj/torch-ecpg-proj/test-gpu2/tecpg_testing/output $
head out.csv | sed 's/,/\t/g' | column -t
meth_site  gene_site  gt_est              gt_err                 gt_t                  gt_p
cg001      ILMN_001   0.6011338363344905  0.0005677593580092201  -0.8961269511329663   -1.3220715033603496
cg001      ILMN_002   0.5824376034410709  0.0005968888817385717  -0.24864339878275496  -0.9507517324430542
cg001      ILMN_003   0.5893747381100741  0.0005908585217481776  -0.5282193869619084   -1.059652008050611
cg001      ILMN_004   0.5513791373030213  0.0005824071871361978  0.9078375566063069    -1.3326408949104493
cg001      ILMN_005   0.6162653141571155  0.0005862441846027752  -1.2414869387746688   -1.690980994923232
cg001      ILMN_006   0.5892329140981064  0.0005420443074413211  -0.7143537180827914   -1.1755309537610261
cg001      ILMN_007   0.6119049978285847  0.0005754638947431182  -1.8409645974058517   -2.6103786035817667
cg001      ILMN_008   0.5339348023250208  0.0005758425121818368  1.8395489412475008    -2.60779419974748
cg001      ILMN_009   0.5704315319601887  0.0005868777456940151  0.1691037248126017    -0.9340844156324676
```
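
The same range check can be scripted instead of eyeballed. The snippet below is a minimal sketch using only the Python standard library. The helper name `out_of_range` is hypothetical and not part of tecpg, and the two sample rows are copied from the output above; a real check would open out.csv instead.

```python
import csv
import io

# Two rows copied from the issue output; a real check would open out.csv.
SAMPLE = """meth_site,gene_site,gt_est,gt_err,gt_t,gt_p
cg001,ILMN_001,0.6011338363344905,0.0005677593580092201,-0.8961269511329663,-1.3220715033603496
cg001,ILMN_009,0.5704315319601887,0.0005868777456940151,0.1691037248126017,-0.9340844156324676
"""


def out_of_range(handle, column="gt_p", low=0.0, high=1.0):
    """Return (meth_site, gene_site, value) for rows whose column lies outside [low, high]."""
    bad = []
    for row in csv.DictReader(handle):
        value = float(row[column])
        if not low <= value <= high:
            bad.append((row["meth_site"], row["gene_site"], value))
    return bad


bad_rows = out_of_range(io.StringIO(SAMPLE))
print(f"{len(bad_rows)} row(s) with gt_p outside [0, 1]")
```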

liamgd commented 1 year ago

The p-value column is negative because it is computed with torch.distributions.studentT.StudentT(df).log_prob, which returns the logarithm of the probability rather than the probability itself. Because probability ranges from 0 to 1, log(1) = 0, and lim x->0+ log(x) = -inf, the resulting log_prob ranges from -inf to 0. Would you like the data transformed back into regular probability with an exponential? I do not know what base the logarithm uses, but I assume it is the natural log, in which case e^(log_prob) should be the correct p-value.
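
For reference, PyTorch documents `log_prob` as the natural log of the probability density evaluated at the given value, so exponentiating it recovers the density at that t statistic. The sketch below reproduces this relationship with the Python standard library only. The function name `student_t_log_pdf` is hypothetical, and `DF = 300` is an assumed degrees-of-freedom value, since the thread does not state the one used.

```python
import math


def student_t_log_pdf(t, df):
    """Natural log of the standard Student's t density with df degrees of freedom."""
    return (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - (df + 1) / 2 * math.log1p(t * t / df)
    )


DF = 300  # assumption: the real degrees of freedom are not given in the thread

# gt_t values copied from the reported output.
t_values = [-0.8961269511329663, 0.1691037248126017, -1.8409645974058517]

log_densities = [student_t_log_pdf(t, DF) for t in t_values]
densities = [math.exp(lp) for lp in log_densities]  # back to a linear scale

for t, lp, d in zip(t_values, log_densities, densities):
    print(f"t={t:+.4f}  log density={lp:.4f}  exp(log density)={d:.4f}")
```

The log values are negative because this density stays below 1, and the exponential maps them back into the interval (0, 1). Whether a density is the quantity wanted in the gt_p column is a separate question from the log scaling.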

liamgd commented 1 year ago

Fixed in 937787e. The log_prob output is now converted back to a linear scale with torch.exp().
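
For comparison, a two-sided p-value for a t statistic is conventionally defined as the tail area 2 * P(T > |t|), which differs from the density at t. The sketch below is not what commit 937787e does. It is a standard-library illustration that integrates the density with Simpson's rule, where the function names, the `steps` parameter, and `DF = 300` are all assumptions made for the example.

```python
import math


def student_t_pdf(t, df):
    """Density of the standard Student's t distribution with df degrees of freedom."""
    log_pdf = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - (df + 1) / 2 * math.log1p(t * t / df)
    )
    return math.exp(log_pdf)


def two_sided_p_value(t, df, steps=2000):
    """Approximate 2 * P(T > |t|) by integrating the density over [0, |t|].

    Uses composite Simpson's rule; steps must be even.
    """
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = student_t_pdf(0.0, df) + student_t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * student_t_pdf(i * h, df)
    central = 2 * total * h / 3  # P(-|t| <= T <= |t|), using symmetry
    return max(0.0, 1.0 - central)


DF = 300  # assumption: the real degrees of freedom are not given in the thread

# gt_t values copied from the reported output.
t_values = [-0.8961269511329663, 0.1691037248126017, -1.8409645974058517]
p_values = [two_sided_p_value(t, DF) for t in t_values]

for t, p in zip(t_values, p_values):
    print(f"t={t:+.4f}  two-sided p={p:.4f}")
```

In practice a library routine for the Student's t survival function would replace the numerical integration; the hand-rolled version is used here only to keep the example dependency-free.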