xiangzhu opened 6 years ago
I have added inst/m_code/negtwo_loglik.m and a "direct" transliteration to R, inst/m_code/negtwo_loglik.R. I have also added tests that my likelihood function is equivalent (except for the factor-of-2 difference; see https://github.com/CreRecombinase/RSSp/blob/2cab5d5ac6f6fef2198edec5784de2d268ae5768/tests/testthat/test_dnorm.R#L3)
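For anyone following along, the factor-of-2 difference is just the identity between a log-likelihood and a -2 * log-likelihood. A generic sketch of that check (not the packaged test itself, and with toy data I made up):

```r
# Generic factor-of-2 check: the -2*log-likelihood of i.i.d. Normal data,
# written out explicitly, equals -2 times the sum of the log densities
# returned by dnorm(..., log = TRUE).
x  <- c(0.5, -1.2, 2.0)   # toy data (hypothetical)
mu <- 0
s  <- 1.5
loglik <- sum(dnorm(x, mean = mu, sd = s, log = TRUE))
neg2ll <- length(x) * log(2 * pi * s^2) + sum((x - mu)^2) / s^2
stopifnot(isTRUE(all.equal(neg2ll, -2 * loglik)))
```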
@CreRecombinase Thanks for the update! Good to see that the two sets of code written by us produce equivalent likelihood values (up to a constant factor of 2).
However, I am not sure whether your tests above might complicate the package development.
I was suggesting a very simple consistency check in https://github.com/CreRecombinase/PolygenicRSS/issues/5:
To perform this consistency check, it seems we don't need to add any additional test code to your R package. (Perhaps you have other reasons for doing so?)
To keep it simple, if we can prove that the two sets of code produce almost identical results on the same data, then we can probably move on at this point?
In the long run we may consider including this code for more rigorous testing.
@CreRecombinase as we discussed yesterday, I have generated the following input and output data using my own implementations in Matlab to facilitate your consistency check.
Please let me know if you have any questions. Thanks!
Note that I will be using this thread to document all data files that are generated for your tests.
All data files are saved at /project/compbio/RSSp/consistency_check_data/, to which you have full access.
In this file I simulate vhat from Normal(0, 4^2) and dvec from Unif(0, 2), and I create 100 replicates. For each replicate, I fit the RSS-P(0) model, and then record the output obj (-2 * log-likelihood), par (MLE), and pve (PVE estimate).
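A minimal R sketch of this simulation scheme. Note the variance form `s2 * d_j^2 + d_j` in the objective is my assumption about the RSS-P(0) marginal likelihood, not taken from the thread; check it against negtwo_loglik.m before relying on it:

```r
# Sketch of the simulation described above: vhat ~ N(0, 4^2),
# dvec ~ Unif(0, 2), 100 replicates of length 10000.
set.seed(312)
p <- 10000
n_rep <- 100
vhat <- matrix(rnorm(p * n_rep, mean = 0, sd = 4), p, n_rep)
dvec <- matrix(runif(p * n_rep, min = 0, max = 2), p, n_rep)

# Assumed -2*log-likelihood: vhat_j ~ N(0, s2 * d_j^2 + d_j), independently.
negtwo_loglik <- function(s2, vhat_j, dvec_j) {
  v <- s2 * dvec_j^2 + dvec_j
  sum(log(2 * pi * v) + vhat_j^2 / v)
}

# One-parameter fit for replicate j = 1, analogous to recording
# obj (objective value) and par (MLE) for each replicate.
fit1 <- optimize(negtwo_loglik, interval = c(1e-8, 1e3),
                 vhat_j = vhat[, 1], dvec_j = dvec[, 1])
```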
You can easily load these data in R
as follows:
> suppressPackageStartupMessages(library(R.matlab))
> res_file <- "/project/compbio/RSSp/consistency_check_data/rand_seed_312_k_1_rep_100.mat"
> res_data <- R.matlab::readMat(res_file)
> names(res_data)
[1] "dvec" "obj" "par" "pve" "vhat"
> dim(res_data$dvec)
[1] 10000 100
> dim(res_data$vhat)
[1] 10000 100
> dim(res_data$obj)
[1] 100 1
> dim(res_data$par)
[1] 100 1
> dim(res_data$pve)
[1] 100 1
When applying your R code to dvec[,j] and vhat[,j], please do not adjust the row order.
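A per-replicate comparison along these lines could serve as the consistency loop. Everything here is a stand-in so the sketch runs on its own: `r_negtwo_loglik()` is a placeholder for your R/C++ objective (with an assumed variance form that should match negtwo_loglik.m), and with the real .mat file you would substitute res_data$vhat, res_data$dvec, res_data$par, and res_data$obj for the toy arrays:

```r
# Hypothetical consistency loop, column by column, without reordering rows.
r_negtwo_loglik <- function(s2, vhat_j, dvec_j) {
  v <- s2 * dvec_j^2 + dvec_j   # assumed variance form; verify against Matlab
  sum(log(2 * pi * v) + vhat_j^2 / v)
}

# Toy stand-ins for the loaded Matlab arrays so this runs self-contained.
set.seed(1)
vhat    <- matrix(rnorm(200, sd = 4), 100, 2)
dvec    <- matrix(runif(200, 0, 2), 100, 2)
par_hat <- c(10, 10)            # stand-in for res_data$par
obj_ref <- vapply(seq_len(ncol(dvec)), function(j)
  r_negtwo_loglik(par_hat[j], vhat[, j], dvec[, j]), numeric(1))

# The check itself: R objective vs. reference objective, replicate by replicate.
for (j in seq_len(ncol(dvec))) {
  stopifnot(isTRUE(all.equal(
    r_negtwo_loglik(par_hat[j], vhat[, j], dvec[, j]),
    obj_ref[j], tolerance = 1e-6)))
}
```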
@CreRecombinase Hi Nick,
Below are a few links that accompany my notes posted on Slack. I spent some time after our meeting cleaning up this code; hopefully it is not too hard to read and use:
fit_rssp.m: fit an RSSp model by maximizing its marginal likelihood
estimate_pve.m: obtain the posterior mean and simulate posterior samples for PVE, given a fitted RSSp model
You should have access to these scripts since I have asked Peter to add you to my developer repo. Please let me know if this is not the case.
Before including these scripts in your work, please first make sure they work on your side. To this end I have created a simple example script, single_test_nick.m, for you to run (you need to add the input data files there). I have only tested these scripts in Matlab 2017b: module load matlab/2017b.
A few more remarks:
The baseline model that you are mainly interested in corresponds to my RSSp model with only the c0 term. fit_rssp.m is built on the fmincon function in Matlab, and so you can use it to confirm your R/C++ implementation of the baseline model.
Can you please apply estimate_pve.m to re-estimate PVE for the Framingham eQTL data? It should not be that hard, since all the RSSp models have been fitted and all EVDs have been precomputed.
Can you please also apply my RSSp models with confounding correction (i.e. with c1, c2, c3, ... terms) to some genes with extremely large PVE estimates in the Framingham data? Ideally we want to see reduced PVE estimates for these "problematic" genes.