xiangzhu opened 6 years ago
I have added inst/m_code/negtwo_loglik.m and a "direct" transliteration to R, inst/m_code/negtwo_loglik.R. I have also added tests that my likelihood function is equivalent (except for the factor-of-2 difference; see https://github.com/CreRecombinase/RSSp/blob/2cab5d5ac6f6fef2198edec5784de2d268ae5768/tests/testthat/test_dnorm.R#L3)
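For anyone following along, the factor-of-2 difference is just the identity between a log-likelihood and a -2 * log-likelihood. A generic sketch of that check (not the packaged test itself, and with toy data I made up):

```r
# Generic factor-of-2 check: the -2*log-likelihood of i.i.d. Normal data,
# written out explicitly, equals -2 times the sum of the log densities
# returned by dnorm(..., log = TRUE).
x  <- c(0.5, -1.2, 2.0)   # toy data (hypothetical)
mu <- 0
s  <- 1.5
loglik <- sum(dnorm(x, mean = mu, sd = s, log = TRUE))
neg2ll <- length(x) * log(2 * pi * s^2) + sum((x - mu)^2) / s^2
stopifnot(isTRUE(all.equal(neg2ll, -2 * loglik)))
```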
@CreRecombinase Thanks for the update! Good to see that the two sets of code written by us produce equivalent likelihood values (up to a constant factor of 2).
However, I am not sure whether your tests above might complicate the package development.
I was suggesting a very simple consistency check in https://github.com/CreRecombinase/PolygenicRSS/issues/5:
To perform this consistency check, it seems we don't need to add any additional test code to your R package. (Perhaps you have other reasons for doing so?)
To keep it simple, if we can prove that the two sets of code produce almost identical results on the same data, then we can probably move on at this point?
In the long run we may consider including this code for more rigorous testing.
@CreRecombinase as we discussed yesterday, I have generated the following input and output data using my own implementations in Matlab to facilitate your consistency check.
Please let me know if you have any questions. Thanks!
Note that I will be using this thread to document all data files that are generated for your tests.
All data files are saved at /project/compbio/RSSp/consistency_check_data/, to which you have full access.
In this file I simulate vhat from Normal(0, 4^2) and dvec from Unif(0, 2), and I create 100 replicates. For each replicate, I fit the RSS-P(0) model, and then record the output obj (-2 * log-likelihood), par (MLE), and pve (PVE estimate).
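A minimal R sketch of this simulation scheme. Note the variance form `s2 * d_j^2 + d_j` in the objective is my assumption about the RSS-P(0) marginal likelihood, not taken from the thread; check it against negtwo_loglik.m before relying on it:

```r
# Sketch of the simulation described above: vhat ~ N(0, 4^2),
# dvec ~ Unif(0, 2), 100 replicates of length 10000.
set.seed(312)
p <- 10000
n_rep <- 100
vhat <- matrix(rnorm(p * n_rep, mean = 0, sd = 4), p, n_rep)
dvec <- matrix(runif(p * n_rep, min = 0, max = 2), p, n_rep)

# Assumed -2*log-likelihood: vhat_j ~ N(0, s2 * d_j^2 + d_j), independently.
negtwo_loglik <- function(s2, vhat_j, dvec_j) {
  v <- s2 * dvec_j^2 + dvec_j
  sum(log(2 * pi * v) + vhat_j^2 / v)
}

# One-parameter fit for replicate j = 1, analogous to recording
# obj (objective value) and par (MLE) for each replicate.
fit1 <- optimize(negtwo_loglik, interval = c(1e-8, 1e3),
                 vhat_j = vhat[, 1], dvec_j = dvec[, 1])
```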
You can easily load these data in R
as follows:
> suppressPackageStartupMessages(library(R.matlab))
> res_file <- "/project/compbio/RSSp/consistency_check_data/rand_seed_312_k_1_rep_100.mat"
> res_data <- R.matlab::readMat(res_file)
> names(res_data)
[1] "dvec" "obj" "par" "pve" "vhat"
> dim(res_data$dvec)
[1] 10000 100
> dim(res_data$vhat)
[1] 10000 100
> dim(res_data$obj)
[1] 100 1
> dim(res_data$par)
[1] 100 1
> dim(res_data$pve)
[1] 100 1
When applying your R code to dvec[,j] and vhat[,j], please do not adjust the row order.
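A per-replicate comparison along these lines could serve as the consistency loop. Everything here is a stand-in so the sketch runs on its own: `r_negtwo_loglik()` is a placeholder for your R/C++ objective (with an assumed variance form that should match negtwo_loglik.m), and with the real .mat file you would substitute res_data$vhat, res_data$dvec, res_data$par, and res_data$obj for the toy arrays:

```r
# Hypothetical consistency loop, column by column, without reordering rows.
r_negtwo_loglik <- function(s2, vhat_j, dvec_j) {
  v <- s2 * dvec_j^2 + dvec_j   # assumed variance form; verify against Matlab
  sum(log(2 * pi * v) + vhat_j^2 / v)
}

# Toy stand-ins for the loaded Matlab arrays so this runs self-contained.
set.seed(1)
vhat    <- matrix(rnorm(200, sd = 4), 100, 2)
dvec    <- matrix(runif(200, 0, 2), 100, 2)
par_hat <- c(10, 10)            # stand-in for res_data$par
obj_ref <- vapply(seq_len(ncol(dvec)), function(j)
  r_negtwo_loglik(par_hat[j], vhat[, j], dvec[, j]), numeric(1))

# The check itself: R objective vs. reference objective, replicate by replicate.
for (j in seq_len(ncol(dvec))) {
  stopifnot(isTRUE(all.equal(
    r_negtwo_loglik(par_hat[j], vhat[, j], dvec[, j]),
    obj_ref[j], tolerance = 1e-6)))
}
```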
@CreRecombinase Hi Nick,
Below are a few links that accompany my notes posted on Slack. I spent some time after our meeting cleaning up this code; hopefully it is not too hard to read and use:
fit_rssp.m: fit an RSSp model by maximizing its marginal likelihood
estimate_pve.m: obtain the posterior mean and simulate posterior samples for PVE, given a fitted RSSp model
You should have access to these scripts since I have asked Peter to add you to my developer repo. Please let me know if this is not the case.
Before including these scripts in your work, please first make sure they work on your side. To this end I have created a simple example script, single_test_nick.m, for you to run (you need to add the input data files there). I have only tested these scripts in Matlab 2017b: module load matlab/2017b.
A few more remarks:
The baseline model that you are mainly interested in corresponds to my RSSp model with only the c0 term. fit_rssp.m is built on the fmincon function in Matlab, and so you can use it to confirm your R/C++ implementation of the baseline model.
Can you please apply estimate_pve.m to re-estimate PVE for the Framingham eQTL data? It should not be that hard, since all the RSSp models have been fitted and all EVDs have been precomputed.
Can you please also apply my RSSp models with confounding correction (i.e. with c1, c2, c3, ... terms) to some genes with extremely large PVE estimates in the Framingham data? Ideally we want to see reduced PVE estimates for these "problematic" genes.