xinhe-lab / RSSp

Polygenic RSS
GNU General Public License v3.0
0 stars 0 forks source link

Xiang's implementation of RSSp and related methods #4

Open xiangzhu opened 6 years ago

xiangzhu commented 6 years ago

@CreRecombinase Hi Nick,

Below are a few links that accompany my notes posted on Slack — i spent some time after our meeting to clean up these codes; hopefully they are not too hard to read and use:

You should have access to these scripts since i have asked Peter to added you to my developer repo. Please let me know if this is not the case.

Before including these scripts into your work, please first make sure they work on your side. To this end I have created a simple example script single_test_nick.m for you to run (you need add input data files there). I only tested these scripts in Matlab 2017b: module load matlab/2017b.

A few more remarks:

CreRecombinase commented 5 years ago

I have added inst/m_code/negtwo_loglik.m and a "direct" transliteration to R inst/m_code/negtwo_loglik.R . I have added tests that my likelihood function is equivalent (except for the factor of 2 difference, see https://github.com/CreRecombinase/RSSp/blob/2cab5d5ac6f6fef2198edec5784de2d268ae5768/tests/testthat/test_dnorm.R#L3)

xiangzhu commented 5 years ago

@CreRecombinase Thanks for the update! Good to see two sets of codes written by us produce equivalent likelihood values (up to a constant 2).

However I am not sure if your tests above may complicate the package development or not.

I was suggesting a very simple consistency check in https://github.com/CreRecombinase/PolygenicRSS/issues/5:

  1. simulate data
  2. run your R codes on the simulated data
  3. run my Matlab codes on the same data
  4. compare output: likelihood, PVE estimates, etc

To perform this consistency check, it seems we don't need to add any additional test codes to your R package. (Perhaps you have other reasons for doing so?)

To keep it simple, if we can proof that two sets of codes produce almost identical results on the same data, then we can probably move on at this point?

In the long run we may consider including these codes for more rigorous testing.

xiangzhu commented 5 years ago

@CreRecombinase as we discussed yesterday, I have generated the following input and output data using my own implementations in Matlab to facilitate your consistency check.

Please let me know if you have any question. Thanks!

Note that I will be using this thread to document all data files that are generated for your tests.

All data files are saved at /project/compbio/RSSp/consistency_check_data/, to which you have full access.

2019-03-12

In this file I simulate vhat from Normal(0, 4^2) and dvec from Unif(0, 2), and I create 100 replicates. For each replicate, I fit RSS-P(0) model, and then record output obj (-2 * log likelihood), par (MLE) and pve (PVE estimate).

You can easily load these data in R as follows:

> suppressPackageStartupMessages(library(R.matlab))
> res_file <- "/project/compbio/RSSp/consistency_check_data/rand_seed_312_k_1_rep_100.mat"
> res_data <- R.matlab::readMat(res_file)
> names(res_data)
[1] "dvec" "obj"  "par"  "pve"  "vhat"
> dim(res_data$dvec)
[1] 10000   100
> dim(res_data$vhat)
[1] 10000   100
> dim(res_data$obj)
[1] 100   1
> dim(res_data$par)
[1] 100   1
> dim(res_data$pve)
[1] 100   1

When applying your R codes on dvec[,j] and vhat[,j], please do not adjust the row order.