stephenslab / rss

RSS: Regression with Summary Statistics
https://stephenslab.github.io/rss
MIT License
50 stars 11 forks source link

Octave compability/support #1

Closed gwern closed 8 years ago

gwern commented 8 years ago

I can't afford Matlab, so I was hoping to run this in Octave. I don't know much about either, but it seems nontrivial to get example1.m up and working in Octave.

After a couple of hours, I got most of the example working with the following edits:

  1. in example1.m, replace matfile with load and delete the two progess_bar calls
  2. src/rss_bvsr.m, delete progress_bar calls; I don't know where invnormcdf comes from but Octave doesn't have it, so dump that in at the end:

    +function x = invnormcdf(p)
    +%INVNORMCDF(P)  Normal quantile function
    +% X = INVNORMCDF(P) returns the P-th quantile of the standard normal distribution.
    +% In other words, it returns X such that P = NORMCDF(X).
    +  x = erfinv(2*p-1)*sqrt(2);
    +end
  3. in src/calc_posterior_bvsr.m, the solve_tril/solve_triu calls go to the lightspeed library, but don't seem to be anything but a optimized / operator (and I'm not sure the lightspeed dep is even worth having since I didn't notice much difference in runtimes with/without and it's an old proprietary library), so it would just read

    +                T   = L\RSBSMz
    +                muz = U\T
    ...
    +                betapost_err = U\randn(n_gamma,1)
  4. in propose_gamma.m, Octave doesn't have containers but it does have structs, which are string-indexed maps, so replacing is fairly straightforward:

    diff --git a/src/propose_gamma.m b/src/propose_gamma.m
    index c31de0c..804b7b7 100644
    --- a/src/propose_gamma.m
    +++ b/src/propose_gamma.m
    @@ -17,13 +17,14 @@ function [rank_new, gamma_logratio] = propose_gamma(rank_old, p_gamma, repeat)
        rank_new    = rank_old;
    
        % create a map object for snps in the current model
    -   snpmap = containers.Map('KeyType', 'double','ValueType', 'uint8');
    +    %% snpmap = containers.Map('KeyType', 'double','ValueType', 'uint8');
    +    snpmap = struct()
    
        % initialize it if ngamma_old is not zero
        if ngamma_old ~= 0
            k   = rank_old;                 % keys: the rank of snps, double
            v   = ones(1, ngamma_old, 'uint8');     % mapped values: 1 (unit8)
    -       snpmap  = containers.Map(k, v);     
    +        snpmap.( num2str(k)) = v; %% containers.Map(k, v);
        end
    
        for i = 1:repeat
    @@ -31,7 +32,7 @@ function [rank_new, gamma_logratio] = propose_gamma(rank_old, p_gamma, repeat)
            % rare case 1: no snp in the current model; must add one snp
            if ngamma_new == 0
                r_add       = sample(p_gamma);
    -           snpmap(r_add)   = 1;
    +            snpmap.( num2str(r_add))    = 1;
                rank_new    = r_add;
                ngamma_new  = ngamma_new + 1;
                gamma_logratio  = gamma_logratio - log(p_gamma(r_add)); % compute the contribution to log MH ratio
    @@ -42,7 +43,7 @@ function [rank_new, gamma_logratio] = propose_gamma(rank_old, p_gamma, repeat)
            if ngamma_new == p
                col_id      = unidrnd(ngamma_new);          % uniformly pick one snp to remove
                r_remove    = rank_new(col_id);
    -           remove(snpmap, r_remove);
    +            rmfield( num2str(snpmap), r_remove);
                rank_new(col_id) = [];                  % remove the snp
                gamma_logratio   = gamma_logratio + log(ngamma_new);    % compute the contribution to log MH ratio
                ngamma_new   = ngamma_new - 1;
    @@ -56,13 +57,13 @@ function [rank_new, gamma_logratio] = propose_gamma(rank_old, p_gamma, repeat)
                % randomly pick a snp until the picked snp is NOT in the current model
                while true
                    r_add = sample(p_gamma);
    -               if (isKey(snpmap, r_add) == 0); break; end
    +                if (isfield(snpmap,  num2str(r_add)) == 0); break; end
                end
                % calculate Pr(picking a snp NOT in the current model)
                prob_total = 1;
                prob_total = prob_total - sum(p_gamma(rank_new));
                % add the proposed snp
    -           snpmap(r_add)   = 1;
    +            snpmap.( num2str(r_add))    = 1;
                rank_new    = [rank_new r_add];
                ngamma_new  = ngamma_new + 1;
                % calculate the contribution to log MH ratio
    @@ -77,7 +78,7 @@ function [rank_new, gamma_logratio] = propose_gamma(rank_old, p_gamma, repeat)
                prob_total = prob_total - sum(p_gamma(rank_new));
                prob_total = prob_total + p_gamma(r_remove); % b/c this snp is NOT included in the proposed model
                % remove the proposed snp
    -           remove(snpmap, r_remove);
    +            rmfield(snpmap, num2str( r_remove));
                rank_new(col_id) = [];
                % calculate the contribution to log MH ratio
                gamma_logratio  = gamma_logratio + log(p_gamma(r_remove) / prob_total) + log(ngamma_new);
    @@ -88,7 +89,7 @@ function [rank_new, gamma_logratio] = propose_gamma(rank_old, p_gamma, repeat)
                r_remove = rank_new(col_id);
                while true
                    r_add = sample(p_gamma); % randomly pick a snp based on marginal association rank
    -               if (isKey(snpmap, r_add) == 0); break; end % pick a snp NOT in the model to add
    +                if (isfield(snpmap,  num2str(r_add)) == 0); break; end % pick a snp NOT in the model to add
                end
                % calculate the probability of picking a snp NOT in the current model
                prob_total  = 1;
    @@ -96,8 +97,8 @@ function [rank_new, gamma_logratio] = propose_gamma(rank_old, p_gamma, repeat)
                gamma_logratio  = gamma_logratio + log( p_gamma(r_remove) / (prob_total + p_gamma(r_remove) - p_gamma(r_add) ) );
                gamma_logratio  = gamma_logratio - log( p_gamma(r_add) / prob_total );
                % flip the indicators of these two snps
    -           remove(snpmap, r_remove);
    -           snpmap(r_add)    = 1;
    +            rmfield(snpmap,  num2str(r_remove));
    +            snpmap.( num2str(r_add))     = 1;
                rank_new(col_id) = [];
                rank_new     = [rank_new r_add];
            end

Unfortunately, I then hit a problem I couldn't figure out how to solve in the uses of sample which seems to be https://www.mathworks.com/help/sldo/ref/sdo.sample.html but I couldn't find any Octave equivalents googling. So I had to stop there.

rss-octave.diff.txt

But it would be nice if rss could be made to work with Octave so everyone can run it.

xiangzhu commented 8 years ago

@gwern Thanks for your time on trying rss on Octave. Here are my replies.

in example1.m, replace matfile with load and delete the two progess_bar calls

progress_bar is a just a tool to viz the progress of MCMC, so removing it does not affect the results.

... I don't know where invnormcdf comes from but Octave doesn't have it ...

invnormcdf is a sub-routine in lightspeed library, which computes the normal quantiles.

... I couldn't figure out how to solve in the uses of sample ...

sample is not a Matlab built-in function; it is provided by lightspeed, which generates random numbers from categorical distribution.

I agree that the implementation of rss based on commercial language (e.g. Matlab) could be a barrier. However, I do not plan to convert rss to Octave in the near future. Instead, I am considering an open-source version of rss that is coded either in R or C++. I will keep you posted.

Thanks again for your time and interest in our methods! Feel free to contact me if you have any other questions.

gwern commented 8 years ago

removing it does not affect the results.

Right, that's why I felt deleting it was safe.

sample is not a Matlab built-in function; it is provided by lightspeed, which generates random numbers from categorical distribution.

Ah. I got that one wrong. Generating from a categorical doesn't sound too hard to clone.

Instead, I am considering an open-source version of rss that is coded either in R or C++.

I use R mostly, so I'd be pleased to see an R library I could use to get a full R workflow: load in a public polygenic score dataset, do rss on it to get posteriors, model stuff with those posteriors (eg with BCEA) or pass them into JAGS for further processing etc. (But C++ might have other advantages, so don't let my preferences stop you.)

In the mean time, I think I'll look into getting my hands on Matlab somehow. Apparently the Machine Learning Coursera comes with a free trial I might be able to use.

xiangzhu commented 7 years ago

Hi Gwern @gwern

I hope all is well. As I mentioned in a previous reply, we are working on an R package development for RSS-related methods (rssr), which is currently led by my colleague Nick (@CreRecombinase). You might be interested in a pre-release of rssrpackage.

Specifically, please go to https://github.com/stephenslab/rssr, and do the following:

install.packages('devtools')
devtools::install_github("stephenslab/rssr",ref="v0.1.1-alpha",build_vignettes=TRUE)
vignette("example")

The vignette shows that rssr::rss_varbvsr_naive and rss_varbvsr.m produce the same results. Hence, you can instead use rssr::rss_varbvsr_naive in R if you are interested in the algorithms behind rss_varbvsr.m but cannot get it installed in Matlab.

Please note that this is only a pre-release. A lot of work should be done before we can safely call it a "release". For example, we only consider one model (RSS-BVSR) and one algorithm (variational approximation) in this pre-release. In addition, the documentation is lacking. Please feel free to contact me if you have any relevant question.

We only make the basic algorithms publicly available at this point. We developed several more advanced variants (both in Matlab and R) for our large-scale analyses, and we will make them public as soon as we submit our manuscripts.

pcarbo commented 7 years ago

Hi Gwern, if you have difficulty installing the rssr package, please refer to Issue #5 which may help.