longhaiSK / HTLR

Bayesian Logistic Regression with Hyper-LASSO priors
https://longhaisk.github.io/HTLR
GNU General Public License v3.0

[BUG] Error: Non-positive probability of first tangent point (ghs/neg prior) #3

Open GabeAl opened 4 years ago

GabeAl commented 4 years ago

Describe the bug

I have an ultra-high-dimensional sparse dataset (hundreds of thousands of features, hundreds of samples). It is a genetic dataset for copy number variation. Each column represents a genomic repeat, and each row is one subject. The outcome is a binary phenotype (0: absent, 1: present).

Hence, it is a simple numeric matrix with features as columns and samples as rows. There are no missing or negative values. Most of the values are 0, many are 1, and a few are greater than 1.

I get an error saying that the first tangent point has non-positive probability when prior = "ghs" or prior = "neg" (but not with the default prior = "t").

To Reproduce

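library(HTLR)
X = data.matrix(read.table("tiny.txt", sep = '\t'))  # rows = samples, columns = features
y = read.table("tinyout.txt")[, 1]                    # binary outcome (0/1)
model = htlr(X, y, prior = 'ghs')                     # fails; prior = 'neg' fails the same way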

Attached files: tiny.txt, tinyout.txt

Note that the bug occurs even when N > p (as in this toy example), so it is not a dimensionality problem. Are there any particular requirements on the distribution of the data?

Expected behavior

I was hoping for a more descriptive error message if the problem lies in my data, or otherwise for the model to finish without error.

R session info

R 3.5.3, with the latest HTLR installed via devtools.

Screenshots

The best lambda chosen by CV: 0.05

Error in htlr_fit_helper(p = p, K = K, n = n, X = as.matrix(X_addint),  : 
  Error in adaptive rejection sampling:
the first tangent point doesn't have positive probability.

GabeAl commented 4 years ago

I have updated the issue with the exact code and tiny toy files needed to reproduce the bug on my system.

longhaiSK commented 4 years ago

We will look into this problem. It may not be a dimensionality problem.