Can't fit generalized Pareto distribution because all tail values are the same.

jean997 / cause

R package for CAUSE

https://jean997.github.io/cause/


Can't fit generalized Pareto distribution because all tail values are the same. #30

Closed: yfeiiii closed this issue 1 year ago.

yfeiiii commented 2 years ago

Hi, I got warnings when fitting the model. I received 50 warning messages, and all of them were: "Can't fit generalized Pareto distribution because all tail values are the same."

I could still get summary(res), and the result was significant:

p-value testing that causal model is a better fit: 0.007
Posterior medians and 95 % credible intervals:
     model     gamma             eta               q
[1,] "Sharing" NA                "0 (-0.12, 0.12)" "0.07 (0, 0.31)"
[2,] "Causal"  "0 (-0.12, 0.12)" "0 (-0.12, 0.12)" "0.07 (0, 0.31)"

The elpd comparisons were:

   model1  model2 delta_elpd se_delta_elpd    z     p
1    null sharing    5.2e-13       3.4e-14 15.0 1.000
2    null  causal    4.7e-13       3.4e-14 14.0 1.000
3 sharing  causal   -4.2e-14       1.7e-14 -2.5 0.007
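The z and p columns look consistent with z = delta_elpd / se_delta_elpd and a one-sided normal p-value p = Φ(z). That relationship is inferred from the printed numbers, not taken from the CAUSE source. The minimal Python sketch below checks it against the rounded table values. The helper names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def elpd_z_and_p(delta_elpd: float, se_delta_elpd: float) -> tuple[float, float]:
    """Assumed relation: z = delta / se, p = lower-tail normal probability of z."""
    z = delta_elpd / se_delta_elpd
    return z, normal_cdf(z)


# Rounded values copied from the table above.
rows = [
    ("null", "sharing", 5.2e-13, 3.4e-14),
    ("null", "causal", 4.7e-13, 3.4e-14),
    ("sharing", "causal", -4.2e-14, 1.7e-14),
]

for m1, m2, delta, se in rows:
    z, p = elpd_z_and_p(delta, se)
    print(f"{m1:>7} {m2:>7} z={z:6.2f} p={p:.3f}")
```

Because the inputs are already rounded, the recomputed z values differ slightly from the printed ones (for example, 15.29 instead of 15.0). The p-values still round to the printed 1.000, 1.000 and 0.007.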

I also ran:

res$loos[[2]]

Does this mean the significant result is not reliable?

Thanks!
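The warning text appears to come from the Pareto smoothed importance sampling (PSIS) step used for leave-one-out cross-validation. For each observation, that step fits a generalized Pareto distribution to the largest importance ratios. The minimal Python sketch below only illustrates the condition the message describes. It assumes the tail-length rule M = min(ceil(0.2 * S), ceil(3 * sqrt(S / r_eff))) from the R loo package. The function names are hypothetical.

```python
import math
import random


def pareto_tail_length(S: int, r_eff: float = 1.0) -> int:
    """Tail length M for S posterior draws (assumed rule, mirroring R's loo):
    M = min(ceil(0.2 * S), ceil(3 * sqrt(S / r_eff)))."""
    return min(math.ceil(0.2 * S), math.ceil(3.0 * math.sqrt(S / r_eff)))


def tail_is_degenerate(log_lik_draws: list[float], r_eff: float = 1.0) -> bool:
    """True when the M largest raw log importance ratios are all identical.

    For PSIS-LOO the raw log ratios of one observation are the negated
    pointwise log-likelihoods across posterior draws. A constant tail has no
    spread, so a generalized Pareto distribution cannot be fitted to it.
    """
    log_ratios = [-ll for ll in log_lik_draws]
    M = pareto_tail_length(len(log_ratios), r_eff)
    tail = sorted(log_ratios)[-M:]  # the M largest log ratios
    return all(x == tail[0] for x in tail)


random.seed(1)
S = 1000
varying = [random.gauss(-2.0, 0.5) for _ in range(S)]  # log-lik changes across draws
constant = [-1.23] * S                                 # identical log-lik in every draw
print(tail_is_degenerate(varying))   # False
print(tail_is_degenerate(constant))  # True
```

In other words, the warning fires for an observation whose log-likelihood is the same in all of the largest-ratio posterior draws.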

jean997 commented 2 years ago

Hello! I have occasionally seen warnings for the Pareto k diagnostic values, but it is suspicious to me that 84% are in the very bad category. It seems like something may be odd with your data. I've never seen the "tail values are the same" warning, which I think may also point to something in your data. Can you give me more information about the source of the data and how you created it?
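The "very bad" label refers to the Pareto k diagnostic categories reported by the R loo package. The minimal Python sketch below tabulates the share of k-hat values in each category. It assumes the fixed cut-offs of older loo releases: good ≤ 0.5, ok ≤ 0.7, bad ≤ 1, very bad > 1. Newer releases changed these thresholds. The k-hat values are made up for illustration.

```python
from collections import Counter

# Assumed fixed cut-offs (inclusive upper bounds) from older releases of R's loo.
CATEGORIES = [("good", 0.5), ("ok", 0.7), ("bad", 1.0), ("very bad", float("inf"))]


def k_category(k: float) -> str:
    """Map one Pareto k-hat value to its diagnostic label."""
    for label, upper in CATEGORIES:
        if k <= upper:
            return label
    return "very bad"  # only reached for NaN input


def k_table(k_values: list[float]) -> dict[str, float]:
    """Percentage of observations falling in each category."""
    counts = Counter(k_category(k) for k in k_values)
    n = len(k_values)
    return {label: 100.0 * counts[label] / n for label, _ in CATEGORIES}


# Hypothetical k-hat values, for illustration only.
khat = [0.2, 0.6, 0.9, 1.3, 2.0, float("inf")]
print(k_table(khat))
```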

yfeiiii commented 2 years ago


Hi,

I think I found the reason: there might not be enough statistical power. Thank you!

jean997 commented 2 years ago

I would be surprised if that were the explanation; at least, I have not seen this in tests with simulated low-power GWAS. How many variants did you use for the parameter calculation step, and how many did you use for the posterior fitting step?

yfeiiii commented 1 year ago

Sorry for the late reply. After several attempts, the number of variants I used for CAUSE after the LD clumping step ranged from 24 to 1921 (using r2 = 0.01, kb = 1000, p-value = 5e-5). These are the same settings I used for other MR methods, e.g., MBE and MR-Egger. Most of the models were not significant in CAUSE but were significant with MBE and MR-Egger, so I am wondering how many variants are enough for a CAUSE analysis.
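For readers unfamiliar with the clumping step, here is a minimal Python sketch of greedy, p-value-ordered LD clumping with the quoted settings: r² threshold 0.01, a 1000 kb window and an index p-value threshold of 5e-5. It is a generic illustration of the technique, not the tool the poster used. The Variant type, the r2 lookup and the toy data are hypothetical stand-ins for real summary statistics and an LD reference panel.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Variant:
    snp: str
    chrom: int
    pos: int      # base-pair position
    pval: float   # association p-value used to rank variants


def ld_clump(
    variants: Iterable[Variant],
    r2: Callable[[str, str], float],
    r2_max: float = 0.01,
    window_kb: int = 1000,
    p_max: float = 5e-5,
) -> list[Variant]:
    """Greedy clumping: keep the most significant unhandled variant below p_max,
    then drop every other candidate on the same chromosome within window_kb
    whose r^2 with it exceeds r2_max. Repeat until no candidates remain."""
    candidates = sorted((v for v in variants if v.pval < p_max), key=lambda v: v.pval)
    kept: list[Variant] = []
    handled: set[str] = set()  # SNPs already kept or removed
    for index in candidates:
        if index.snp in handled:
            continue
        kept.append(index)
        handled.add(index.snp)
        for other in candidates:
            if other.snp in handled:
                continue
            nearby = (other.chrom == index.chrom
                      and abs(other.pos - index.pos) <= window_kb * 1000)
            if nearby and r2(index.snp, other.snp) > r2_max:
                handled.add(other.snp)
    return kept


# Toy example with a hypothetical LD lookup (pairs not listed have r^2 = 0).
LD = {frozenset({"rs1", "rs2"}): 0.4}


def toy_r2(a: str, b: str) -> float:
    return LD.get(frozenset({a, b}), 0.0)


toy = [
    Variant("rs1", 1, 100_000, 1e-8),
    Variant("rs2", 1, 150_000, 1e-6),    # in LD with rs1, so clumped away
    Variant("rs3", 1, 5_000_000, 1e-7),  # outside the 1000 kb window, so kept
    Variant("rs4", 2, 100_000, 1e-3),    # above p_max, never considered
]
print([v.snp for v in ld_clump(toy, toy_r2)])  # ['rs1', 'rs3']
```

Real clumping tools add further options, such as a secondary p-value threshold for clumped variants. The sketch keeps only the parameters quoted in the comment.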

Thank you so much!

jean997 commented 1 year ago

I usually recommend using a genome-wide set of LD-pruned SNPs for parameter estimation, and SNPs down to a p-value threshold of 1e-3 for model fitting. The model fit should not be too sensitive to the p-value threshold; if you find that it is, that could indicate something unusual.
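This recommendation amounts to two variant lists drawn from the same LD-pruned set. Every pruned SNP goes to parameter estimation; only those with p < 1e-3 go to model fitting. The minimal Python sketch below shows that split. It assumes a hypothetical table of (SNP, p-value) pairs and assumes the threshold applies to the exposure GWAS p-value, which the comment does not specify. It does not show the CAUSE R calls these lists would be passed to.

```python
from typing import NamedTuple


class PrunedSNP(NamedTuple):
    snp: str
    pval_exposure: float  # assumption: p-value from the exposure GWAS


def split_variant_sets(
    pruned: list[PrunedSNP], fit_pval: float = 1e-3
) -> tuple[list[str], list[str]]:
    """Return (parameter-estimation SNPs, model-fitting SNPs).

    Parameter estimation uses every SNP in the genome-wide LD-pruned set;
    model fitting keeps only the pruned SNPs with p-value below fit_pval.
    """
    param_snps = [s.snp for s in pruned]
    fit_snps = [s.snp for s in pruned if s.pval_exposure < fit_pval]
    return param_snps, fit_snps


# Hypothetical pruned set, for illustration only.
pruned = [
    PrunedSNP("rs1", 2e-9),
    PrunedSNP("rs2", 4e-4),
    PrunedSNP("rs3", 0.02),
    PrunedSNP("rs4", 0.6),
]
param_snps, fit_snps = split_variant_sets(pruned)
print(len(param_snps), len(fit_snps))  # 4 2
```

Re-running the split with fit_pval at, for example, 5e-5 and 1e-3 and comparing the fits is one way to check the threshold sensitivity mentioned above.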