Open anddis opened 1 year ago
Update: with basehaz = "exp"
or basehaz = "weibull-aft"
I get identical results. between R 3.6.3
and 4.2.0
.
Were the functions for the Weibull distribution with PH parametrisation rewritten/modified at some point?
Hi @anddis - sorry to hear this, esp. if the lack of reproducibility is causing you a nuisance!
Looking at the commit history, I don't think we've pushed changes to the feature/survival
branch in about 9 months, so I doubt that it is related to any refactoring (although I'm not certain - e.g. not sure exactly the timeline on R 3.6.3 and 4.2.0).
Perhaps this sounds a little related to #556 that you reported (and nobody has time to investigate unfortunately!).
Similar to the other issue, the fact that weibull-aft
gives consistent results, but weibull
doesn't, does feel a bit worrying. Sad truth is that I'm not sure I can find the time to truly sit down and debug this atm unfortunately. I think we'd have to write a unit test to try identify the discrepancy across the different baseline hazards. Perhaps someone else involved with the repo here might be able to assist. 🤞 😞
Hi @anddis,
This issue came up in a recent discussion with @bgoodri, who mentioned that exact reproducibility isn't guaranteed by Rstan across R versions, even for the same data + model code + seed. There are many things that can change the results across versions of R.
Personally, if I need exact reproducibility, I package the R environment + code in a docker image ..
However, we've seen other reproducibility issues (e.g. #556 ) that may be at play here. It's hard for me to tell from your example if these are substantive differences, or if they may be considered in a category of chain-to-chain variability. Do you have thoughts on this, perhaps from non-reproducible examples?
Otherwise I think the next step would be to run your example here with more chains to see if the results are more similar on average.
Hello Sam and Jacki. Thank you for your replies.
@jburos: I agree that the ideal solution is a docker image. I also understand that exact reproducibility across R versions is not guaranteed. However, I'm surprised that results are in fact reproducible between R3.6.3 and R4.2.0 for some of the distributions/parametrisations but not for others (see my second post in this thread). But I know too little about the internal workings of stan_surv
and Rstan to be able to speculate on why this happens.
Issue https://github.com/stan-dev/rstanarm/issues/556#issue-1052153445 seems more concerning to me. The R version is the same, the only thing that changes is the order of the observations. Plus, as you can see from the plot in my second post (sorry for the low quality of the image), the posterior draws are identical up to iteration ≈500 and then they start to diverge. And again, results are identical regardless of the ordering of the observations for some distributions/parametrisations but not for others.
Thanks @anddis - I agree this is concerning, and I wanted thank you for the reproducible examples. It's on our radar ..
Yeah I agree the fact the behaviour differs by distribution is odd and worrying.
Although, I had missed the fact that the draws are identical for warmup and differ for post warmup. That feels really odd! I can't recall anything in stan_surv where we have differential behaviour for warmup vs post warmup, conditional on distribution - I thought the warmup vs post warmup behaviour was all just outsourced to rstan. But maybe I'm forgetting.
Summary:
Same
rstanarm
version fromfeature/survival
(version2.21.2
), same seed, different results between R3.6.3
and4.2.0
.Description:
As per the summary, I have the same version of
rstanarm
fromfeature/survival
installed on R version3.6.3
and4.2.0
. Even with the same seed, I get different draws form the posterior distribution.The 2 versions of
rstanarm
were not installed at the same time. Could it be that some changes to the code potentially related to this issue were introduced, yet the version ofrstanarm
was not updated? Otherwise, do you have any idea about what the different results could be due to?But above all: do you know I can obtain consistent results between R
3.6.3
and4.2.0
?Output:
(reproducible code below)
R
3.6.3
+rstanarm
2.21.2
R
4.2.0
+rstanarm
2.21.2
Reproducible Code:
RStanARM Version:
2.21.2
R Version:
3.6.3
and4.2.0
Operating System:
Mac OS 10.14.6