Closed HaydenSchilling closed 1 year ago
What does the brm() call of the model look like?
The original model was:
b2 <- brm(Golden ~ s(days_since) + (1|SiteID) + (1|fDate) +
offset(log(Sampling_duration)),
cores=4,
family="zero_inflated_negbinomial",
data=mydata_wide,
iter = 5000)
A similar thing occurs for family = "negbinomial"
so I don't think it is an issue with the zero-inflation family.
I think it is a problem with the spline. In a project, my collaborators and I had similar problems when analysing a model on local laptops that was originally fit on a cluster. I have yet to find the origin of this problem, but I suspect it lies deep in some matrix algebra (perhaps in base R or deeper) that is somehow (I have no idea why) system dependent. I see that one of your machines is Windows while the other one is Linux. Is the latter machine a cluster?
Back then, we managed to solve the problem by using R 3.5 or 3.6 (I think) on the cluster instead of R 4.0.x, but we don't know why it helped.
You are correct, the Linux machine is a cluster, which originally fit the model. I have been looking at the results on my laptop and on the cluster and noticed differences.
Do you suggest I attempt to redo the models with R 3.6 on the cluster? If the results then agree when looked at on both the laptop and the cluster, can I assume they are OK?
If you have responsive cluster admins, you may also ask them if they have seen something like that before, where presumably, different linear algebra setups imply different results.
I think we got it to work with R 3.5 back then, but perhaps versions later than R 4.0 work as well. I can only suggest trying out different settings, sorry. If the results agree on both machines, I think it is safe to say that they are trustworthy.
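One way to check whether the results agree is to save the posterior_epred() output on each machine (for example with saveRDS()) and compare the two matrices numerically. The following is only a sketch: compare_epred() is a hypothetical helper, not part of brms, and the matrices a and b are toy stand-ins for the real predictions.

```r
# Hypothetical helper (not part of brms): compare two prediction matrices
# (draws x observations), e.g. posterior_epred() output from two machines.
compare_epred <- function(epred_a, epred_b, tolerance = 1e-8) {
  stopifnot(identical(dim(epred_a), dim(epred_b)))
  abs_diff <- abs(epred_a - epred_b)
  # Guard against division by zero when both predictions are exactly 0.
  scale <- pmax(abs(epred_a), abs(epred_b), .Machine$double.eps)
  list(
    equal        = isTRUE(all.equal(epred_a, epred_b, tolerance = tolerance)),
    max_abs_diff = max(abs_diff),
    max_rel_diff = max(abs_diff / scale)
  )
}

# Toy stand-ins for the two machines' predictions (assumed data, not real output).
set.seed(1)
a <- matrix(rexp(20), nrow = 4)
b <- a * (1 + 1e-3)  # artificial 0.1% discrepancy
res <- compare_epred(a, b)
print(res)
```

Differences near machine precision are expected between platforms; large relative differences like those described in this issue are not.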
Thank you, that sounds like a good way forward! I really appreciate the help. I'll talk to the cluster admins and if they have any insight post it here.
Could the following Twitter thread provide the reason for the problem you encountered? https://twitter.com/dan_p_simpson/status/1571705560634105857
Interesting idea - maybe. I tried running the suggested export commands on the HPC before opening R and re-running but the predictions did not change. (full disclosure - I don't really understand what exactly the change in MKL does)
Thank you for testing it out! My assumption is still that the problem lies in the linear algebra operations. Some people in this Twitter thread reported that the same deterministic operation performed multiple times may give different results under specific awkward circumstances. To me it looks as if something similar may be happening in the case discussed here. Perhaps someone else has ideas about what exactly is happening?
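One commonly cited mechanism for this kind of discrepancy is that floating-point addition is not associative, so linear algebra libraries that accumulate sums in a different order can return slightly different results for the same input. The sketch below only illustrates that general effect in base R; it is not confirmed to be the cause of the differences in this issue.

```r
# The same three numbers, summed in two different orders.
left  <- (0.1 + 0.2) + 0.3
right <- 0.1 + (0.2 + 0.3)

# Bitwise comparison: the two results differ in the last bits.
identical(left, right)          # FALSE

# Comparison with a numerical tolerance: the results are "equal enough".
isTRUE(all.equal(left, right))  # TRUE

# Show the difference explicitly.
print(left - right)
```

Such differences are tiny on their own, but they can be amplified by numerically sensitive steps such as matrix decompositions.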
The cause of this problem has now been isolated in #1465, although we still need to figure out how to actually fix it reliably.
@HaydenSchilling -- this issue may be due to a mismatch between your BLAS/LAPACK library implementations. From your session info, it appears that Machine 1 is (probably) using the default, R-internal version but Machine 2 is using Intel's Math Kernel Library implementation. In issue #1465 I describe a similar problem for spline models, which I was able to resolve by ensuring that either the default, R-internal libraries or the same version of external libraries are used in both the training and the prediction environments 👍
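To check which BLAS/LAPACK implementation a given R session uses, the information can be collected with base R functions and compared across machines. This is a minimal sketch assuming R 3.4.0 or later; the name la_info is only illustrative.

```r
# Collect the linear algebra setup of the current R session.
la_info <- list(
  r_version      = R.version.string,
  blas           = extSoftVersion()[["BLAS"]],  # path to the BLAS library in use
  lapack         = La_library(),                # path to the LAPACK library in use
  lapack_version = La_version()
)
print(la_info)
```

Running this on both the training and the prediction machine and comparing the blas and lapack entries shows whether one uses the R-internal libraries and the other an external implementation such as MKL. The same paths also appear in the output of sessionInfo().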
I will close this issue to make the new issue the hub for discussions around this problem and, hopefully, its eventual fix. You can of course continue writing here. I just want to keep the issue tracker clean and avoid duplication.
Hi,
I've come across a difference in results on two machines running the same version of brms (2.17.0) and I'm wondering if you have any insight. This model is needed to run the code (apologies for the size, ~450 MB).
If I run the script below on two different machines, the results from posterior_epred are very different and I'm not sure which one is correct.
The session info for the two machines are: Machine 1
Machine 2
Thank you for any insight!