greta-dev / greta

simple and scalable statistical modelling in R
https://greta-stats.org

tensorflow-cuda not working with greta #246

Open stephensrmmartin opened 5 years ago

stephensrmmartin commented 5 years ago

System:

Tested:

Greta model:

library(greta)

# first five agreeableness items from the psych package's bfi data
data('bfi', package = 'psych')
ds.a <- bfi[, 1:5]
ds.a <- ds.a[complete.cases(ds.a), ]
ds.a$A1 <- 7 - ds.a$A1   # reverse-score the first item
#ds.a <- ds.a[1:100, ]
ds.a <- scale(ds.a)
ds.g <- as_data(ds.a)

N <- nrow(ds.a)
J <- ncol(ds.a)

# Latent CFA: one factor, J indicators
theta  <- normal(0, 1, c(N, 1))                            # factor scores
nu     <- normal(0, 2, c(J, 1))                            # item intercepts
lambda <- normal(0, 1, c(J, 1), truncation = c(0, Inf))    # loadings
resid  <- normal(0, 2, c(J, 1), truncation = c(0, Inf))    # residual variances

mu <- ones(N) %*% t(nu) + theta %*% t(lambda)
Sigma <- zeros(J, J)
diag(Sigma) <- resid   # diagonal covariance: conditionally independent items

distribution(ds.g) <- multivariate_normal(mu, Sigma)

gretaMod <- model(lambda, nu, resid, theta)
gretaOut <- greta::mcmc(gretaMod, n_samples = 1000, warmup = 1000,
                        n_cores = 1, chains = 1, one_by_one = FALSE)

I also tried with n_cores=4, chains=4, and one_by_one=TRUE (I actually have to do that for this model due to some Cholesky errors... which is odd, because that matrix is guaranteed positive definite, but I digress).
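
That is, the same call with only the sampler settings changed:

gretaOut <- greta::mcmc(gretaMod, n_samples = 1000, warmup = 1000,
                        n_cores = 4, chains = 4, one_by_one = TRUE)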

By default, compile=TRUE in model(). When that is TRUE, R/RStudio crashes with a core dump very quickly. The error in the R session is: tensorflow/compiler/xla/service/gpu/ir_emitter_unnested.cc:688] Check failed: fusion->fusion_kind() == HloInstruction::FusionKind::kLoop (kOutput vs. kLoop)

No idea what that means, but on line 688 of that source file, there is indeed some equality check, and it fails.

If compile=FALSE, then the model seemingly runs fine (which makes sense, because XLA is not called).
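
That is, building the model with (compile is an argument to greta::model()):

gretaMod <- model(lambda, nu, resid, theta, compile = FALSE)   # skip XLA compilation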

compile=TRUE works fine when using the CPU only; it appears to be a problem with GPU support. Ironically, compile=TRUE with CPU only is much faster than compile=FALSE with GPU support. The GPU /is/ being utilized, but I suppose compilation greatly speeds things up.

goldingn commented 5 years ago

Thanks for the detailed report!

When you ran the cifar10 model, was that calling tensorflow through R?

If not, it would be great if you could check that, to make sure the problem doesn't lie there, or in the R tensorflow package seeing the wrong TensorFlow installation.

It would be handy to know which versions of greta, tensorflow (and how you installed it for GPU) and tensorflow-probability you are using.
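
Something like the following should show all of that from R (a sketch, assuming the usual reticulate-based install; is_gpu_available() is the TF 1.x API):

packageVersion("greta")
tensorflow::tf_config()                   # which Python and TensorFlow build R is bound to
tensorflow::tf_version()                  # TensorFlow version as seen from R
tensorflow::tf$test$is_gpu_available()    # TRUE if the GPU build is visible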

stephensrmmartin commented 5 years ago

I continued testing with it. If I instead use five separate normals rather than an R^5 multivariate normal (as above), compile=TRUE works and the GPU is used. So it seems to have an issue compiling a model with a multivariate likelihood, but not with five separate univariate likelihoods (which, in this case, is the same model).
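
Concretely, a sketch of that reformulation (assuming resid holds the residual variances that went on the diagonal of Sigma, so sqrt() converts them to standard deviations; sd_mat is just a name introduced here):

sd_mat <- ones(N) %*% t(sqrt(resid))      # N x J matrix of residual sds
distribution(ds.g) <- normal(mu, sd_mat)  # elementwise univariate normals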

Also unfortunate: the GPU is /way/ slower. I have an NVIDIA 960, but it's nearly 2-3x slower than just using the CPU.

goldingn commented 5 years ago

Hmm, OK that's interesting.

The only thing I can think of that would be different is a Cholesky decomposition of the covariance matrix. But I've definitely run a Cholesky op in greta on a GPU with no trouble (and it was much faster than on the CPU).

I'm not surprised you don't get a speedup running the above on a GPU. There are no large linear algebra operations for the GPU to get its teeth stuck into, so you're paying the cost of sending the data to the GPU for little gain. I've found GPUs hugely beneficial for e.g. matrix multiplications with >1e5 rows, or Cholesky decompositions of matrices of dimension >1e3, but not for models smaller than that.
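
As a rough illustration of that scale (a sketch, not a claim about this model; timings depend entirely on your hardware and TensorFlow build):

library(greta)
A <- as_data(crossprod(matrix(rnorm(2000 * 2000), 2000)))  # 2000 x 2000 positive definite matrix
L <- chol(A)
system.time(calculate(L))   # compare wall time across CPU-only and GPU TensorFlow builds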

njtierney commented 3 years ago

When I get the chance, I will try to replicate this issue on an appropriate Docker container or VM.