greta-dev / greta

simple and scalable statistical modelling in R
https://greta-stats.org

calculate() hangs with NaN parameters in sampling distribution in tf2 branch #617

Open goldingn opened 7 months ago

goldingn commented 7 months ago

This is causing me issues in a model using greta.dynamics - when the dynamics lead to a numerical overflow, the model hangs forever.

That model code is too convoluted to post here, but I've narrowed it down to an interaction between TFP's sampling, NaNs in tensors, and the tf$device context used in calculate. This may be a TensorFlow or TFP issue, but it would be great if we could find a solution or at least a workaround.

Here's a reprex using greta code with a custom op. Note the op itself is not the problem, it's just a way to reliably create NaNs in the underlying tensors. greta should ideally be robust to NaNs sneaking into tensorflow code:

library(greta)
#> 
#> Attaching package: 'greta'
#> The following objects are masked from 'package:stats':
#> 
#>     binomial, cov2cor, poisson
#> The following objects are masked from 'package:base':
#> 
#>     %*%, apply, backsolve, beta, chol2inv, colMeans, colSums, diag,
#>     eigen, forwardsolve, gamma, identity, rowMeans, rowSums, sweep,
#>     tapply

# the inverse incomplete gamma function (the major part of the quantile function
# of a gamma distribution)
igammainv <- function(a, p) {
  op <- greta::.internals$nodes$constructors$op
  op("igammainv", a, p,
     tf_operation = "tf_igammainv"
  )
}
tf_igammainv <- function(a, p) {
  tfp <- greta:::tfp
  tfp$math$igammainv(a, p)
}

# greta model, just using constants
lambda <- igammainv(.Machine$double.xmax, 0.9)
y <- poisson(lambda)
#> ℹ Initialising python and checking dependencies, this may take a moment.
#> ✔ Initialising python and checking dependencies ... done!
#> 

# this line hangs:
calculate(y)

Created on 2024-03-03 with reprex v2.0.2

Through some painful debugging (including a bisection search through the various levels of code called by calculate, with each negative borking my session), I've managed to create the following equivalent reprex using tensorflow code (note: I'm loading greta first, despite there being no greta code, to make sure we have the same TF and TFP installations loaded):

library(greta)
#> 
#> Attaching package: 'greta'
#> The following objects are masked from 'package:stats':
#> 
#>     binomial, cov2cor, poisson
#> The following objects are masked from 'package:base':
#> 
#>     %*%, apply, backsolve, beta, chol2inv, colMeans, colSums, diag,
#>     eigen, forwardsolve, gamma, identity, rowMeans, rowSums, sweep,
#>     tapply
library(tensorflow)
tfp <- greta:::tfp

badfun <- function() {
  # this is just to create a tensor with NaN values
  badval <- tfp$math$igammainv(.Machine$double.xmax, 0.9)
  # pass it into the poisson sample method
  dist <- tfp$distributions$Poisson(rate = badval)
  dist$sample()
}

# this runs
badfun()
#> tf.Tensor(nan, shape=(), dtype=float32)

# this hangs
with(tf$device("CPU"), {
  badfun()
})

Created on 2024-03-03 with reprex v2.0.2
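
Since every failing candidate in a bisection like the one described above takes the R session down with it, one option is to run each candidate in a throwaway R process with a time limit. The following is a minimal sketch using only base R; `run_with_timeout` and `reprex_code` are hypothetical names, not part of greta, and whether the time limit can always terminate a stuck TensorFlow process is untested here.

```r
# Hypothetical helper (not part of greta): run R code in a fresh Rscript
# process with a time limit, so a hang only costs the child process.
run_with_timeout <- function(code, seconds = 60) {
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script), add = TRUE)
  writeLines(code, script)

  rscript <- file.path(R.home("bin"), "Rscript")
  # system2() warns and returns exit status 124 when the time limit is hit
  status <- suppressWarnings(
    system2(rscript, shQuote(script),
            stdout = FALSE, stderr = FALSE, timeout = seconds)
  )

  if (status == 0L) "ok" else if (status == 124L) "timeout" else "error"
}

# the hanging candidate from the reprex above, passed as a string
reprex_code <- '
library(greta)
library(tensorflow)
tfp <- greta:::tfp
with(tf$device("CPU"), {
  badval <- tfp$math$igammainv(.Machine$double.xmax, 0.9)
  tfp$distributions$Poisson(rate = badval)$sample()
})
'
status <- run_with_timeout(reprex_code, seconds = 60)
```

A result of "timeout" marks the candidate as hanging, "ok" as running cleanly, and "error" as failing for some other reason (for example, a package missing in the child process).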

The tf$device stuff is used in the greta tf2-poke-tf-fun branch (https://github.com/greta-dev/greta/blob/tf2-poke-tf-fun/R/calculate.R#L163). The rest of the TF code matches that called by the dag inside calculate.

Here's my sessioninfo:

> devtools::session_info()
─ Session info ───────────────────────────────────────
 setting  value
 version  R version 4.2.2 (2022-10-31)
 os       macOS Ventura 13.1
 system   aarch64, darwin20
 ui       RStudio
 language (EN)
 collate  en_AU.UTF-8
 ctype    en_AU.UTF-8
 tz       Australia/Perth
 date     2024-03-03
 rstudio  2023.06.2+561 Mountain Hydrangea (desktop)
 pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────
 package     * version    date (UTC) lib source
 base64enc     0.1-3      2015-07-28 [1] CRAN (R 4.2.0)
 cachem        1.0.7      2023-02-24 [1] CRAN (R 4.2.0)
 callr         3.7.3      2022-11-02 [1] CRAN (R 4.2.0)
 cli           3.6.1      2023-03-23 [1] CRAN (R 4.2.0)
 clipr         0.8.0      2022-02-22 [1] CRAN (R 4.2.0)
 coda          0.19-4     2020-09-30 [1] CRAN (R 4.2.0)
 codetools     0.2-18     2020-11-04 [1] CRAN (R 4.2.2)
 crayon        1.5.2      2022-09-29 [1] CRAN (R 4.2.0)
 devtools      2.4.5      2022-10-11 [1] CRAN (R 4.2.0)
 digest        0.6.31     2022-12-11 [1] CRAN (R 4.2.0)
 ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
 evaluate      0.20       2023-01-17 [1] CRAN (R 4.2.0)
 fansi         1.0.4      2023-01-22 [1] CRAN (R 4.2.0)
 fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.2.0)
 fs            1.6.1      2023-02-06 [1] CRAN (R 4.2.0)
 future        1.33.0     2023-07-01 [1] CRAN (R 4.2.0)
 globals       0.16.2     2022-11-21 [1] CRAN (R 4.2.0)
 glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
 greta       * 0.4.3.9000 2023-11-13 [1] local
 here          1.0.1      2020-12-13 [1] CRAN (R 4.2.0)
 hms           1.1.2      2022-08-19 [1] CRAN (R 4.2.0)
 htmltools     0.5.4      2022-12-07 [1] CRAN (R 4.2.0)
 htmlwidgets   1.6.1      2023-01-07 [1] CRAN (R 4.2.0)
 httpuv        1.6.9      2023-02-14 [1] CRAN (R 4.2.0)
 jsonlite      1.8.4      2022-12-06 [1] CRAN (R 4.2.0)
 knitr         1.42       2023-01-25 [1] CRAN (R 4.2.0)
 later         1.3.0      2021-08-18 [1] CRAN (R 4.2.0)
 lattice       0.22-5     2023-10-24 [1] CRAN (R 4.2.0)
 lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.0)
 listenv       0.9.0      2022-12-16 [1] CRAN (R 4.2.0)
 magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
 Matrix        1.6-1      2023-08-14 [1] CRAN (R 4.2.2)
 memoise       2.0.1      2021-11-26 [1] CRAN (R 4.2.0)
 mime          0.12       2021-09-28 [1] CRAN (R 4.2.0)
 miniUI        0.1.1.1    2018-05-18 [1] CRAN (R 4.2.0)
 parallelly    1.34.0     2023-01-13 [1] CRAN (R 4.2.0)
 pillar        1.9.0      2023-03-22 [1] CRAN (R 4.2.0)
 pkgbuild      1.4.0      2022-11-27 [1] CRAN (R 4.2.0)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
 pkgload       1.3.2      2022-11-16 [1] CRAN (R 4.2.0)
 png           0.1-8      2022-11-29 [1] CRAN (R 4.2.0)
 prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.2.0)
 processx      3.8.0      2022-10-26 [1] CRAN (R 4.2.0)
 profvis       0.3.8      2023-05-02 [1] CRAN (R 4.2.0)
 progress      1.2.2      2019-05-16 [1] CRAN (R 4.2.0)
 promises      1.2.0.1    2021-02-11 [1] CRAN (R 4.2.0)
 ps            1.7.2      2022-10-26 [1] CRAN (R 4.2.0)
 purrr         1.0.1      2023-01-10 [1] CRAN (R 4.2.0)
 R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
 Rcpp          1.0.11     2023-07-06 [1] CRAN (R 4.2.0)
 remotes       2.4.2      2021-11-30 [1] CRAN (R 4.2.0)
 reprex        2.0.2      2022-08-17 [1] CRAN (R 4.2.0)
 reticulate    1.28       2023-01-27 [1] CRAN (R 4.2.0)
 rlang         1.1.1      2023-04-28 [1] CRAN (R 4.2.0)
 rmarkdown     2.20       2023-01-19 [1] CRAN (R 4.2.0)
 rprojroot     2.0.3      2022-04-02 [1] CRAN (R 4.2.0)
 rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.0)
 sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
 shiny         1.7.4      2022-12-15 [1] CRAN (R 4.2.0)
 stringi       1.7.12     2023-01-11 [1] CRAN (R 4.2.0)
 stringr       1.5.0      2022-12-02 [1] CRAN (R 4.2.0)
 tensorflow  * 2.11.0     2022-12-19 [1] CRAN (R 4.2.0)
 tfruns        1.5.1      2022-09-05 [1] CRAN (R 4.2.0)
 tibble        3.2.1      2023-03-20 [1] CRAN (R 4.2.0)
 urlchecker    1.0.1      2021-11-30 [1] CRAN (R 4.2.0)
 usethis       2.1.6      2022-05-25 [1] CRAN (R 4.2.0)
 utf8          1.2.3      2023-01-31 [1] CRAN (R 4.2.0)
 vctrs         0.6.2      2023-04-19 [1] CRAN (R 4.2.0)
 whisker       0.4.1      2022-12-05 [1] CRAN (R 4.2.0)
 withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
 xfun          0.37       2023-01-31 [1] CRAN (R 4.2.0)
 xtable        1.8-4      2019-04-21 [1] CRAN (R 4.2.0)
 yaml          2.3.7      2023-01-23 [1] CRAN (R 4.2.0)

 [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library

─ Python configuration ───────────────────────────────
 python:         /Users/nick/Library/r-miniconda-arm64/envs/greta-env-tf2/bin/python
 libpython:      /Users/nick/Library/r-miniconda-arm64/envs/greta-env-tf2/lib/libpython3.8.dylib
 pythonhome:     /Users/nick/Library/r-miniconda-arm64/envs/greta-env-tf2:/Users/nick/Library/r-miniconda-arm64/envs/greta-env-tf2
 version:        3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:49:06) [Clang 14.0.6 ]
 numpy:          /Users/nick/Library/r-miniconda-arm64/envs/greta-env-tf2/lib/python3.8/site-packages/numpy
 numpy_version:  1.23.2
 tensorflow:     /Users/nick/Library/r-miniconda-arm64/envs/greta-env-tf2/lib/python3.8/site-packages/tensorflow

 NOTE: Python version was forced by use_python function

──────────────────────────────────────────────────────

goldingn commented 7 months ago

As a workaround, @njtierney, is there any way to avoid setting a device via compute_options in calculate?

goldingn commented 7 months ago

Never mind - I can work around this for now by forcing computation on the GPU! Should have thought before sending :)
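
At the TensorFlow level, the GPU workaround amounts to swapping the device string in the earlier reprex. The sketch below repeats `badfun` from that reprex and only enters the GPU context when TensorFlow reports a GPU, since falling back to the CPU is the case that hangs. At the greta level the equivalent would presumably be `calculate(y, compute_options = gpu_only())`; that call is an assumption based on the compute_options argument mentioned above, not something shown in this thread.

```r
library(greta)       # loaded first so the same TF/TFP installation is used
library(tensorflow)
tfp <- greta:::tfp

# same function as in the tensorflow reprex
badfun <- function() {
  badval <- tfp$math$igammainv(.Machine$double.xmax, 0.9)
  dist <- tfp$distributions$Poisson(rate = badval)
  dist$sample()
}

# only force the GPU if TensorFlow can actually see one; otherwise skip,
# because running this on the CPU device is the case reported to hang
gpus <- tf$config$list_physical_devices("GPU")
if (length(gpus) > 0) {
  with(tf$device("GPU"), {
    badfun()
  })
} else {
  message("No GPU visible to TensorFlow; not running badfun() on CPU.")
}
```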

It looks like TFP uses different Poisson sampling algorithms on GPU vs CPU. If we determine this is only an issue in Poisson, we could manually set the sampling method for the greta poisson distribution to one that is safer.
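
Independent of which sampling algorithm TFP picks, one possible shape for a safer sampler is to refuse to sample at all when the rate is not finite. This is a minimal sketch, not a tested fix: `safe_poisson_sample` is a hypothetical helper, and it assumes that `tf$debugging$check_numerics` itself runs without trouble inside the tf$device("CPU") context, which has not been verified here.

```r
library(greta)       # loaded first so the same TF/TFP installation is used
library(tensorflow)
tfp <- greta:::tfp

# Hypothetical helper (not part of greta or TFP): check the rate before
# handing it to the Poisson sampler.
safe_poisson_sample <- function(rate) {
  rate <- tf$convert_to_tensor(rate)
  # check_numerics raises an error if the tensor contains NaN or Inf,
  # so sampling is never reached with a bad rate
  rate <- tf$debugging$check_numerics(rate, message = "Poisson rate")
  tfp$distributions$Poisson(rate = rate)$sample()
}

# a finite rate is sampled as usual
draw <- safe_poisson_sample(3)

# a NaN rate should surface as an R error rather than reaching the sampler
result <- tryCatch(
  safe_poisson_sample(NaN),
  error = function(e) "rate was not finite"
)
```

Erroring is only one design choice; replacing non-finite rates with a sentinel value would keep calculate running, at the cost of hiding the overflow that produced the NaN.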

njtierney commented 7 months ago

Ah, glad to hear that forcing compute on GPU worked! But it's also a bit spooky that GPU vs CPU leads to such different behaviour.