stan-dev / posteriordb

Database with posteriors of interest for Bayesian inference
176 stars 36 forks

Not able to read models in loop #138

Closed bbbales2 closed 4 years ago

bbbales2 commented 4 years ago

Code is:

for (i in seq_along(pn)) {
  tryCatch({
    print(paste0("Saving model ", i))
    # Fetch the posterior and write its Stan model to disk
    po <- posterior(pn[i], pdb = pd)
    writeLines(stan_code(po), paste0("/home/bbales2/cmdstan-mpi/", pn[i], ".stan"))
    # Dump the data in CmdStan's rdump format
    data <- get_data(po)
    stan_rdump(names(data), paste0("/home/bbales2/cmdstan-mpi/", pn[i], ".data.R"), env = list2env(data))
  }, error = function(e) {
    print(e)
  })
}

Errors are:

[1] "Saving model 1"
[1] "Saving model 2"
[1] "Saving model 3"
[1] "Saving model 4"
[1] "Saving model 5"
[1] "Saving model 6"
[1] "Saving model 7"
[1] "Saving model 8"
[1] "Saving model 9"
[1] "Saving model 10"
[1] "Saving model 11"
[1] "Saving model 12"
[1] "Saving model 13"
<simpleError in parse_con(txt, bigint_as_char): parse error: premature EOF

                     (right here) ------^
>
[1] "Saving model 14"
[1] "Saving model 15"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 16"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 17"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 18"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 19"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 20"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 21"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 22"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 23"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 24"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 25"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 26"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 27"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
[1] "Saving model 28"
<github_error in gh::gh(github_path(pdb, type = "contents", path = from), .token = pat): GitHub API error (403): 403 Forbidden
  API rate limit exceeded for 128.59.108.58. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
>
MansMeg commented 4 years ago

Thanks Ben for submitting this!

So I think model 13 is the gp_pois problem. The rest of the errors come from the multiple calls made to the GitHub API under the hood: if you supply a GitHub PAT to your pd object, those should go away (you can then process roughly 2000 posteriors per hour). Hopefully we can eventually put the database behind a REST API so we don't need to go through GitHub.
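For reference, a hedged sketch of one way to supply the token: the gh package, which posteriordb uses under the hood, reads the GITHUB_PAT environment variable. The constructor name and the posterior name below are illustrative; check your installed package version for the exact API.

```r
# Make a GitHub personal access token visible to gh::gh(),
# either for the current session...
Sys.setenv(GITHUB_PAT = "ghp_yourtokenhere")

# ...or persistently, by adding a line to ~/.Renviron:
#   GITHUB_PAT=ghp_yourtokenhere

library(posteriordb)
pd <- pdb_github()  # subsequent API calls should now be authenticated
po <- posterior("eight_schools-eight_schools_centered", pdb = pd)
```

Authenticated requests raise the GitHub API limit from 60 to several thousand requests per hour, which is what makes the ~2000-posteriors-per-hour figure feasible.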

eerolinna commented 4 years ago

Would it be good to make the error message from a missing token more explicit, or is the current one good enough?

A more explicit error message could be something like

<posteriordb_error in get_data(po): Out of GitHub API calls

To get more API calls, add GITHUB_PAT to your R environment variables
More instructions here: <link>

This is just a first draft, feel free to improve it

It's also possible to use a local posterior database if you need to access more than 2000 posteriors an hour. In that case you would git clone the repository and provide a path to it when creating the posterior database object (having to do this isn't ideal, but it works as a workaround). There is some documentation for it here, and I show the relevant bits below: https://github.com/MansMeg/posteriordb/blob/master/rpackage/README.md#connect-to-the-posterior-database

Here we can use the database locally (if the repo is cloned):

my_pdb <- pdb_local()

The above code requires that your working directory is the main folder of the cloned repository. Otherwise we can use the path argument.
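A hedged sketch of what the path-argument call might look like (the clone location and the posterior name are placeholders, and the exact argument name may differ between versions; check ?pdb_local in your installed package):

```r
library(posteriordb)

# Point at a local clone of the posteriordb repository
# (replace the path with wherever you cloned it)
my_pdb <- pdb_local(path = "/path/to/posteriordb")

# Local reads involve no GitHub API calls, so no rate limit applies
po <- posterior("eight_schools-eight_schools_centered", pdb = my_pdb)
```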

Maybe it would be good to add an example that uses the path argument to the documentation?

bbbales2 commented 4 years ago

> It's also possible to use a local posterior database

Why not just always use a local posterior database? Do we expect it to get too large for package managers to hold?

eerolinna commented 4 years ago

So including the local posterior db as part of the R/Python packages, so that it would automatically be available after installing the package? There was some discussion about that in https://github.com/MansMeg/posteriordb/issues/41#issuecomment-527881619, but we thought the remote db was more suitable. Database size wasn't mentioned there, but it could become an issue with a bundled database.

MansMeg commented 4 years ago

Not too large, but if one is only interested in the data and Stan code, you probably don't want 100 MB of reference posteriors. =)

MansMeg commented 4 years ago

Now fixed with aliases.