Closed perrydv closed 5 months ago
@paciorek I think this is working and the only test failure is the car issue that propagated from devel to other branches. This wasn't a release priority but it could go in. Let's discuss how conservative we should be given everything else going into the imminent release. I haven't added tests to this new feature yet but that should be easy.
Also this is a feature that we turn off by default and not document. We know some users who would want to start using it.
@perrydv I'm happy to have this included. I'll let you do any additional work you want on it. Then just let me know it is ready to merge in. It might not matter, but probably best if you let me merge it in.
@perrydv This seems like hugely useful functionality, which I'm quite excited to see.
Do you have any code which demonstrates use? It doesn't have to do anything useful, but only be a proof of concept - including the name of the nimbleOption
which protects this.
Thanks @danielturek. The functionality is rudimentary. Here is an example:
library(nimble)
myDist <- nimbleFunction(
setup = function() {mean <- 1},
run = function(x = double(0), sd = double(), log = integer(0, default = 0)) {
return(dnorm(x, mean = mean, sd = sd, log = log))
returnType(double())
}
)
myDist1 <- myDist()
mc <- nimbleCode({
x ~ myDist1$run(2)
y ~ dpois(exp(x))
})
registerDistributions(list('myDist1$run' = list(BUGSdist = 'myDist1$run(sd)',
types = c('value = double()',
'sd = double()'))),
check = FALSE)
m <- nimbleModel(mc, debug = FALSE, data = list(y = 3), inits = list(x = .5))
cm <- compileNimble(m)
mcmc <- buildMCMC(m)
cmcmc <- compileNimble(mcmc, project = m)
samples <- runMCMC(cmcmc, niter = 5000)
Notice:
check=FALSE
. At least checkDistributionFunctions
would need updating to support automatic registration.Given that this is so rudimentary, but we are aware of use cases waiting for this, and we can't delay the release, here are some options:
allowNFinModels=FALSE
and tell early adopts how to turn it on.The first option would have the advantage that we could more dynamically update it for early adopters.
@perrydv If this is included in the release, my initial reaction is to keep the default as allowNFinModels=FALSE
for the time being.
I think there are some reasonable questions that should be addressed before turning this on by default:
d
versus r
functions)I also think it's a fair question whether the argument name allowNFinModels
is the final choice. Is this too implementation specific?
I'm on the fence about this. I guess if we know about use cases, it makes sense to put in, behind an option, and labelled experimental. We can always point initial users to branch if we want to make changes to it.
But particularly given release timeline, leaving it out also makes good sense.
@perrydv and I agreed that we not put this in 1.1.0.
I tend to agree, sounds good.
@perrydv @paciorek I was excited about this being added in the release, but I understand the hesitancy. I am happy to help test it and write up some use cases because I personally have some example code that I think this would make a lot cleaner by being able to cache some values and not repeat computation without it being a bunch of constants defined in the model. If that would be helpful let me know.
Hi @danielturek @paciorek @paul-vdb I have pushed updates to this functionality. Here is a summary (note that some changes affect our current system for distribution registration):
myNF$dcustom
or even simply myNF$d
. If an r
function is provided, it must be a method whose name corresponds to that of the d
function. If an r
function is not provided, one will be automatically generated (as currently done). Note however that it is not possible to generate a new method into an already-defined nimbleFunction. Therefore the generated r
function is a stand-alone (no setup code) nimbleFunction. In this example it would be rcustom_myNF_dummy
.r
functions (including for existing functionality for user-defined dists) now have the suffix _dummy
or __dummy
added to their name. If we don't like having this change for existing functionality, I can revert it easily. For the new functionality, the name is more clearly artificial anyway, so the new suffix seems reasonable.registerDistributions
. This was a moderately substantial change.test-user
could not be run twice in the same R session due to a missing deregisterDistributions
, which resulting in a piece of litter. I cleaned that up.allowNFinModel
. Some changes will affect processing even if that option is FALSE
. A common need was some version of changing as.character(code)
to safeDeparse(code)
where code
is the foo
of foo(x)
, which could now be myNF$foo(x)
, necessitating a full deparse of myNF$foo
since it is not guaranteed to be just a name.test-user
(clearly marked) to walk through the new functionality.allowNFinModel
.This is now passing tests. Some of the test failures arose because existing tests needed updating. Some were from the distribution name swapping that comes up (e.g. dbin and dbinom) that I've ironed out. Also I made it work with T()
when the p
and q
functions are provided in the nimbleFunction. I think is ready for any code review.
Nice!
My review so far is just looking at the test suite.
r
, our message (here specifically for one of the test cases) says "NIMBLE is generating a placeholder function, r2", when the name of the actual dummy in this case is r2_myDist1_dummy
. r
functions with _dummy
we should think about whether to do that for cases not involving nimbleFunctions in models. Given we haven't been doing it, my slight preference is not to do it for this new functionality.run
method as the d
function, right? Folks might be inclined to do: x ~ dmynorm$run()
.d
method starting with "d". E.g., if in the test starting on line 605, we name the method foo
instead of dmynorm
, this is the error:
[Note] Registering 'myDist1$foo' as a distribution based on its use in BUGS code. If you make changes to the nimbleFunctions for the distribution, you must call 'deregisterDistributions' before using the distribution in BUGS code for those changes to take effect.
Error in FUN(X[[i]], ...) : checkDistributionFunctions: expecting 'n' as the first argument for the simulation function for foo.
deregisterDistributions
don't check that the distribution has been removed from our set of user-defined dists. We may want to add an expectation for that.This doesn't feel like the sort of functionality that will silently give wrong results, but if there are parts of the code you want us to look at, let us know.
For the manual, my thought is it will suffice to show a basic example with just a "d" function and then a fairly full example like that in test-user.R
line 807, but perhaps adding an rbar
method and illustrating the idea of caching big objects via setup code rather than having them in the model).
Hi @paciorek I'm catching up on this and realizing I didn't catch your comments in March. Numbers will match your points:
dummy
suffix.So I think the dummy
is the one design question, and I guess I should just not do that. And then it looks like I didn't add make user manual changes, per your last comment.
Regarding #2 I just am referring to the fact that with standard user-defined distributions we often generate "dummy" "r" functions in the global env't, but they don't have "dummy" in the name. In any event it sounds like I convinced you to remove "dummy".
For #7, something simple like using expect_silent
or doing something basic and checking the result may be sufficient.
@perrydv I spent yesterday updating some code that has a lot of repeated computation in the distribution function. Before I was doing some messy saving of values within model code and passing these matrices to the distribution function. Now with the setup code I can do that all internally and it's way cleaner so huge bonus. The thing I had to think about a lot while doing it was the implication on the model graph when I have some functions that are caching a value internally that might get used elsewhere. Here is a very small example expanded from yours above of where things can go terribly wrong (if someone did something very silly). I don't necessarily think this is a flaw, but something that we should warn that Nimble doesn't detect additional dependencies within the nimble function code with setup. Not sure if there is documentation already written for how to use this.
library(nimble)
myDist <- nimbleFunction(
setup = function() {mean <- 1},
run = function(){},
methods = list(
dfn = function(x = double(0), sd = double(), log = integer(0, default = 0)) {
return(dnorm(x, mean = mean, sd = sd, log = log))
returnType(double())
},
rfn = function(n = integer(), sd = double()){
return(rnorm(1,mean, sd))
returnType(double())
},
cacheNewMean = function(mu = double()){
mean <<- mu
}
)
)
myDist1 <- myDist()
mc <- nimbleCode({
mu ~ dnorm(0, 1)
myDist1$cacheNewMean(mu)
x ~ myDist1$dfn(2)
y ~ dpois(exp(x))
})
registerDistributions(list('myDist1$dfn' = list(BUGSdist = 'myDist1$dfn(sd)',
types = c('value = double()',
'sd = double()'))))
m <- nimbleModel(mc, debug = FALSE, data = list(y = 3), inits = list(x = .5))
m$getDependencies("mu")
@perrydv if you could finish off this PR soon, that would be helpful, and allow us to merge in #1414 too.
@paciorek I will do this. @paul-vdb I am not following your example. myDist1$cacheNewMean(mu)
is not a declaration and so shouldn't be expected to do anything. It should be error-trapped but it is possible that an unrelated issue prevented that from happening (and should be fixed now or soon). But the overarching point of drawing attention to implications for the model graph and dependencies makes sense and I will try to do that.
@paciorek @danielturek @paul-vdb I've made the following changes:
TRUE
.expect_*
statements to tests (and some new tests), especially for results of deregisterDistributions
.I think this should be now very close or ready to merge.
@perrydv
_dummy
suffixes. Respectfully, I don't see the point, and it feels like it infringes on a grey area. obj$ddist
can just as well auto-generate rdist_obj
without any problems, right?I agree that we should remove _dummy
.
Happy to see this merged in regardless.
Ok I've just pushed a version with "_dummy" entirely removed from auto-generated r function names.
@perrydv I see a merge conflict here. Just flagging it for you.
@paciorek The resolution was to remove the devel parts of the merge conflict and accept this branch's changes. I'm not clear why git got confused (unless I'm confused). If this passes testing (but it doesn't have the dmvt AD test fixed), merge if you are ready and then I can bring the changes from devel into #1414 and make the final tweak we need there.
@perrydv this is now merged. Back to you for #1414 .
Do not merge.
This is something I sketched out a few months ago and am throwing into testing now to see what breaks.
These changes allow a user-defined distribution or function to be an object of a nimbleFunction class with setup code. This would allow embedding input data into that object in cases where it doesn't really need to be in the model. This may need clean-up and completion. I am just moving it aside for now and want to see testing.