kingaa / pomp

R package for statistical inference using partially observed Markov processes
https://kingaa.github.io/pomp
GNU General Public License v3.0
110 stars 26 forks source link

On the `give_log` flag in the `dmeasure` component #119

Closed bogaotory closed 3 years ago

bogaotory commented 3 years ago

https://github.com/kingaa/pomp/blob/467720ba63418590a5a96751cf51acf312f58d47/vignettes/getting_started.Rmd#L731-L737

My (A newbie's) understanding of the give_log (log in the case of R-defined dmeasure function) parameter is that essentially the dmeasure function needs to have a structure that looks something like this:

likelihood = ...;
log_likelihood = ...;
if (give_log) {
    return log_likelihood;
} else {
    return likelihood;
}

Is this correct? Could you also point me to the right direction to learn bout where the give_log parameter/flag fit inside the algorithm please? As in when does the algorithm decide to evaluate "log likelihood" instead of "likelihood" and vice versa? (It always seem to be FALSE when I ran my dmeasure with mif2). Thanks very much.

kingaa commented 3 years ago

You have the right idea. Of course, you would not bother to evaluate both likelihood and log likelihood first if you only wanted one of these. Also, the dmeasure function does not return a value, so the code would look more like:

if (give_log) {
  lik = <compute log_likelihood>;
} else {
  lik = <compute likelihood>;
}

With respect to your second request, I'm not sure exactly what you want to know, but here are a couple of reflections.

  1. Some algorithms may want the likelihood, others the log likelihood. Importantly, log and exp are imperfect maps between the two spaces because, on a finite machine, they lose precision. If they were perfect maps, it would clearly be sufficient to ask the user to provide only one.
  2. Algorithms for computing likelihoods and log likelihoods may differ. In particular, there are situations, for example in the tails of many distributions, when there exists a more precise algorithm for computing the log likelihood than simply computing the likelihood and then taking the log (or vice versa). For this reason, all built-in R ddist functions have a give_log parameter. pomp merely recapitulates this behavior, for the same reason.
bogaotory commented 3 years ago

Thanks for the prompt reply.

I guess my question is when do pomp's algorithms such as mif2 decide to calculate the log likelihood instead of likelihood and vice versa? The user implement dmeasure such that it may return log likelihood when give_log is TRUE, and return likelihood when give_log is FALSE. But the dmeasure function is called by algorithms such as mif2, not by the user. So when dmeasure is called, I guess pomp needs to make a decision as to what the value of give_log is? And how is that decision made?

kingaa commented 3 years ago

Yes, pomp makes that decision. The decision depends on the algorithm. The user must provide for both cases. To figure out when it is called with what value, you can always refer to the source (e.g., here). But the point is that the user doesn't need to worry about that. pomp is designed to be more than a set of data analysis tools: it is also a platform for the implementation of inference algorithms. If there is a chance that a future algorithm would want a different choice than an existing algorithm, both possibilities must be allowed for. I suppose one could even imagine an algorithm that made the choice at runtime or even changed the choice adaptively. At the moment, however, each pomp algorithm has its choice hard-coded.

Permit me to ask, why are these matters of interest to you?

bogaotory commented 3 years ago

Thanks for pointing me to the right place in the source code. I was looking at dmeasure.c for the answer because I thought there might be a function in there to control the decision at runtime.

Of course, happy to share my thoughts. Principally, it is of interest because I wanted to know exactly how everything works, and to make sure that there is no misunderstandings at the interface between my model implementation and pomp.

More specifically, the paragraph on dmeasure was a little confusing to read if I may say so. As in I get the messages after reading it, but I am not quite sure if you mean what I thought you meant, if that makes sense. Since you emphasised "This is one of the most common places where newbies (like me) make mistakes", I didn't want to take any chances, and thought I'd better get some confirmation/clarification if possible.

As I started reading this section on dmeasure, after reading the code, this give_log variable seemed to have came out of nowhere. The standard dpois function, as you referenced, has a parameter log, but why is it called give_log here I started to wonder. Could it be a typo or is it a versioning issue? I started to have doubts from here on. If you'd wrote something like lik = dpois(pop,b*N,log=give_log); it would make more sense, but that's not really a valid C syntax I suppose.

Also a different way to interpret the first half of this paragraph, as I did for a long moment, was that you were saying that it is the user's duty to write either lik = dpois(pop,b*N,0); or lik = dpois(pop,b*N,1); . The second half of the paragraph contradicts this assumption, but it also did not clarify what the "needs" are as in "... according to its needs." So now I have doubts for both halves of the paragraph, and can't decide which to go with.

If I may make a couple of suggestions, it would be to 1. declare right after the code that give_log is a variable local to pomp. (You've made it quite clear why Csnippet is the preferred method at the beginning of the tutorial, but I have to admit I get headaches reading R's syntax and it is made even worse when Csnippets are involved. The biggest headache I have is I simply don't know where everything is defined and what their scopes are.) 2. clarify "...according to its needs" with an example such as mif2 always uses log likelihood because ... (well I don't really know the reason :) ).

Thanks very much for your help again.

kingaa commented 3 years ago

Thanks for your suggestions, @bogaotory. I will spend some time with the relevant paragraph in the "Getting Started" document. More generally, figuring out how to communicate the demands that are placed on the user without burdening him or her with extraneous information has been a challenge. Clearly, there's room for improvement.

The standard dpois function, as you referenced, has a parameter log, but why is it called give_log here I started to wonder. Could it be a typo or is it a versioning issue? I started to have doubts from here on. If you'd wrote something like lik = dpois(pop,b*N,log=give_log); it would make more sense, but that's not really a valid C syntax I suppose.

Yes, it's unfortunate, but log is defined in the standard C math library, so it can't be used as a variable name in C snippets. If its any consolation, the same convention is found in the R source code (e.g., dpois).