stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License

next manual, 2.17 #2336

Closed by bob-carpenter 7 years ago

bob-carpenter commented 7 years ago

Summary:

This is the issue for suggesting fixes for the Stan manual. Please just add suggestions as comments rather than opening new issues.

Current Version:

v2.16.0

bob-carpenter commented 7 years ago

HMM Viterbi example problem in examples.tex.

Originally reported in https://github.com/stan-dev/example-models/issues/31#issuecomment-309627744

luisdamiano commented 7 years ago

programming.tex, line 185 of the reference manual v2.16.0: I believe it only makes sense if "kappa theta" becomes "kappa phi", especially since alpha is a k-sized vector.

aaronjg commented 7 years ago

in the GP section "In this case, an inverse gamma, inv_gamma_lpdf in Stan’s language, will work well as it has a sharp left tail that puts negligible mass on length-scales, but a generous right tail, allowing for large length-scales."

needs an adjective to describe length-scales in the first clause. Perhaps: "In this case, an inverse gamma, inv_gamma_lpdf in Stan’s language, will work well as it has a sharp left tail that puts negligible mass on infinitesimal length-scales, but a generous right tail, allowing for large length-scales."

bob-carpenter commented 7 years ago

OK, I'll fix that. I'll mention that gamma, inverse gamma and lognormal are all "zero avoiding" in the sense that the limit of the density at zero is zero and estimates near zero will be pushed away. Then I'll cross-reference the discussion in the regression chapter, which cites Andrew's and Vince's papers.
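
To make "zero avoiding" concrete, here is a minimal Python sketch (not Stan). The helper names and the parameter values are assumptions chosen for illustration; it only evaluates the three densities at points approaching zero, where each one shrinks and the inverse gamma drops fastest.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta."""
    return beta**alpha / math.gamma(alpha) * x**(alpha - 1) * math.exp(-beta * x)

def inv_gamma_pdf(x, alpha, beta):
    """Inverse gamma density with shape alpha and scale beta."""
    return beta**alpha / math.gamma(alpha) * x**(-alpha - 1) * math.exp(-beta / x)

def lognormal_pdf(x, mu, sigma):
    """Lognormal density with location mu and scale sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

# Evaluate each density at points moving toward zero (illustrative values only).
for x in (1.0, 0.1, 0.01):
    print(x, gamma_pdf(x, 4, 4), inv_gamma_pdf(x, 4, 4), lognormal_pdf(x, 0, 1))
```

With these values, the inverse gamma density at 0.01 is many orders of magnitude below the gamma density at the same point, which is the "sharp left tail" behavior described for length-scale priors.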

aaronjg commented 7 years ago

In the GP section on pages 247 and 255, the example code multiplies by 1/2. It seems like this would just round to 1 given integer division. Everywhere else in the manual uses multiplication by 0.5 instead.

aaronjg commented 7 years ago

Also in the GP section, I think that rho ~ gamma(4,4) should be rho ~ inv_gamma(4,4). The text refers to the benefits of the inverse gamma distribution, so the example code should use that as well.

I think there is a similar issue with the description of the generalized inverse gaussian. The manual says that the GIG has a Gaussian right tail, but actually it has an inverse Gaussian right tail.

bob-carpenter commented 7 years ago

Thanks, @aaronjg --- if (1/2) is a subexpression, that will evaluate to 0; we just follow C++ evaluation because we literally translate it to the same expression, 1 / 2 in C++.
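
A small Python sketch of the same arithmetic, for anyone who wants to check it. `cpp_int_div` is a hypothetical helper written for this note, on the assumption that integer division truncates toward zero as in C++11; it is not part of Stan.

```python
def cpp_int_div(a: int, b: int) -> int:
    """Emulate C++ integer division, which truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

half_int = cpp_int_div(1, 2)  # what the integer subexpression (1/2) evaluates to
half_real = 1.0 / 2           # real division, equivalent to writing 0.5

print(half_int, half_real)
```

So a term multiplied by the integer subexpression `(1/2)` is silently zeroed out, which is why writing `0.5` is the safer form in the manual's examples.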

bob-carpenter commented 7 years ago

I like the idea of defining the Bayesian posterior for R2, defined by @bgoodri in a response on StackOverflow: https://stackoverflow.com/questions/44759319/overall-predictive-power-e-g-r2-for-bayesian-linear-mixed-models

aaronjg commented 7 years ago

@bob-carpenter Thanks, I just submitted a pull request for the 1/2 issue. I didn't change the other inv_gamma/gamma thing because I'm not sure if the prose or the model formulation is correct (or if I'm just missing something here).

bob-carpenter commented 7 years ago

@aaronjg Thanks. If you want to just leave the comments, I make a pass every release to fix all the ones noted (or explain why they can't be fixed or won't be fixed until later). Our pull requests are pretty heavy with testing and review for small changes these days.

bob-carpenter commented 7 years ago

From @stemangiola on stan-dev/stan#2315 about p. 193 of 2.16 manual:

In mixture models

real log_theta[K] = log(theta); // cache log calculation

should be replaced by

vector[3] log_theta = log(theta); // cache log calculation

Otherwise gives error.

Turns out there's more to clean up. @lukasvermeer pointed out more issues at https://github.com/stan-dev/stan/issues/2315#issuecomment-312090660

 ordered mu[K];

@lukasvermeer suggested

data {
  int<lower=1> K; // number of mixture components
  int<lower=1> N; // number of data points
  real y[N]; // observations
}
parameters {
  simplex[K] theta; // mixing proportions
  ordered[K] mu; // locations of mixture components
  vector<lower=0>[K] sigma; // scales of mixture components
}
model {
  vector[K] log_theta = log(theta); // cache log calculation
  sigma ~ lognormal(0, 2);
  mu ~ normal(0, 10);
  for (n in 1:N) {
    vector[K] lps = log_theta;
    for (k in 1:K) {
      lps[k] = lps[k] + normal_lpdf(y[n] | mu[k], sigma[k]);
    }
    target += log_sum_exp(lps);
  }
}
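
For readers checking the arithmetic of the inner loop in that model block, here is a minimal Python sketch (not Stan) of the per-observation mixture log density. The function names mirror the Stan ones but are reimplemented here, and the parameter values are made up for illustration.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_sum_exp(xs):
    """Stable log(sum(exp(x))) obtained by factoring out the maximum."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mixture_lpdf(y, theta, mu, sigma):
    """Log density of one observation under a K-component normal mixture."""
    log_theta = [math.log(t) for t in theta]  # cache log calculation
    lps = [lt + normal_lpdf(y, m, s) for lt, m, s in zip(log_theta, mu, sigma)]
    return log_sum_exp(lps)

# Illustrative two-component mixture (values are assumptions).
theta = [0.3, 0.7]
mu = [-1.0, 2.0]
sigma = [1.0, 0.5]
print(mixture_lpdf(0.5, theta, mu, sigma))
```

Working on the log scale is what keeps the result finite for observations far from every component, where the direct sum of densities would underflow to zero.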

billdenney commented 7 years ago

In section 4.1, it would help to add another real literal example indicating that scientific notation with a "+" is valid. Specifically, could an example like "1.23e+3" be added?

lwiklendt commented 7 years ago

In section 24.1 version 2.16, a couple of lines in the softmax_id function have some typos:

alpha[num_elements(alphac)] = 0;
return softmax(alphac);

should be

alphac1[num_elements(alphac1)] = 0;
return softmax(alphac1);

aaronjg commented 7 years ago

Section 38, "Void Functions" - The main text references two functions, but only one is discussed later on. It looks like the section for increment_log_prob was removed, but the overview was not updated. I think 'reject' should also be in this section.

aaronjg commented 7 years ago

There are a few references to 'google groups' that should be updated to reflect the move to Discourse.

aaronjg commented 7 years ago

Sec. 26.5 Matrices Parameters and Constants - it looks like there is a typo and 'idx[7,' should be 'idxs[7]'

mitzimorris commented 7 years ago

new section on GPs has footnote referencing URL "mc-stan.org/documentation" which is 404.

also, don't understand first example in GP section - explain logic for assignments to row N of covariance matrix?

furthermore, footnote mentions that program implementing the marginal likelihood GP is in example models - but it isn't.

treysp commented 7 years ago

Thanks for all your team's great work on Stan! A couple of things for you:

real<lower = -1, upper = 1> phi;

bob-carpenter commented 7 years ago

Thanks, @treysp, I'll fix those.

bob-carpenter commented 7 years ago

Explain the Ben RStanArm trick of

data {
  int<lower=0, upper=1> include_alpha;
...
parameters {
  vector[include_alpha ? N : 0] alpha;

It'll work with all types other than simplexes (have to verify that for correlation/covariance types).

aaronjg commented 7 years ago

Example code in 'reparameterization' sections should use the new combined declaration and assignment syntax.

bob-carpenter commented 7 years ago

Add @bgoodri's definition of the bivariate normal CDF:

real binormal_cdf(real z1, real z2, real rho) {
  if (z1 != 0 || z2 != 0) {
    real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
    real a1 = (z2 / z1 - rho) / denom;
    real a2 = (z1 / z2 - rho) / denom;
    real product = z1 * z2;
    real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
    return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
  }
  return 0.25 + asin(rho) / (2 * pi());
}

Ben added:

if rho = 1, then the bivariate CDF is min(Phi(z1), Phi(z2)) and if rho = -1, it is Phi(z1) + Phi(z2) - 1.
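
As a quick sanity check, the closed form used when z1 = z2 = 0 agrees with those limits at the origin. This is a Python sketch, not Stan; `Phi` is reimplemented with `math.erf`, and `binormal_cdf_at_origin` is a hypothetical name covering only that one branch of the function.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binormal_cdf_at_origin(rho):
    """Bivariate normal CDF at z1 = z2 = 0, i.e. the final return in the function."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)

independent = binormal_cdf_at_origin(0.0)   # should match Phi(0) * Phi(0)
perfect_pos = binormal_cdf_at_origin(1.0)   # should match min(Phi(0), Phi(0))
perfect_neg = binormal_cdf_at_origin(-1.0)  # should match Phi(0) + Phi(0) - 1

print(independent, perfect_pos, perfect_neg)
```

This checks only the origin. The general branch relies on `owens_t`, which the Python standard library does not provide, so it is not reproduced here.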

andrasm commented 7 years ago

Thanks for all the great work around stan!

Just bumped into this today: Page 143, Multilevel 2PL Model:

bob-carpenter commented 7 years ago

And as a stretch goal,

data {
  vector[J] x[N];   // predictors for component membership
  ...
parameters {
  matrix[K - 1, J] beta;  // mixture regression coeffs
  ...
model {
  for (n in 1:N) {
    vector[K] lp = log_softmax(append_row(beta * x[n], 0));  // log mixing proportions
    for (k in 1:K)
      lp[k] += normal_lpdf(eta[n] | mu[k], sigma);
    target += log_sum_exp(lp);
  }
  ...

bob-carpenter commented 7 years ago

A commenter with no link named "Alex" pointed out on Gelman's blog (http://andrewgelman.com/2017/08/21/mixture-models-stan-can-use-log_mix/#comment-554501) that there's an extra right paren in

target += log_mix(lambda, normal_lpdf(...), normal_lpdf(...)));
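
For reference, a minimal Python sketch (not Stan) of what a two-component `log_mix` computes, assuming the usual definition log(lambda * exp(lp1) + (1 - lambda) * exp(lp2)) with 0 < lambda < 1. The function is reimplemented here for illustration, and the call uses balanced parentheses, which is the shape the corrected manual line should have.

```python
import math

def log_mix(lam, lp1, lp2):
    """log(lam * exp(lp1) + (1 - lam) * exp(lp2)), computed on the log scale."""
    a = math.log(lam) + lp1
    b = math.log1p(-lam) + lp2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# Illustrative values: component densities 0.2 and 0.5, mixing weight 0.3.
result = log_mix(0.3, math.log(0.2), math.log(0.5))
print(result)
```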

seantalts commented 7 years ago

I released 2.17.0 without this because it wasn't mentioned as holding up the release, but we can update the manual independently if you like.

bob-carpenter commented 7 years ago

Thanks. I kept thinking the release was imminent and I would be on vacation, then forgot that we hadn't done 2.17 yet.

It shouldn't hold up the release. After 2.17, we should just update the name of the issue to "next manual, 2.18".

I want to start moving the manual over to bookdown format so we can put it on the web to make it searchable. It's just too painful to search the pdf format. But then we'll have some issue of stability of where we put it if we want any Google juice to help direct people to the appropriate bits.

jenast commented 7 years ago

I think there's a typo on page 218 in Vers 2.16 (the Cormack-Jolly-Seber model). In the table, should the probability for profile 3 read \phi_2 p_3, instead of \phi_2 \phi_3 ? That seems to make sense, and corresponds to the model below as well.

bob-carpenter commented 7 years ago

src/docs/stan-reference/distributions.tex, line 121:

-\int_{-\infty}^y p(y \, | \, \theta) \ \mathrm{d}\theta.
+\int_{-\infty}^y p(y \, | \, \theta) \ \mathrm{d}y.

mcol commented 7 years ago

The manual is not clear as to where conditional statements are allowed: as the current text doesn't mention restrictions, I thought that conditionals could be used in the data section, which is not true.

bob-carpenter commented 7 years ago

@mcol No statements are allowed in the data section. Might you be thinking about the conditional operator (cond ? x : y)? That should be allowed as long as none of the expressions cond, x, or y involve anything other than data variables, which they couldn't in the data block anyway.

mitzimorris commented 7 years ago

confirmed - this compiles:

data {
  int<lower=1> a;
  int<lower=1> b;
  int c[a > b ? a : b];
}

mcol commented 7 years ago

My point is that in reading the part on conditional statements (section 5.5) and most of the manual up to there, I haven't seen a clear definition as to where these can or cannot be used. Maybe this is a consequence of the fact that program blocks are introduced only later (chapter 6), and it would be enough to forward reference table 6.1 from the earlier sections.

mitzimorris commented 7 years ago

excellent point and thanks for the feedback, it's most valuable. agreed that more overview/context would be useful.

bob-carpenter commented 7 years ago

Add clutter example to mixture chapter as an example of "denoising" (it's an example in Bishop's book, section 10.7.1).

data {
  real<lower = 0, upper = 1> theta;  // clutter ratio
  int<lower = 0> N;
  vector[N] y;
}
parameters {
  real mu;
}
model {
  for (n in 1:N)
    target += log_mix(theta,
                      normal_lpdf(y[n] | mu, 1),
                      normal_lpdf(y[n] | 0, 10));
}

theta <- 0.5
N <- 200
mu <- 4.3

y <- rep(0, N);
for (n in 1:N) {
  if (rbinom(1, 1, 0.5)) {
    y[n] <- rnorm(1, mu, 1)
  } else {
    y[n] <- rnorm(1, 0, 10)
  }
}

library(rstan)
fit <- stan("clutter.stan", data = list(theta=theta, N=N, y=y))

bob-carpenter commented 7 years ago

transformed data {
  vector[4] x = [ 1, 2, 3, 4 ]';
  vector[4] u = x;
  for (t in 2:4)
    u[t] = u[t - 1] * 3;

  x[2:4] = x[1:3] * 3;
  print("u = ", u);
  print("x = ", x);
}

which produces

u = [1,3,9,27]
x = [1,3,6,9]
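
The same contrast can be reproduced in Python (a sketch with zero-based lists standing in for Stan's one-based vectors). The loop reads values it has already overwritten, while the sliced assignment evaluates the whole right-hand side before storing anything.

```python
x = [1, 2, 3, 4]
u = list(x)

# Sequential update: each step sees the value written by the previous step.
for t in range(1, 4):
    u[t] = u[t - 1] * 3

# Sliced update: the right-hand side is built from the original x first.
x[1:4] = [v * 3 for v in x[0:3]]

print("u =", u)  # u = [1, 3, 9, 27]
print("x =", x)  # x = [1, 3, 6, 9]
```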

ssp3nc3r commented 7 years ago

The code in 14.1 (Regression with measurement error) on page 202 does not compile, and I think should be,

vector[N] x;
vector[N] y;

which works.

bob-carpenter commented 7 years ago

Include Ben's discussion of the "Lancaster" parameterization of multinomial in terms of Poissons:

http://discourse.mc-stan.org/t/large-poisson-model-with-individual-effects-is-too-slow/2112/2

bgoodri commented 7 years ago

If people don't have Lancaster's book, these reparameterizations are discussed in his papers at http://www.econ.brown.edu/Faculty/Tony_Lancaster/ , in both the "Incidental Parameters Problem since 1948" and the "Orthogonal Parameters and Panel Data".


bob-carpenter commented 7 years ago

From a side comment on stan-dev/stanc3#1403:

enbrown commented 7 years ago

The Stan's Future section in the Preface (preface.tex lines 247-250) duplicates what is in the previous section, Stan 2, and can probably be removed.

As a minor formatting issue, in the Stan Interfaces section of the introduction (introduction.tex lines 69, 80, etc.), some interfaces are specified as \subsection (such as CmdStan, RStan, and PyStan) while others are \subsubsection (such as MatlabStan, Stan.jl, StataStan, and MathematicaStan). I'm not sure if this is a historic thing (the first being the original interfaces and the latter being more recent interfaces that wrap CmdStan) or a typo but it's not clear.

From a conceptual standpoint, section 2.1 Character Encoding is somewhat underspecified. I am far from an expert but it was my understanding that it is impossible to infer the encoding from a character stream (see https://www.youtube.com/watch?v=ysh2B6ZgNXk for far too many scary details). So it should be valid to say that all Stan programs will be interpreted as being ISO-8859-1 (since 8-bit ASCII isn't a real thing and the file is being read in byte-by-byte) with only 7-bit ASCII characters being valid in the content of the Stan program and comments being ignored (but treated as 8-bit characters when looking for newlines in src/stan/io/read_line.hpp).

bob-carpenter commented 7 years ago

Thanks, @enbrown.

I'll remove the redundancy. I'm about to do a major re-org on the doc and some of the preface issues will go away. I'll try to make the interface description more specific.

Indeed, it's not generally possible to infer character encodings. Under the hood, we just use the standard I/O streams to read char (8-bit) values in C++.

Maybe this'll be a clearer way to say what's going on, because it's a bit non-standard:

That defines everything but the content of comments. So you can use ISO-8859-1 (aka Latin-1) or the other ISO-8859 variants, or you can use the UTF-8 encoding of Unicode. That's because they share the ASCII code points. You still won't be able to use anything other than the ASCII code points (bytes 0 to 127) for identifiers. Comments can thus contain any sequence of bytes you want other than newline in line comments and "*/" in block comments (those will end the comment sequence).
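
A short Python sketch of why this works at the byte level. The sample strings are made up for illustration and nothing here is Stan's actual reader. ASCII-only program text encodes to identical bytes under Latin-1 and UTF-8, and the two encodings differ only on non-ASCII characters, where every byte they produce is 128 or above and so cannot be mistaken for a newline.

```python
# ASCII-only program text: both encodings yield the same bytes.
ascii_source = "real mu;  // location"
same_bytes = ascii_source.encode("latin-1") == ascii_source.encode("utf-8")

# A comment containing a non-ASCII character (e with acute accent).
comment = "// caf\u00e9"
latin1_bytes = comment.encode("latin-1")
utf8_bytes = comment.encode("utf-8")

# Bytes contributed by the non-ASCII character in each encoding.
latin1_tail = latin1_bytes[len("// caf"):]
utf8_tail = utf8_bytes[len("// caf"):]

print(same_bytes, latin1_tail, utf8_tail)
```

Because a byte-by-byte reader only needs to recognize ASCII bytes such as newline and "*/", it can skip over comment content in either encoding without decoding it.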

bob-carpenter commented 7 years ago

Originally reported here: http://discourse.mc-stan.org/t/specifying-the-number-of-samples-for-rng/2384/2