next manual, 2.17

bob-carpenter commented 7 years ago

Summary:

This is the issue for suggesting fixes for the Stan manual. Please just add suggestions as comments rather than opening new issues.

Current Version:

v2.16.0

bob-carpenter commented 7 years ago

HMM Viterbi example problem in examples.tex.

Originally reported in https://github.com/stan-dev/example-models/issues/31#issuecomment-309627744

[x] fix
[x] thank https://github.com/frobnitzem

luisdamiano commented 7 years ago

In programming.tex, line 185 of the reference manual v2.16.0: I believe it only makes sense if "kappa theta" becomes "kappa phi" instead, especially since alpha is a k-sized vector.

[x] fix
[x] thank Luis Damiano

aaronjg commented 7 years ago

in the GP section "In this case, an inverse gamma, inv_gamma_lpdf in Stan’s language, will work well as it has a sharp left tail that puts negligible mass on length-scales, but a generous right tail, allowing for large length-scales."

needs an adjective to describe length-scales in the first clause. Perhaps: "In this case, an inverse gamma, inv_gamma_lpdf in Stan’s language, will work well as it has a sharp left tail that puts negligible mass on infinitesimal length-scales, but a generous right tail, allowing for large length-scales."

[x] fix
[x] thank Aaron Goodman

bob-carpenter commented 7 years ago

[x] done with above

OK, I'll fix that. I'll mention that gamma, inverse gamma and lognormal are all "zero avoiding" in the sense that the limit of the density at zero is zero and estimates near zero will be pushed away. Then I'll cross-reference the discussion in the regression chapter, which cites Andrew's and Vince's papers.
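
A rough numerical check of the "zero avoiding" claim, sketched in Python rather than Stan. The parameter values (shape 4 and rate or scale 4 for the gamma and inverse gamma, location 0 and scale 1 for the lognormal) are assumptions picked for illustration. For the gamma, the density only vanishes at zero when the shape is greater than 1.

```python
import math

def gamma_pdf(x, shape, rate):
    # Gamma density, computed on the log scale and exponentiated.
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1) * math.log(x) - rate * x)

def inv_gamma_pdf(x, shape, scale):
    # Inverse gamma density; the -scale / x term crushes mass near zero.
    return math.exp(shape * math.log(scale) - math.lgamma(shape)
                    - (shape + 1) * math.log(x) - scale / x)

def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

xs = [1e-3, 1e-1, 1.0]  # evaluation points approaching zero from the right
densities = {
    "gamma(4, 4)": [gamma_pdf(x, 4, 4) for x in xs],
    "inv_gamma(4, 4)": [inv_gamma_pdf(x, 4, 4) for x in xs],
    "lognormal(0, 1)": [lognormal_pdf(x, 0, 1) for x in xs],
}
for name, vals in densities.items():
    print(name, vals)
```

All three densities shrink toward zero as x approaches zero, with the inverse gamma falling off fastest.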

bob-carpenter commented 7 years ago

[x] add note to int_step() saying that it differs in behavior at 0 from step()
[x] add similar note to step()

aaronjg commented 7 years ago

In the GP section on pages 247 and 255, the example code multiplies by 1/2. It seems like this would just round to 0 given integer division. Everywhere else the manual uses multiplication by 0.5 instead.

[x] fix

aaronjg commented 7 years ago

Also in the GP section, I think that rho ~ gamma(4,4) should be rho ~ inv_gamma(4,4). The text refers to the benefits of the inverse gamma distribution, so the example code should use that as well.

I think there is a similar issue with the description of the generalized inverse gaussian. The manual says that the GIG has a Gaussian right tail, but actually it has an inverse Gaussian right tail.

[x] fix

bob-carpenter commented 7 years ago

Thanks, @aaronjg --- if (1/2) is a subexpression, that will evaluate to 0; we just follow C++ evaluation because we literally translate it to the same expression, 1 / 2 in C++.
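
A minimal sketch of the pitfall, written in Python rather than Stan or C++ purely for illustration. Python's `//` floors where C++ truncates, but the two agree for non-negative operands such as these.

```python
# Integer division discards the fractional part, so 1 / 2 on two
# integers yields 0 in C++ (and hence in Stan). Python's // operator
# gives the same result for non-negative operands.
half_int = 1 // 2      # integer division: 0
half_real = 0.5        # real literal, as used elsewhere in the manual

x = 3.0
wrong = half_int * x   # the whole term silently vanishes
right = half_real * x
print(wrong, right)
```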

bob-carpenter commented 7 years ago

I like the idea of defining the Bayesian posterior for R², defined by @bgoodri in a response on StackOverflow: https://stackoverflow.com/questions/44759319/overall-predictive-power-e-g-r2-for-bayesian-linear-mixed-models

[x] moved to long-term issues

aaronjg commented 7 years ago

@bob-carpenter Thanks, I just submitted a pull request for the 1/2 issue. I didn't change the other inv_gamma/gamma thing because I'm not sure if the prose or the model formulation is correct (or if I'm just missing something here).

bob-carpenter commented 7 years ago

@aaronjg Thanks. If you want to just leave the comments, I make a pass every release to fix all the ones noted (or explain why they can't be fixed or won't be fixed until later). Our pull requests are pretty heavy with testing and review for small changes these days.

bob-carpenter commented 7 years ago

From @stemangiola on stan-dev/stan#2315 about p. 193 of 2.16 manual:

In mixture models

real log_theta[K] = log(theta); // cache log calculation

should be replaced by

vector[3] log_theta = log(theta); // cache log calculation

Otherwise gives error.

[x] fix

Turns out there's more to clean up. @lukasvermeer pointed out more issues at https://github.com/stan-dev/stan/issues/2315#issuecomment-312090660

 ordered mu[K];

[x] K in wrong place

@lukasvermeer suggested

data {
  int<lower=1> K; // number of mixture components
  int<lower=1> N; // number of data points
  real y[N]; // observations
}
parameters {
  simplex[K] theta; // mixing proportions
  ordered[K] mu; // locations of mixture components
  vector<lower=0>[K] sigma; // scales of mixture components
}
model {
  vector[K] log_theta = log(theta); // cache log calculation
  sigma ~ lognormal(0, 2);
  mu ~ normal(0, 10);
  for (n in 1:N) {
    vector[K] lps = log_theta;
    for (k in 1:K) {
      lps[k] = lps[k] + normal_lpdf(y[n] | mu[k], sigma[k]);
    }
    target += log_sum_exp(lps);
  }
}

[x] thank Lukas Vermeer
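
For readers who want to check the arithmetic in the model block above, here is a sketch of the per-observation computation in Python (not Stan). The helper names `log_sum_exp`, `normal_lpdf`, and `mixture_lpdf` are hypothetical stand-ins that mirror the Stan functions, and the numbers are made up.

```python
import math

def log_sum_exp(xs):
    # Subtract the max before exponentiating to avoid underflow.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def normal_lpdf(y, mu, sigma):
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def mixture_lpdf(y, theta, mu, sigma):
    # Same steps as the inner loop: start from log(theta[k]), add the
    # component log density, then combine with log_sum_exp.
    log_theta = [math.log(t) for t in theta]  # cache log calculation
    lps = [lt + normal_lpdf(y, m, s)
           for lt, m, s in zip(log_theta, mu, sigma)]
    return log_sum_exp(lps)

theta = [0.3, 0.7]
mu = [-1.0, 2.0]
sigma = [1.0, 0.5]
print(mixture_lpdf(0.4, theta, mu, sigma))
```

Working on the log scale keeps the result finite even when every component density underflows, which is the reason for using `log_sum_exp` rather than summing densities directly.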

billdenney commented 7 years ago

In section 4.1, it would help to add another real literal example indicating that scientific notation with a "+" is valid. Specifically, could an example like "1.23e+3" be added?

[x] add 1.23e+3 example literal
[x] thank Bill Denney

lwiklendt commented 7 years ago

In section 24.1 version 2.16, a couple of lines in the softmax_id function have some typos:

alpha[num_elements(alphac)] = 0;
return softmax(alphac);

should be

alphac1[num_elements(alphac1)] = 0;
return softmax(alphac1);

[x] fix
[x] thank Lukasz Wiklendt

aaronjg commented 7 years ago

Section 38, "Void Functions" - The main text references two functions, but only one is discussed later on. It looks like the section for increment_log_prob was removed, but the overview was not updated. I think 'reject' should also be in this section.

[x] fix reference
[x] add reject statements

aaronjg commented 7 years ago

There are a few references to 'google groups' that should be updated to reflect the move to Discourse.

[x] fix

aaronjg commented 7 years ago

Sec. 26.5 Matrices Parameters and Constants - it looks like there is a typo and 'idx[7,' should be 'idxs[7]'

[x] fixed to idxs[7, 2]

mitzimorris commented 7 years ago

new section on GPs has footnote referencing URL "mc-stan.org/documentation" which is 404.

[x] fixed that and grepped to fix a few dozen more

also, don't understand first example in GP section - explain logic for assignments to row N of covariance matrix?

[x] this may be reference to old code as there are no assignments to rows anywhere in the current first example

furthermore, footnote mentions that program implementing the marginal likelihood GP is in example models - but it isn't.

[x] no longer a footnote for this; there is discussion of marginal likelihood, but not sure it's wrong

treysp commented 7 years ago

Thanks for all your team's great work on Stan! A couple of things for you:

[x] Section 3.4, "Positive, Ordered Vectors" section, PDF page 41, sentence missing words. I think it should be (missing words asterisked):

Like ordered vectors, after their declaration positive ordered vectors *may be* assigned 
to other vectors and other vectors may be assigned to them.

[x] Section 10.4, last line on top of PDF page 170 has typo ("read" instead of "real"). Should be:

real<lower = -1, upper = 1> phi;

[x] thank Trey Spiller

bob-carpenter commented 7 years ago

Thanks, @treysp, I'll fix those.

[x] thank Trey Spiller in the acknowledgements

bob-carpenter commented 7 years ago

Explain the Ben RStanArm trick of

data {
  int<lower=0, upper=1> include_alpha;
...
parameters {
  vector[include_alpha ? N : 0] alpha;

It'll work with all types other than simplexes (have to verify that for correlation/covariance types).

[x] include

aaronjg commented 7 years ago

Example code in 'reparameterization' sections should use the new combined declaration and assignment syntax.

[x] fix

bob-carpenter commented 7 years ago

Add @bgoodri's definition of the bivariate normal CDF:

real binormal_cdf(real z1, real z2, real rho) {
  if (z1 != 0 || z2 != 0) {
    real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho)) : not_a_number();
    real a1 = (z2 / z1 - rho) / denom;
    real a2 = (z1 / z2 - rho) / denom;
    real product = z1 * z2;
    real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
    return 0.5 * (Phi(z1) + Phi(z2) - delta) - owens_t(z1, a1) - owens_t(z2, a2);
  }
  return 0.25 + asin(rho) / (2 * pi());
}

Ben added:

if rho = 1, then the bivariate CDF is min(Phi(z1), Phi(z2)) and if rho = -1, it is Phi(z1) + Phi(z2) - 1.

[x] add
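
A small Python sketch (not Stan) of the two boundary cases Ben mentions, plus the closed form the function returns at the origin. The function names are hypothetical. The clamp at zero for rho = -1 is an addition not stated above: when z1 + z2 < 0 the unclamped expression is negative, while the probability itself is zero.

```python
import math

def Phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binormal_cdf_degenerate(z1, z2, rho):
    # Perfectly correlated cases only.
    if rho == 1:
        return min(Phi(z1), Phi(z2))
    if rho == -1:
        # Clamp at zero (assumption added here, see the note above).
        return max(0.0, Phi(z1) + Phi(z2) - 1.0)
    raise ValueError("only rho = 1 or rho = -1 are handled here")

def binormal_cdf_origin(rho):
    # Value at z1 = z2 = 0, matching the final return statement above.
    return 0.25 + math.asin(rho) / (2.0 * math.pi)

print(binormal_cdf_degenerate(0.0, 0.0, 1), binormal_cdf_origin(1.0))
print(binormal_cdf_degenerate(0.0, 0.0, -1), binormal_cdf_origin(-1.0))
```

At the origin the boundary formulas agree with the closed form, which is a quick consistency check on both.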

bob-carpenter commented 7 years ago

[x] add discussion in efficiency chapter about the cost of validating constraints for structured matrices; an alternative to cov_matrix in a transformed data (not so costly) or transformed parameters block is to just use matrix and skip the cubic algorithm to validate

andrasm commented 7 years ago

Thanks for all the great work around stan!

Just bumped into this today: Page 143, Multilevel 2PL Model:

[x] sigma_alpha is either a leftover in the model block or missing from the parameters block.
[x] the comment after mu_beta says "mean student ability" but isn't that meant to be "mean question difficulty" instead?
[x] thank Andras

bob-carpenter commented 7 years ago

[x] fix spacing in multi-logit regression section
[x] vectorize

And as a stretch goal,

[x] for the K - 1 parameterizations, use append_row(..., 0) to construct the K-vector of linear predictors

data {
  vector[J] x[N];   // predictors for component membership
  ...
parameters {
  matrix[K - 1, J] beta;  // mixture regression coeffs
  ...
model {
  for (n in 1:N) {
    vector[K] lp = log_softmax(append_row(beta * x[n], 0));
    for (k in 1:K)
      lp[k] += normal_lpdf(eta[n] | mu[k], sigma);
    target += log_sum_exp(lp);
  }
  ...

bob-carpenter commented 7 years ago

A commenter named "Alex" (with no profile link) pointed out on Gelman's blog (http://andrewgelman.com/2017/08/21/mixture-models-stan-can-use-log_mix/#comment-554501) that there's an extra right paren in

target += log_mix(lambda, normal_lpdf(...), normal_lpdf(...)));

[x] remove extra right paren in example
[x] assume the thumbs up came from the same Alex and thank Alex Perrone

seantalts commented 7 years ago

I released 2.17.0 without this because it wasn't mentioned as holding up the release, but we can update the manual independently if you like.

bob-carpenter commented 7 years ago

Thanks. I kept thinking the release was imminent and I would be on vacation, then forgot that we hadn't done 2.17 yet.

It shouldn't hold up the release. After 2.17, we should just update the name of the issue to "next manual, 2.18".

I want to start moving the manual over to bookdown format so we can put it on the web to make it searchable. It's just too painful to search the pdf format. But then we'll have some issue of stability of where we put it if we want any Google juice to help direct people to the appropriate bits.

bob-carpenter commented 7 years ago

[x] Fix append_array function doc to indicate that the max order is 7, not 8.

jenast commented 7 years ago

I think there's a typo on page 218 in Vers 2.16 (the Cormack-Jolly-Seber model). In the table, should the probability for profile 3 read \phi_2 p_3, instead of \phi_2 \phi_3 ? That seems to make sense, and corresponds to the model below as well.

[x] fix
[x] thank Jens Astrom

bob-carpenter commented 7 years ago

[x] make fix directly for pull request stan-dev/stan#2400

src/docs/stan-reference/distributions.tex, line 121:

-\int_{-\infty}^y p(y \, | \, \theta) \ \mathrm{d}\theta.
+\int_{-\infty}^y p(y \, | \, \theta) \ \mathrm{d}y.

[x] thank Massimo Santini

mcol commented 7 years ago

The manual is not clear as to where conditional statements are allowed: as the current text doesn't mention restrictions, I thought that conditionals could be used in the data section, which is not true.

bob-carpenter commented 7 years ago

@mcol No statements are allowed in the data section. Might you be thinking about the conditional operator (cond ? x : y)? That should be allowed as long as none of the expressions cond, x, or y involve anything other than data variables, which they couldn't in the data block anyway.

mitzimorris commented 7 years ago

confirmed - this compiles:

data {
  int<lower=1> a;
  int<lower=1> b;
  int c[a > b ? a : b];
}

mcol commented 7 years ago

My point is that in reading the part on conditional statements (section 5.5) and most of the manual up to there, I haven't seen a clear definition as to where these can or cannot be used. Maybe this is a consequence of the fact that program blocks are introduced only later (chapter 6), and it would be enough to forward reference table 6.1 from the earlier sections.

[x] add a forward reference at the start of the statements chapter
[x] thank Marco Colombo

mitzimorris commented 7 years ago

excellent point and thanks for the feedback, it's most valuable. agreed that more overview/context would be useful.

bob-carpenter commented 7 years ago

[x] punt until future

Add clutter example to mixture chapter as an example of "denoising" (it's an example in Bishop's book, section 10.7.1)

data {
  real<lower = 0, upper = 1> theta;  // clutter ratio
  int<lower = 0> N;
  vector[N] y;
}
parameters {
  real mu;
}
model {
  for (n in 1:N)
    target += log_mix(theta,
                      normal_lpdf(y[n] | mu, 1),
                      normal_lpdf(y[n] | 0, 10));
}

theta <- 0.5
N <- 200
mu <- 4.3

y <- rep(0, N)
for (n in 1:N) {
  if (rbinom(1, 1, theta)) {
    y[n] <- rnorm(1, mu, 1)
  } else {
    y[n] <- rnorm(1, 0, 10)
  }
}

library(rstan)
fit <- stan("clutter.stan", data = list(theta=theta, N=N, y=y))
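
As a sanity check on what `log_mix` computes in the model above, here is a Python sketch (not Stan). The helper names are hypothetical, the inputs are made up, and it assumes 0 < theta < 1. In Stan's `log_mix(theta, lp1, lp2)`, theta is the weight on the first log density.

```python
import math

def normal_lpdf(y, mu, sigma):
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def log_mix(theta, lp1, lp2):
    # log(theta * exp(lp1) + (1 - theta) * exp(lp2)), computed stably.
    a = math.log(theta) + lp1
    b = math.log1p(-theta) + lp2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

y, mu, theta = 4.0, 4.3, 0.5
lp_signal = normal_lpdf(y, mu, 1.0)     # first component: normal(mu, 1)
lp_clutter = normal_lpdf(y, 0.0, 10.0)  # second component: normal(0, 10)
print(log_mix(theta, lp_signal, lp_clutter))
```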

bob-carpenter commented 7 years ago

[x] Add example of how Stan behaves w.r.t. aliasing somewhere near the discussion of vector arithmetic

transformed data {
  vector[4] x = [ 1, 2, 3, 4 ]';
  vector[4] u = x;
  for (t in 2:4)
    u[t] = u[t - 1] * 3;

  x[2:4] = x[1:3] * 3;
  print("u = ", u);
  print("x = ", x);
}

which produces

u = [1,3,9,27]
x = [1,3,6,9]
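
The same two behaviors can be mimicked in Python (a sketch for comparison only, using 0-based lists in place of Stan's 1-based vectors). The loop reads elements it has already overwritten, whereas the sliced assignment evaluates the whole right-hand side before writing anything.

```python
x = [1.0, 2.0, 3.0, 4.0]
u = list(x)  # independent copy, like "vector[4] u = x;"

# Element-wise loop: each step sees the previous step's update.
for t in range(1, 4):
    u[t] = u[t - 1] * 3

# Sliced assignment: the right-hand side is built from the old x
# first, then written back, so there is no aliasing.
rhs = [v * 3 for v in x[0:3]]
x[1:4] = rhs

print("u =", u)
print("x =", x)
```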

ssp3nc3r commented 7 years ago

The code in 14.1 (Regression with measurement error) on page 202 does not compile, and I think should be,

vector[N] x;
vector[N] y;

which works.

[x] fix
[x] thank ssp3nc3r

bob-carpenter commented 7 years ago

[x] punting for longer term when I understand what's being reparameterized

Include Ben's discussion of the "Lancaster" parameterization of multinomial in terms of Poissons:

http://discourse.mc-stan.org/t/large-poisson-model-with-individual-effects-is-too-slow/2112/2

bgoodri commented 7 years ago

If people don't have Lancaster's book, these reparameterizations are talked about in his papers at http://www.econ.brown.edu/Faculty/Tony_Lancaster/ . See both "Incidental Parameters Problem since 1948" and "Orthogonal Parameters and Panel Data".

avehtari commented 7 years ago

[x] Fix real x[] -> real[] x in Section 41.1. and Index or should these be reals?
[x] also fix elsewhere

bob-carpenter commented 7 years ago

[x] thank Jan Gleixner for stan-dev/stan#2423
[x] add space to Jan's fix to the BNF for print/reject

bob-carpenter commented 7 years ago

[x] add code for change-point examples (in my local file mining-disasters)

bob-carpenter commented 7 years ago

From a side comment on stan-dev/stanc3#1403:

[x] replace numeric_literal with real_literal in BNF
[x] remove numeric_literal definition

jan-glx commented 7 years ago

[x] In this line of Sec 24.1 "Examples of Collinearity: Redundant Intercepts". it should be $\lambda_1 + q, \lambda_2 - q$ not $\lambda_1 + q, \lambda_1 - q$ .

enbrown commented 7 years ago

The lines in the Stan's Future section of the Preface (preface.tex lines 247-250) duplicate what is in the previous section, Stan 2, and can probably be removed.

As a minor formatting issue, in the Stan Interfaces section of the introduction (introduction.tex lines 69, 80, etc.), some interfaces are specified as \subsection (such as CmdStan, RStan, and PyStan) while others are \subsubsection (such as MatlabStan, Stan.jl, StataStan, and MathematicaStan). I'm not sure if this is a historic thing (the first being the original interfaces and the later being more recent interfaces that wrap CmdStan) or a typo but it's not clear.

From a conceptual standpoint, section 2.1 Character Encoding is somewhat underspecified. I am far from an expert but it was my understanding that it is impossible to infer the encoding from a character stream (see https://www.youtube.com/watch?v=ysh2B6ZgNXk for far many scary details). So it should be valid to say that all Stan programs will be interpreted as being ISO-8859-1 (since 8-bit ASCII isn't a real thing and the file is being read in byte-by-byte) with only 7-bit ASCII characters being valid in the content of the Stan program and comments being ignored (but treated as 8-bit characters when looking for newlines in src/stan/io/read_line.hpp).

[x] fix
[x] thank Eric N. Brown

bob-carpenter commented 7 years ago

Thanks, @enbrown.

I'll remove the redundancy. I'm about to do a major re-org on the doc and some of the preface issues will go away. I'll try to make the interface description more specific.

[x] Add following description

Indeed, it's not generally possible to infer character encodings. Under the hood, we just use the standard I/O streams to read char (8-bit) values in C++.

Maybe this'll be a clearer way to say what's going on, because it's a bit non-standard:

punctuation and whitespace are defined by the ASCII code points.
identifiers must be ASCII

That defines everything but the content of comments. So you can use ISO-8859-1 (aka Latin-1) or the other ISO-8859 variants or you can use the UTF-8 encoding of Unicode. That's because they share the ASCII code points. You still won't be able to use anything other than the ASCII code points (bytes 0 to 127) for identifiers. Comments can thus contain any sequence of bytes you want other than newline in line comments and "*/" in block comments (those will end the comment sequence).
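
To illustrate the shared ASCII code points, here is a short Python sketch. The identifier and comment strings are made-up examples and this is not how the Stan parser is implemented. ASCII-only text encodes to identical bytes in UTF-8 and Latin-1, while a non-ASCII character in a comment encodes differently but never produces a stray newline byte.

```python
def is_ascii(data: bytes) -> bool:
    # ASCII code points are bytes 0 to 127.
    return all(b < 128 for b in data)

identifier = "sigma_alpha"     # hypothetical identifier
comment = "// écart-type"      # hypothetical line comment with a non-ASCII char

# ASCII text has the same bytes under both encodings.
same_bytes = identifier.encode("utf-8") == identifier.encode("latin-1")

# The comment differs between encodings, but only in bytes >= 128.
utf8 = comment.encode("utf-8")
latin1 = comment.encode("latin-1")

print(same_bytes, is_ascii(identifier.encode("utf-8")))
print(utf8, latin1)
```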

bob-carpenter commented 7 years ago

[x] change "sample" to "draw" in description of generated quantities
[x] thank Jonathan Sweeney for reporting

Originally reported here: http://discourse.mc-stan.org/t/specifying-the-number-of-samples-for-rng/2384/2

stan-dev / stan

next manual, 2.17 #2336
