Indexing Consistency Check

stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).

BSD 3-Clause "New" or "Revised" License

140 stars 44 forks source link

Indexing Consistency Check #1223

Closed stuvet closed 1 year ago

stuvet commented 2 years ago

I thought I'd raise this feature request as it's likely to be a common mistake that could be entirely silent and is hard for users to detect. On the other hand it might be simple for the compiler to check?

e.g. something like

data {
    int<lower=0> N;
    int<lower=0> K; // Implicit: K<=N -> silent error
}

transformed parameters {
    matrix[K,N] Z = rep_matrix(0, K, N);

    for ( k in 1:K ) {
       Z[,k] = ... // Index the wrong dimension.
    }
}

I wonder if the compiler can trace back the indexing of Z[,k] through the for ( k in 1:K) {, recognise that the maximum value of k is the same as the rows of Z, and flag up a compiler warning to check the indexing in that line? It'd be particularly helpful if the same check could trace through matrix inversion, conversion to vector arrays etc. I'm not sure if that's realistic, but if I don't ask...

Thanks for considering.

WardBrian commented 2 years ago

I think doing this fully in general is not possible[^1], but it's definitely something we could produce warnings heuristically for during something like pedantic mode.

Errors like this should be trapped at runtime unless you have STAN_NO_RANGE_CHECKS set while compiling

[^1]: To see why, consider a model with an extra input int<lower=0,upper=1> flag, and then have Z be defined matrix[K, flag ? N : K]. Now, the validity of the indexing depends on flag.

stuvet commented 2 years ago

Thanks for the feedback, & I understand that generalising could be a problem.

Can I just check that STAN_NO_RANGE_CHECKS should be the only reason this error would be missed in the case that K<N? I discovered a similar scenario in my code before posting, & I haven't set this flag. As far I I can tell it was running perfectly fine on cmdstanr 2.29.2.

I've just been experimenting with a pretty contrived example & get no errors at runtime even without STAN_NO_RANGE_CHECKS

WardBrian commented 2 years ago

Recent versions of Stan should throw errors if you index outside the bounds of an object or assign the wrong sized object to a name. Can you share an example?

bob-carpenter commented 2 years ago

Thanks for the feature request, @stuvet.

If K <= N, the following program fragment is OK---it just assigns the first K rows of A:

matrix[K, N] A;
...
for (k in 1:K)
  A[k] = rep_row_vector(1, N);

If the RHS is instead rep_row_vector(1, K) then that's a problem (if K != N) and will cause a runtime error and warning (or perhaps segfault if range checks are turned off).

We can flag that heuristically as a possible error in some cases. In other cases the above may be what I intended, and I might continue with something like (for n in K + 1:N) A[n] = ... to fill in the rest.

Even trying to figure out which statements get executed is a Turing-complete language like Stan is an undecidable problem, so there's no way we can statically catch all errors.

WardBrian commented 1 year ago

The previous issue https://github.com/stan-dev/stanc3/issues/225 covers this as well, but it was a bit more general. I'm going to close this as a duplicate of that issue