new vector slicing bound checks break code

chvandorp commented 3 years ago

Summary:

This issue results from a discussion on the Stan forum (https://discourse.mc-stan.org/t/new-slicing-rules-break-my-code/24985). Here I showed an example model that worked with Stan versions < 2.26, but resulted in thrown exceptions in newer versions.

Description:

The issue is that the bounds a and b in an expression like x[a:b] now have to be valid indices (i.e. 0 < a, b <= n where n = num_elements(x). The previous behavior was that an expression like x[1:0] resulted in an empty vector. Below I'll give two examples in which this can be very useful. However, with the newer versions, this leads to a runtime error. These range checks are currently done for vectors, but not for arrays.

Reproducible Steps:

Model 1: inserting a constant into a parameter vector. This example comes from a model in which I modeled odds ratios x for different classes 1,2,...,n. The "user" of the model can choose a reference class i, and for this class the odds ratio will be 1. I then used the transformed parameters block to define a vector y = [x[1], x[2], ..., x[i-1], 1, x[i], ..., x[n-1]]. (note that the vector x has length n-1). I implemented this as follows:

data {
    int<lower=1> n; // length of vector y
    int<lower=1, upper=n> i; // index of reference class
}
parameters { 
    vector[n-1] x; // odds ratios of other classes
}
transformed parameters {
    vector[n] y = append_row(x[:i-1], append_row(rep_vector(1.0, 1), x[i:]));
}
model {
    y ~ normal(0, 1); // do something with y
}

Whenever i > 1 and i < n there is no problem with this, but when the user chooses i=1, then the first slice x[:i-1] is equal to x[:0], which used to represent an empty vector. Currently, we get an exception, because 0 is not a valid index. We get a similar situation with i=n. (BTW: I'm aware that my model can be fixed using head and tail functions :-).

Model 2: ragged data/parameter structure.
Suppose that we have N subjects and for each subject we have a number of observations, some of which may be missing. The number of missing observations K[n] differs between subjects and crucially can be zero for some subjects. We want to model these missing observations explicitly using a vector vector[sum(K)] x. The missing observations specific for subject n can be selected from x using vector[K[n]] y = x[sum(K[:n-1])+1:sum(K[:n])]. Here's a "working" Stan model:

data {
    int N; // number of subjects
    int<lower=0> K[N]; // number of missing observations for each subject
}
parameters {
    vector[sum(K)] x; // concatenated missing observations
}
model {
    for ( n in 1:N ) {
        // get the part of x corresponding to subject n        
        vector[K[n]] y = x[sum(K[:n-1])+1:sum(K[:n])];

        // do something with y...
    }
    x ~ normal(0,1);
}

This model works without errors when e.g. K = {1,5,3,0,4} (note that subject 4 has zero missing observations), but we get exceptions when e.g. K = {0,1,5,3,4}, because in that case we select vector[0] y = x[1:0] for the first subject. Something similar happens for when K[N] = 0.

Current Output:

The error is a thrown exception like this one:

vector[min] indexing: accessing element out of range. index 4 out of range; expecting index to be between 1 and 3

Expected Output:

The models work without warnings or exceptions with Stan versions < 2.26

Additional Information:

There was some discussion on the forum whether or not the old behavior was actually unintended and a "bug" instead of a feature. Intended or not, I think it is pretty useful and also common practice in other languages (e.g. Python, Julia, Go).
Others argued that allowing out-of-range indices could lead to uncaught programming errors.
The change was likely made here: https://github.com/stan-dev/stan/pull/2965
For arrays, this kind of slicing is still allowed: if we replace all x and y by arrays, all models work without errors in the current version. So this is somewhat inconsistent.

For now, I would argue that the change has to be reverted, because it could break a lot of people's models (I can't be the only one making use of this). Then we can have a discussion about returning empty vectors or throwing exceptions.

Current Version:

v2.28.1

nhuurre commented 3 years ago

This model works without errors when e.g. K = {1,5,3,0,4} (note that subject 4 has zero missing observations)

Alas, it does not. Subject four gets y = x[10:9] which used to be empty vector but now is shorthand for reverse(x[9:10]). (The assignment of this size-2 vector to size-0 vector y does not error because the declared sizes are not always enforced; a long-standing bug.)

I find it surprising/confusing that x[:n] is always in ascending order but x[k:n] is sometimes in descending order. Indexing backwards is just trying to be too clever.

chvandorp commented 3 years ago

@nhuurre when did that happen? I clearly missed this. Interestingly, the model (model 2) does run without any errors. So that means we have another bug. The plot thickens.

// index.stan
data {
    int N; // number of subjects
    int<lower=0> K[N]; // number of missing observations for each subject
}
parameters {
    vector[sum(K)] x; // concatenated missing observations
}
model {
    for ( n in 1:N ) {
        // get the part of x corresponding to subject n     
        vector[K[n]] y = x[sum(K[:n-1])+1:sum(K[:n])];
        print(K[n], " ", sum(K[:n-1])+1, " ", sum(K[:n]), " ", x[sum(K[:n-1])+1:sum(K[:n])], " ", y);
    }
    x ~ normal(0,1);
}

import cmdstanpy

sm = cmdstanpy.CmdStanModel(stan_file="index.stan")
data = {
    "N": 5,
    "K" : [1,2,3,0,4]
}
sam = sm.sample(data=data, chains=1, output_dir="stan-cache")

printed values in stdout file (printing K[n], a, b, x[a:b], y):

1 1 1 [1.613] [1.613]
2 2 3 [-1.44802,0.0725087] [-1.44802,0.0725087]
3 4 6 [-1.26702,-0.399973,0.198619] [-1.26702,-0.399973,0.198619]
0 7 6 [-0.0251393,0.198619] [-0.0251393,0.198619]
4 7 10 [-0.0251393,-0.221255,1.30883,1.96729] [-0.0251393,-0.221255,1.30883,1.96729]

Here, the vector y of subject 4 should have 0 elements, but it has 2.

bob-carpenter commented 3 years ago

@nhuurre when did that happen?

If there was a discussion of this, I missed it, as I would have vehemently objected to the change in semantics.

I just tested and this program now "works":

parameters {
  vector[2] a;
}
model {
  a[2:1] ~ normal(0, 1);
}

My intended semantics when defining the slicing was to have

2:1 denote the empty sequence.
{110.3, 19.12, -43.2}[1, 3, 6] should throw an index out of bounds exception for 6 and not return {110.3, 19.12}

With the behavior (1), the above program should fail because the posterior's improper.

I'm in favor of converting both of these back to their intended semantics.

SteveBronder commented 3 years ago

when I did #2965 I looked over the reference manuals slicing descriptions and didn't see anything on how reverse indices worked. At the time I think we thought that returning a zero sized vector for reverse indices was a bug, so if it's a feature I can make a PR to revert this behavior to how it was. Should we backport this change to 2.2(6:8)? I remember us talking about this very thing and at the time we didn't really understand why someone would want X[2:1] to be empty and what would happen for a matrix X[1:4, 4:1]. R supports reverse indexing so we figured that would be the behavior users would expect.

I find it surprising/confusing that x[:n] is always in ascending order but x[k:n] is sometimes in descending order. Indexing backwards is just trying to be too clever.

x[:n] is read as "from the start up to n so I think be definition it's always ascending, no?

A few general questions

Should the a[2:1] ~ normal(0, 1); fail because it would be the same as passing an empty vector?

target += normal_lpdf( {} |, 0, 1)

is the rule that for any X[M:N] where M > N it should return the same type as X but of size zero? If so I think that would be good to put in the reference manual.

For a matrix X[1:4, 4:1] should this be a matrix with 4 rows and 0 columns or a matrix with 0 rows and 0 columns?
Should this override bad indexing values? i.e. Should a vector y[5:0] and y[5:(-18)] return an empty vector?
Is this just for slicing like :N, N: and N:M or does this apply in some way to using arrays as indices as well? idt so but just want to check
Does supporting this stop us from doing the fix asserting that the lhs and rhs should be equal in size? As the above code that uses reverse order giving back zero indices also makes that assumption it feels like this is used in the same situations and if we support one we should support the other.

https://mc-stan.org/docs/2_28/reference-manual/language-multi-indexing.html

bob-carpenter commented 3 years ago

Steve's right that the reference manual doesn't clearly say that ranges are from low to high. Here's the relevant section:

https://mc-stan.org/docs/2_28/reference-manual/language-multi-indexing.html

The user's guide is clearer that it's intended to be a lower and upper bound:

https://mc-stan.org/docs/2_28/stan-users-guide/slicing-with-range-indexes.html

But it doesn't come out and tell the user that they shouldn't write y[8:3]. To answer your questions,

My intention was that low:high denote the array {low, low+1, ..., high}, which if high > low is the empty sequence.
If x is a matrix then x[1:4, 4:1] should be a 4 x 0 matrix if rows(x) >= 4 and should throw an error if rows(x) < 4.
Yes, y[5:0] should return an empty vector because y[{ }] returns an empty vector and 5:0 is a synonym for the empty array {} (but be careful---empty arrays don't actually work like that because we can't infer the type).
Yes, the general rule is that if is is a size N array of integers, then
```
y[is] = { y[is[1]], y[is[2]], ..., y[is[N]] }.
```
That's why this should fail:
```
vector[3] y;
vector[2] z = y[2:4]
```
should fail and not return y[2:3], because y[2:4] = [y[2], y[3], y[4]]' and y[4] is undefined.
There's no interaction between the rule for slicing and the rule for asserting that the LHS and RHS should be the same dimensions. The indexing rules decide what the expression dimension is and that has to match the LHS to which it's being assigned. But we decided not to fix the current bug that lets the wrong size array be assigned because @bgoodri is exploiting the bug in rstanarm.

bob-carpenter commented 3 years ago

Also, I see why I missed this change. The PR https://github.com/stan-dev/stan/pull/2965 doesn't mention that it changes the semantics of slicing! Could I please ask that if you change semantics (even if you think it's been the intention all along), please call it out in the PR. I'd appreciate it if you also ping me when you do that, since I apparently didn't do such a great job at writing out the reference manual!

bgoodri commented 3 years ago

But we decided not to fix the current bug that lets the wrong size array be assigned because @bgoodri is exploiting the bug in rstanarm.

I think that is the third time that has been said, and at least the third time that I have thought it is not the case. Can someone point me to where I am doing that?

nhuurre commented 3 years ago

@bgoodri I think @bob-carpenter could be misremebering this discussion: https://discourse.mc-stan.org/t/ragged-array-expressions/5213/4 where it was not you but Kevin van Horn who said

Please don’t “fix” this until you have an alternative way of providing the same functionality.

fcostin commented 3 years ago

(i'm rfc from the Stan forum thread (https://discourse.mc-stan.org/t/new-slicing-rules-break-my-code/24985))

I was at first confused about when a slice should throw an error vs evaluate to a (perhaps empty) array of indices.

When defining the behaviour forx[start:end] there are two options: (i) we could first check to see if start and end are both valid indices with respect to the shape of x, throw an error if not, and otherwise expand start:end to a (possibly empty) array of indices. (ii) we could first expand start:end to a (possibly empty) array of indices first, and then -- provided that produces a nonempty range of indices -- check to see if all those are valid with respect to the shape of x, and throw an error otherwise.

An interesting example is the case where x has size 1 -- how can we write an empty slice of the form x[start:end] in this case? Since Stan's slices are inclusive of both start and end indices, if the desired behaviour is that start>end represents an empty slice, then as x has size 1, at least one value out of start and end must be an invalid index, since the set of possible valid indices for x only contains the index 1. This suggests that option (i) above, checking the validity of indices with respect to the size of x before expanding them to a range of indices, is not compatible with being able to select empty slices of arbitrary sized arrays.

Similarly for the case where x has size 0.

@bob-carpenter 's intended semantics seem consistent with option (ii) and would support both of these scenarios.

It might also be worth double-checking that the semantics for [start:end] slices work in a consistent way to start:end syntax in for loops https://mc-stan.org/docs/2_28/reference-manual/for-loops.html#for-loops

chvandorp commented 3 years ago

@fcostin In a for loop, start:end is empty when start > end.

@SteveBronder if you want to keep the reverse order functionality, you could propose an additional stride parameter, as in MATLAB or Julia: 3:-1:1 would represent the sequence 3,2,1. Eigen also support a (negative) stride, I believe.

bob-carpenter commented 3 years ago

@fcostin: In general, if x is an array, then x[1:size(x)] is equivalent to x. If x is size 0, then x[1:0] = {}. If x is size 1, then x[1:1] = { x[1] }, and so on.

My intended semantics was that start:end = { start, start + 1, ..., end }. Then the boundary condition if start > end is that start:end = { }.

stan-dev / stan