Closed chvandorp closed 2 years ago
This model works without errors when e.g. K = {1,5,3,0,4} (note that subject 4 has zero missing observations)
Alas, it does not.
Subject four gets y = x[10:9]
which used to be empty vector but now is shorthand for reverse(x[9:10])
.
(The assignment of this size-2 vector to size-0 vector y
does not error because the declared sizes are not always enforced; a long-standing bug.)
I find it surprising/confusing that x[:n]
is always in ascending order but x[k:n]
is sometimes in descending order.
Indexing backwards is just trying to be too clever.
@nhuurre when did that happen? I clearly missed this. Interestingly, the model (model 2) does run without any errors. So that means we have another bug. The plot thickens.
// index.stan
data {
int N; // number of subjects
int<lower=0> K[N]; // number of missing observations for each subject
}
parameters {
vector[sum(K)] x; // concatenated missing observations
}
model {
for ( n in 1:N ) {
// get the part of x corresponding to subject n
vector[K[n]] y = x[sum(K[:n-1])+1:sum(K[:n])];
print(K[n], " ", sum(K[:n-1])+1, " ", sum(K[:n]), " ", x[sum(K[:n-1])+1:sum(K[:n])], " ", y);
}
x ~ normal(0,1);
}
import cmdstanpy
sm = cmdstanpy.CmdStanModel(stan_file="index.stan")
data = {
"N": 5,
"K" : [1,2,3,0,4]
}
sam = sm.sample(data=data, chains=1, output_dir="stan-cache")
printed values in stdout file (printing K[n], a, b, x[a:b], y):
1 1 1 [1.613] [1.613]
2 2 3 [-1.44802,0.0725087] [-1.44802,0.0725087]
3 4 6 [-1.26702,-0.399973,0.198619] [-1.26702,-0.399973,0.198619]
0 7 6 [-0.0251393,0.198619] [-0.0251393,0.198619]
4 7 10 [-0.0251393,-0.221255,1.30883,1.96729] [-0.0251393,-0.221255,1.30883,1.96729]
Here, the vector y of subject 4 should have 0 elements, but it has 2.
@nhuurre when did that happen?
If there was a discussion of this, I missed it, as I would have vehemently objected to the change in semantics.
I just tested and this program now "works":
parameters {
vector[2] a;
}
model {
a[2:1] ~ normal(0, 1);
}
My intended semantics when defining the slicing was to have
2:1
denote the empty sequence. {110.3, 19.12, -43.2}[1, 3, 6]
should throw an index out of bounds exception for 6
and not return {110.3, 19.12}
With the behavior (1), the above program should fail because the posterior's improper.
I'm in favor of converting both of these back to their intended semantics.
when I did #2965 I looked over the reference manuals slicing descriptions and didn't see anything on how reverse indices worked. At the time I think we thought that returning a zero sized vector for reverse indices was a bug, so if it's a feature I can make a PR to revert this behavior to how it was. Should we backport this change to 2.2(6:8)? I remember us talking about this very thing and at the time we didn't really understand why someone would want X[2:1]
to be empty and what would happen for a matrix X[1:4, 4:1]
. R supports reverse indexing so we figured that would be the behavior users would expect.
I find it surprising/confusing that
x[:n]
is always in ascending order butx[k:n]
is sometimes in descending order. Indexing backwards is just trying to be too clever.
x[:n]
is read as "from the start up to n
so I think be definition it's always ascending, no?
A few general questions
a[2:1] ~ normal(0, 1);
fail because it would be the same as passing an empty vector?target += normal_lpdf( {} |, 0, 1)
is the rule that for any X[M:N]
where M > N
it should return the same type as X
but of size zero? If so I think that would be good to put in the reference manual.
X[1:4, 4:1]
should this be a matrix with 4 rows and 0 columns or a matrix with 0 rows and 0 columns? y[5:0]
and y[5:(-18)]
return an empty vector?:N
, N:
and N:M
or does this apply in some way to using arrays as indices as well? idt so but just want to checkhttps://mc-stan.org/docs/2_28/reference-manual/language-multi-indexing.html
Steve's right that the reference manual doesn't clearly say that ranges are from low to high. Here's the relevant section:
The user's guide is clearer that it's intended to be a lower and upper bound:
But it doesn't come out and tell the user that they shouldn't write y[8:3]
. To answer your questions,
My intention was that low:high
denote the array {low, low+1, ..., high}
, which if high > low
is the empty sequence.
If x
is a matrix then x[1:4, 4:1]
should be a 4 x 0
matrix if rows(x) >= 4
and should throw an error if rows(x) < 4
.
Yes, y[5:0]
should return an empty vector because y[{ }]
returns an empty vector and 5:0
is a synonym for the empty array {}
(but be careful---empty arrays don't actually work like that because we can't infer the type).
Yes, the general rule is that if is
is a size N
array of integers, then
y[is] = { y[is[1]], y[is[2]], ..., y[is[N]] }.
That's why this should fail:
vector[3] y;
vector[2] z = y[2:4]
should fail and not return y[2:3]
, because y[2:4] = [y[2], y[3], y[4]]'
and y[4]
is undefined.
There's no interaction between the rule for slicing and the rule for asserting that the LHS and RHS should be the same dimensions. The indexing rules decide what the expression dimension is and that has to match the LHS to which it's being assigned. But we decided not to fix the current bug that lets the wrong size array be assigned because @bgoodri is exploiting the bug in rstanarm.
Also, I see why I missed this change. The PR https://github.com/stan-dev/stan/pull/2965 doesn't mention that it changes the semantics of slicing! Could I please ask that if you change semantics (even if you think it's been the intention all along), please call it out in the PR. I'd appreciate it if you also ping me when you do that, since I apparently didn't do such a great job at writing out the reference manual!
But we decided not to fix the current bug that lets the wrong size array be assigned because @bgoodri is exploiting the bug in rstanarm.
I think that is the third time that has been said, and at least the third time that I have thought it is not the case. Can someone point me to where I am doing that?
@bgoodri I think @bob-carpenter could be misremebering this discussion: https://discourse.mc-stan.org/t/ragged-array-expressions/5213/4 where it was not you but Kevin van Horn who said
Please don’t “fix” this until you have an alternative way of providing the same functionality.
(i'm rfc
from the Stan forum thread (https://discourse.mc-stan.org/t/new-slicing-rules-break-my-code/24985))
I was at first confused about when a slice should throw an error vs evaluate to a (perhaps empty) array of indices.
When defining the behaviour forx[start:end]
there are two options:
(i) we could first check to see if start
and end
are both valid indices with respect to the shape of x
, throw an error if not, and otherwise expand start:end
to a (possibly empty) array of indices.
(ii) we could first expand start:end
to a (possibly empty) array of indices first, and then -- provided that produces a nonempty range of indices -- check to see if all those are valid with respect to the shape of x
, and throw an error otherwise.
An interesting example is the case where x
has size 1 -- how can we write an empty slice of the form x[start:end]
in this case? Since Stan's slices are inclusive of both start
and end
indices, if the desired behaviour is that start>end
represents an empty slice, then as x
has size 1, at least one value out of start
and end
must be an invalid index, since the set of possible valid indices for x
only contains the index 1. This suggests that option (i) above, checking the validity of indices with respect to the size of x
before expanding them to a range of indices, is not compatible with being able to select empty slices of arbitrary sized arrays.
Similarly for the case where x
has size 0.
@bob-carpenter 's intended semantics seem consistent with option (ii) and would support both of these scenarios.
It might also be worth double-checking that the semantics for [start:end]
slices work in a consistent way to start:end
syntax in for loops https://mc-stan.org/docs/2_28/reference-manual/for-loops.html#for-loops
@fcostin In a for loop, start:end
is empty when start > end
.
@SteveBronder if you want to keep the reverse order functionality, you could propose an additional stride parameter, as in MATLAB or Julia: 3:-1:1
would represent the sequence 3,2,1. Eigen also support a (negative) stride, I believe.
@fcostin: In general, if x
is an array, then x[1:size(x)]
is equivalent to x
. If x
is size 0, then x[1:0] = {}
. If x
is size 1, then x[1:1] = { x[1] }
, and so on.
My intended semantics was that start:end = { start, start + 1, ..., end }
. Then the boundary condition if start > end
is that start:end = { }
.
Summary:
This issue results from a discussion on the Stan forum (https://discourse.mc-stan.org/t/new-slicing-rules-break-my-code/24985). Here I showed an example model that worked with Stan versions < 2.26, but resulted in thrown exceptions in newer versions.
Description:
The issue is that the bounds
a
andb
in an expression likex[a:b]
now have to be valid indices (i.e.0 < a, b <= n
wheren = num_elements(x)
. The previous behavior was that an expression likex[1:0]
resulted in an empty vector. Below I'll give two examples in which this can be very useful. However, with the newer versions, this leads to a runtime error. These range checks are currently done for vectors, but not for arrays.Reproducible Steps:
Model 1: inserting a constant into a parameter vector. This example comes from a model in which I modeled odds ratios
x
for different classes 1,2,...,n. The "user" of the model can choose a reference classi
, and for this class the odds ratio will be 1. I then used thetransformed parameters
block to define a vector y = [x[1], x[2], ..., x[i-1], 1, x[i], ..., x[n-1]]. (note that the vectorx
has lengthn-1
). I implemented this as follows:Whenever
i > 1
andi < n
there is no problem with this, but when the user choosesi=1
, then the first slicex[:i-1]
is equal tox[:0]
, which used to represent an empty vector. Currently, we get an exception, because 0 is not a valid index. We get a similar situation withi=n
. (BTW: I'm aware that my model can be fixed usinghead
andtail
functions :-).Model 2: ragged data/parameter structure.
Suppose that we have
N
subjects and for each subject we have a number of observations, some of which may be missing. The number of missing observationsK[n]
differs between subjects and crucially can be zero for some subjects. We want to model these missing observations explicitly using a vectorvector[sum(K)] x
. The missing observations specific for subjectn
can be selected fromx
usingvector[K[n]] y = x[sum(K[:n-1])+1:sum(K[:n])]
. Here's a "working" Stan model:This model works without errors when e.g.
K = {1,5,3,0,4}
(note that subject 4 has zero missing observations), but we get exceptions when e.g.K = {0,1,5,3,4}
, because in that case we selectvector[0] y = x[1:0]
for the first subject. Something similar happens for whenK[N] = 0
.Current Output:
The error is a thrown exception like this one:
Expected Output:
The models work without warnings or exceptions with Stan versions < 2.26
Additional Information:
x
andy
byarray
s, all models work without errors in the current version. So this is somewhat inconsistent.For now, I would argue that the change has to be reverted, because it could break a lot of people's models (I can't be the only one making use of this). Then we can have a discussion about returning empty vectors or throwing exceptions.
Current Version:
v2.28.1