stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).
BSD 3-Clause "New" or "Revised" License
142 stars 46 forks source link

Cannot declare and define an integer array using the colon operator #1404

Open bgoodri opened 7 years ago

bgoodri commented 7 years ago

Summary:

It is a parser error to declare and define a one-dimensional integer array using the colon operator.

Description:

It is a parser error to declare and define a one-dimensional integer array using the colon operator.

Reproducible Steps:

Try to parse the following Stan program

data {
  int<lower=1> K;
}
transformed data {
  int arr[K] = 1:K;
}

Current Output:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

variable definition dimensions mismatch, definition specifies 1, declaration specifies 0  error in 'model_int_array' at line 5, column 17
  -------------------------------------------------
     3: }
     4: transformed data {
     5:   int arr[K] = 1:K;
                        ^
     6: }
  -------------------------------------------------

Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'int_array' due to the above error.

Expected Output:

Nothing. It should assign the integers from 1 to K to arr

Additional Information:

Current Version:

v2.17.0

bob-carpenter commented 7 years ago

This sounds like it'd be reasonable to add. It shouldn't be too hard. : will have to be defined as an infix operator.

What should its precedence be? It would make sense to me to allow 1+2:3 + 9 with interpretation (1 + 2):(3 + 9)? R goes the other way (for all arithmetic), making : bind more tightly:

> 1+2:3+4
[1] 7 8

Associativity doesn't matter since it's output isn't the same type as its input.

bgoodri commented 7 years ago

I guess it should be whatever it is for x[1 + 2:3 + 9].

On Sat, Oct 21, 2017 at 7:13 PM, Bob Carpenter notifications@github.com wrote:

This sounds like it'd be reasonable to add. It shouldn't be too hard. : will have to be defined as an infix operator.

What should its precedence be? It would make sense to me to allow 1+2:3 + 9 with interpretation (1 + 2):(3 + 9)? R goes the other way (for all arithmetic), making : bind more tightly:

1+2:3+4 [1] 7 8

Associativity doesn't matter since it's output isn't the same type as its input.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/stanc3/issues/1404, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrql7_AYgLU3LB7Przg6z2en1Ar9F-ks5sunqYgaJpZM4QBvjF .

bob-carpenter commented 7 years ago

Excellent point. Current Stan treats : as binding more loosely than arithmetic. So we get 1 + 2 : 3 + 9 with the same denotation as 3 : 12 in both of the existing contexts:

  for (i in 1 + 2 : 3 + 9) ...

and

x[1+ 2 : 3 + 9]
VMatthijs commented 5 years ago

This is difficult to add to Stan currently as it would lead to ambiguity. For example, is the following a valid Stan program if we add this behaviour

model {
  int n = 5
  for (i in 1:n) {
    n += 1;
  }
}

?

Observe that the for could then either be interpreted as a for-loop over the range 1:n or a foreach-loop over the array 1:n. Both have slightly different operational semantics. Because our for-loops are like C for-loops, the upper bound gets reevaluated after each pass through the loop body. In particular, for-loops might not terminate. For foreach-loops, the array gets evaluated once before we enter the loop and then becomes immutable.

bob-carpenter commented 5 years ago

This program is currently valid and will evaluate the loop body for i in { 1, 3, 5 }.

With "1:n" this would be ambiguous between a foreach loop where "1:n" denotes an int[] and a regular loop where "1:n" are just the loop bounds.

This is a behavior we could change in Stan 3. I believe Matthijs has suggested making the loop bounds immutable in the body, so it's effectively the foreach interpretation.

his could get tricky syntactically if the bound is a function, say

for (i in 1:f(x,y,z))

We presumably wouldn't want any of x, y, z being mutated in the loop body. This would settle on the foreach s

On Dec 13, 2018, at 8:03 AM, Matthijs Vákár notifications@github.com wrote:

This is difficult to add to Stan currently as it would lead to ambiguity. For example, is the following a valid Stan program if we add this behaviour

model { int n = 5 for (i in 1:n) { n += 1; } }

?

Observe that the for could then either be interpreted as a for-loop over the range 1:n or a foreach-loop over the array 1:n. Both have slightly different operational semantics. Because our for-loops are like C for-loops, the upper bound gets reevaluated after each pass through the loop body. In particular, for-loops might not terminate. For foreach-loops, the array gets evaluated once before we enter the loop and then becomes immutable.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

bob-carpenter commented 9 months ago

I would like to second this feature request. I'd like to use it!

WardBrian commented 9 months ago

There is still the ambiguity issue in for (i in 1:n) that makes this unable to be implemented in current Stan without either a differentiating syntax or a breaking change to some code.

bob-carpenter commented 9 months ago

If we just always parse 1:n as denoting an array of integers, then the iteration should work. Are there cases where this doesn't work?

WardBrian commented 9 months ago

Right now 1:n is a range, and you can modify n inside the loop, modifying the range. Treating it as an array changes the semantics when n is modified inside the loop

bob-carpenter commented 9 months ago

you can modify n inside the loop

Maybe it's too late, but I don't think we should allow that. If you want to do that you can build it by hand as:

int n = 1;
while (n <= N) {
  ...this code can modify n...;
  n += 1;
}

I'm somehow OK with break and continue in for loops, but somehow modifying n seems too confusing.

WardBrian commented 9 months ago

The issue is that it has been allowed for years (possibly since the beginning)?

It’s not necessarily documented or encouraged behavior, of course, since it’s better to use a while loop in that case