stan-dev / stanc3

The Stan transpiler (from Stan to C++ and beyond).
BSD 3-Clause "New" or "Revised" License
140 stars 44 forks source link

Feature: Tuple unpacking #1360

Closed WardBrian closed 1 year ago

WardBrian commented 1 year ago

Closes #1349.

Two notes on implementation:

  1. For the moment, this is desugared before the MIR layer. This is nice because it keeps the implementation simpler, but a bit annoying because it means we can't use std::tie in cases where that would be the fastest option. This can of course be changed later.
  2. Parsing these kinds of values on the left hand side of an = proved very difficult. I updated the grammar such that we no longer distinguish (in the BNF grammar of the language) between expressions that can be on the left hand side of an assignment and other expressions. We still make this distinction in the AST by checking it as part of the semantic action for the assignment parse rule. This actually dramatically improved the state of parser.messages, since we had duplicate errors for states containing a lhs and a (non-lhs) expression. I think this is probably fine (we should probably go even further in this direction -- see #614). Something very similar was done in ANSI C, which is why ++foo = 4 is a type error in C, not a parse error.

Submission Checklist

Release notes

Added the ability to unpack a tuple during an assignment. For example, the following is now valid:

parameters {
  matrix[N, M] A;
}
model {
  matrix[N, M] Q;
  matrix[M, M] R;

  (Q, R) = qr_thin(A); // qr_thin returns a tuple(matrix, matrix)
}

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the BSD 3-clause license (https://opensource.org/licenses/BSD-3-Clause)

codecov[bot] commented 1 year ago

Codecov Report

Merging #1360 (7b31b39) into master (fa6be01) will increase coverage by 0.01%. The diff coverage is 89.70%.

@@            Coverage Diff             @@
##           master    #1360      +/-   ##
==========================================
+ Coverage   89.44%   89.45%   +0.01%     
==========================================
  Files          65       65              
  Lines       10648    10774     +126     
==========================================
+ Hits         9524     9638     +114     
- Misses       1124     1136      +12     
Files Coverage Δ
src/frontend/Pretty_printing.ml 91.48% <100.00%> (+0.39%) :arrow_up:
src/frontend/Semantic_error.ml 91.98% <100.00%> (+0.39%) :arrow_up:
src/frontend/Deprecation_removals.ml 90.90% <88.88%> (-1.78%) :arrow_down:
src/frontend/Ast_to_Mir.ml 94.03% <92.50%> (-0.22%) :arrow_down:
src/frontend/Canonicalize.ml 95.97% <40.00%> (-1.96%) :arrow_down:
src/frontend/Ast.ml 57.08% <82.75%> (+0.67%) :arrow_up:
src/frontend/Typechecker.ml 91.47% <91.42%> (+0.36%) :arrow_up:
WardBrian commented 1 year ago

Note: there is a further optional part of the design doc pertaining to having _ as the equivalent of something like C++'s std::ignore. I chose not to implement that (for now) since it requires complicating the typechecking (_ would either need to have a new anything-goes type, or we'd need type information to flow more bidirectionally)

nhuurre commented 1 year ago

I refactored the code a bit to "make illegal states unrepresentable" (and also makes _ easier), see https://github.com/nhuurre/stanc3/commit/850392fc26402b0341bb8cd23ca090d8f0e973d7

How do you feel about a more precise uniqueness check that would allow

vector[5] x = ...;
(x[1], x[2:]) = (x[5], x[:4]);

Implemented on https://github.com/nhuurre/stanc3/commit/523c931ab5b20edd61543a40a504443575c023b7

Also, we need to decide what to do about

int x = 1;
vector[2] y = [1,1]';
(x, y[x]) = (2,2);
WardBrian commented 1 year ago

I definitely like the illegal states refactor, I'll pull that in.

The extra precision on the uniqueness check seems nice but also a lot of code for a relative edge case. IMO, since it would be backwards compatible to allow more cases later, I think we should stick to the simpler cases now and see if there is a demand.

That's a nasty test case. I think we'd either want to disallow it or give it the semantics that the values on the left hand side are all what they are before any assignments (such that it would be equivalent to x=2; y[1] = 2, not y[2] = 2), but that's the opposite of what currently happens. Disallowing it is probably easier?

nhuurre commented 1 year ago

I'm also in favor of disallowing it but note that currently this compiles:

array[2] int x = {1,2};
x[x[1]] = 2;

It's rather silly though, so maybe disallowing that too is fine.

WardBrian commented 1 year ago

I think that has the "reasonable" semantics, which is less obvious in the tuples case where the assignments are supposed to be simultaneous

WardBrian commented 1 year ago

@nhuurre I'm not entirely happy with having a loc directly inside the LTuplePack variant, since we don't really have that metadata as a direct part of the type anywhere else.

I was able to remove it in https://github.com/stan-dev/stanc3/commit/145cc345184a254ab87bdb7f1adb40a1884769f8 by re-constructing the tuple packing location as merging all of the individuals, but that misses the opening and closing ( ). I was worried this would negatively impact the comment printing (if they were inside those), and it does, but it turns out that the current state of this pr is worse (compiler fails):

transformed data {
  int x = 3;
  int y = 4;
  print(x, y);
  (/* test */x, y) = (5, 6);

Can you think of both a way to fix this error and maybe also remove the loc part?

WardBrian commented 1 year ago

Current failures are actually due to that loc not being marked as ignore for comparison. Fixing that, the result of formatting

transformed data {
  int x = 3;
  int y = 4;
  print(x, y);
  (x, y) = (5, 6);
  print(x, y);
  (x, y) = (y, x);
  print(x, y);
  (/* comment 1*/x/*comment 2*/,/*comment 3*/y/* comment 4 */)/*comment 5*/ = (y, x);
  (//comment 1
  x //comment 2
  , //comment 3
  y //comment 4
   ) // comment 5
    = (y, x);
}

is now

transformed data {
  int x = 3;
  int y = 4;
  print(x, y);
  (x, y) = (5, 6);
  print(x, y);
  (x, y) = (y, x);
  print(x, y);
  (/* comment 1*/ x /*comment 2*/, /*comment 3*/ y /* comment 4 */) = (y, x);

  /* ^^^:comment 5*/
  (//comment 1
   x //comment 2
   , //comment 3
   y //comment 4
   ) = (y, x);

  // ^^^: comment 5
}
nhuurre commented 1 year ago

I'm not a fan of that loc either but it's not possible to reliably reconstruct the location information, especially if we later add a _ without loc. Every expression and statement does have metadata loc anyway even if it's usually more indirect. And I totally forgot it needs a [@compare.ignore].

The only problem with that auto-format result is misplaced comment 5, right? I think adding = as a Separator in the lexer and printing = as if it were a binary operator should fix that; but also, auto-formatting is known to misplace comments in unexpected positions so IMO we could just say that specific placement is not supported.

WardBrian commented 1 year ago

Yes, 5 is the only bad one currently. If we remove the loc others get worse (1 and 4)

Fair enough, I think it's fine to leave now that it has the ignore set

WardBrian commented 1 year ago

I changed LTuplePacking to hold a record {lvals; loc} rather than a tuple, so it looks/feels more like the other types in the AST.

I think this is good for a first implementation, I'm going to look more into _ soon