LangRef is out of sync with the codebase

andrew-wja commented 4 years ago

I'm working on a backend for an external tool that generates MLIR in the generic textual form. I've been following the LangRef to see how this should be done, and I found a few trivial problems (fixed in https://github.com/tensorflow/mlir/pull/265). However, I've now run across some more serious differences that I don't feel comfortable changing/that might require attention from someone who is familiar with the codebase.

unnamed blocks and empty regions

These are handled differently in the actual parser lib/Parser/Parser.cpp from how the LangRef says they are interpreted. In the code, Regions can be empty, and blocks can be unlabelled. In the LangRef, Regions may not be empty, and blocks must always be labelled.

Proposed new LangRef syntax (matching what I think the parsing logic actually says):

block ::= bb-label? operation*

region ::= `{` block* `}`

non-function-type

non-function-type is used without being defined in the LangRef and it seems like it might be non-trivial to define.

ftynse commented 4 years ago

Regions can be empty,

This sounds correct.

and blocks can be unlabelled.

AFAIK, only the entry block of the region can omit block label because one is not allowed to branch to it anyway. Do you have an example where it would work for other blocks? Otherwise, this should be fixed at the region definition level to something like headless-block block* | block*.

it seems like it might be non-trivial to define.

It is not, actually. standard-type should be split into function-type and the rest, which is a standard-non-function-type. Then function-type ::= standard-non-function-type | dialect-type | type-alias. However, this is related to a deeper problem of a potential ambiguity between affine maps and function types, and may change in the future.

andrew-wja commented 4 years ago

Regions can be empty,

This sounds correct.

and blocks can be unlabelled.

AFAIK, only the entry block of the region can omit block label because one is not allowed to branch to it anyway. Do you have an example where it would work for other blocks? Otherwise, this should be fixed at the region definition level to something like headless-block block* | block*.

I updated the top post to clarify that the LangRef says regions may not be empty, and that blocks must always be labelled, sorry for the unclear language here! Agreed, something like headless-block block* | block* looks fine. The important thing (IMO) is to encode the "only the entry block" behaviour in the LangRef.

it seems like it might be non-trivial to define.

It is not, actually. standard-type should be split into function-type and the rest, which is a standard-non-function-type. Then function-type ::= standard-non-function-type | dialect-type | type-alias. However, this is related to a deeper problem of a potential ambiguity between affine maps and function types, and may change in the future.

Thanks for the explanation! I will have a look at the actual parser code and see if there are some comments with EBNF that could be added to the LangRef to document this structure.

tensorflow / mlir

LangRef is out of sync with the codebase #287

unnamed blocks and empty regions

non-function-type