[Migrated from JIRA ATOM-559]
During the previous refactoring of December 11th (unification of the database of symbols), I judged that introducing pointers to the original AST nodes that served the construction of the intermediate representation (IR) would help to reduce the code needed in the IR.
For example, because VarInfo holds an AstDeclNode* today, it can offer a query function HasStorageFlag(Flag flag) that looks directly in the AST node for the tokens declaring the qualifiers. This avoids redundant information and keeps the IR structure leaner.
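A minimal sketch of that coupled design, for illustration only. The real AstDeclNode, Flag and VarInfo definitions are not shown in this ticket, so the members used here (qualifierTokens, line, m_declNode) and the Spelling helper are assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real compiler types.
enum class Flag { Static, Const, Groupshared };

struct AstDeclNode {
    std::vector<std::string> qualifierTokens; // qualifier tokens as written by the user
    int line;                                 // original source line
};

// Maps a flag to the keyword spelling expected in the token list.
inline const char* Spelling(Flag flag) {
    switch (flag) {
        case Flag::Static:      return "static";
        case Flag::Const:       return "const";
        case Flag::Groupshared: return "groupshared";
    }
    return "";
}

struct VarInfo {
    const AstDeclNode* m_declNode; // invariant: never null, so clients need no "if"

    explicit VarInfo(const AstDeclNode* node) : m_declNode(node) {
        assert(node != nullptr);
    }

    // No flag is stored in the IR; the answer is looked up in the AST tokens.
    bool HasStorageFlag(Flag flag) const {
        const std::string wanted = Spelling(flag);
        for (const std::string& token : m_declNode->qualifierTokens) {
            if (token == wanted) return true;
        }
        return false;
    }
};
```

The IR stays small, but a VarInfo cannot exist unless some parsed declaration stands behind it.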
The second advantage of this design is access to the original source location, which makes it easier to quote the actual user code, both in error messages and in code emission.
The third advantage is that when the IR is incomplete, the components (clients) that use the IR, such as code emission, can reconstruct information that the semantic pass has not prepared.
This makes feeding data to back-end clients somewhat less painful, and leaves the door open for future clients.
BUT.
The problem we see arising from this decision today is that the AST and the IR are tightly coupled.
The IR is incomplete without the pointers to some AST nodes, and the pointers being set (not null) is a program invariant. (For the sake of "if-less programming", client code doesn't have to contort itself around optionally set pointers. If-less programming ensures canonical treatment, no special cases, and lean code. The objective metric for this is cyclomatic complexity.)
This tight coupling is undesirable today because there are code constructs that we want to emulate so that they behave like another, canonical construct. The problem is that we can't, since no syntax has been instantiated to back the virtual construct.
Example:
ShaderResourceGroupSemantic stuff
{ FrequencyId = 1; }
;
This is the syntax we require; but internally we want to treat this as:
ShaderResourceGroupSemantic stuff
{ static const intrinsicattribute int FrequencyId = 1; }
;
That way, the FrequencyId can be stored in the IR as a VarInfo, with no need to create a new Kind, a new semantic validation, or a new IR subkind, to extend the variant of possible subkinds, or to go through every client of the IR to add support for the new subkind.
Basically, with this virtual syntax alteration, all features down the chain work out of the box, without increasing the source-line count of the compiler.
But today, because of this tight coupling, it is impossible to virtualize IR construction from procedural code. In other words, we can't have "generated IR" without an original parsed syntax. And that is a big problem.
I am to blame for this original decision, so I should clean up this coupling.
That means augmenting the IR data model to store any information that clients retrieve today by doing their own custom AST analysis.
It also means removing all pointers from the IR, and storing in the IR itself the information needed for error reporting and code emission that is currently accessed through these pointers (often the line number, for example).
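A sketch of what a decoupled VarInfo could look like, assuming a bitmask for the storage flags and a small SourceLocation value copied into the IR. All names here (SourceLocation, storageFlags, MakeIntrinsicIntAttribute) are hypothetical and only illustrate the direction; they are not the existing data model.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical flag set, stored in the IR instead of being looked up in AST tokens.
enum class Flag : std::uint32_t {
    Static             = 1u << 0,
    Const              = 1u << 1,
    IntrinsicAttribute = 1u << 2,
    Groupshared        = 1u << 3
};

inline std::uint32_t Bits(Flag flag) { return static_cast<std::uint32_t>(flag); }

// Source position copied by value, so error messages need no AST pointer.
struct SourceLocation {
    int line;
    int column;
};

struct VarInfo {
    std::string    name;
    std::string    typeName;
    std::uint32_t  storageFlags;
    SourceLocation location;

    bool HasStorageFlag(Flag flag) const { return (storageFlags & Bits(flag)) != 0; }
};

// "Generated IR": builds the VarInfo for `static const intrinsicattribute int <name>`
// procedurally, without any parsed declaration behind it.
inline VarInfo MakeIntrinsicIntAttribute(const std::string& name, SourceLocation where) {
    const std::uint32_t flags =
        Bits(Flag::Static) | Bits(Flag::Const) | Bits(Flag::IntrinsicAttribute);
    return VarInfo{name, "int", flags, where};
}
```

With this shape, the `FrequencyId = 1;` statement could produce an ordinary VarInfo that carries the location of the statement the user actually wrote, and clients would query it exactly as they query any other variable.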
We emit a lot of code today in a dumb way, by just iterating through the tokens, which point to string pieces that exist in the original source stream.
Decoupling thus involves being able to do code emission at a much finer-grained level, that is, reconstructing all language expressions, statements and structural constructs (if, while, ...) from IR data, and not from tokens.
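To give an idea of the scale of that work, here is a deliberately tiny sketch of emission from IR data instead of from tokens. The Expr and IfStmt types and the Emit functions are hypothetical; a real IR would need to cover every expression and statement form of the language.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical IR expression: a leaf (identifier or literal) or a binary operation.
struct Expr {
    enum class Kind { Leaf, Binary } kind;
    std::string text;               // spelling for a Leaf, operator for a Binary
    std::unique_ptr<Expr> lhs, rhs; // only set for Binary
};

inline std::unique_ptr<Expr> Leaf(std::string text) {
    return std::unique_ptr<Expr>(
        new Expr{Expr::Kind::Leaf, std::move(text), nullptr, nullptr});
}

inline std::unique_ptr<Expr> Binary(std::string op,
                                    std::unique_ptr<Expr> lhs,
                                    std::unique_ptr<Expr> rhs) {
    return std::unique_ptr<Expr>(
        new Expr{Expr::Kind::Binary, std::move(op), std::move(lhs), std::move(rhs)});
}

// Rebuilds the source text from IR data alone; no token stream is consulted.
inline std::string Emit(const Expr& e) {
    if (e.kind == Expr::Kind::Leaf) return e.text;
    return "(" + Emit(*e.lhs) + " " + e.text + " " + Emit(*e.rhs) + ")";
}

// Hypothetical structural construct; the body is kept as already-emitted text for brevity.
struct IfStmt {
    std::unique_ptr<Expr> condition;
    std::string bodyText;
};

inline std::string Emit(const IfStmt& s) {
    return "if " + Emit(*s.condition) + " { " + s.bodyText + " }";
}
```

Even this toy version loses the user's original formatting and parenthesization, which is one of the trade-offs of emitting from IR rather than quoting the source stream.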
This is unfortunately a fairly big amount of work.
Actually, this isn't really necessary beyond a sense of purity. If a pragmatic need emerges, a new specification in this direction will naturally be written.