Tokens could be defined directly in nif.xml. This would simplify writing and parsing expressions and composing strings in various attributes, while improving readability and adding context.
Right now the logical and arithmetic operators are used in nif.xml without any description. Also, various expressions used in vercond can be very complex to write, and even harder to decipher. They are also often repeated, in the case of custom versions (Bethesda and others). Tokens defined by the XML itself would give these expressions context and readability. Tokenizing the repeated expressions would increase maintainability, as each expression string would live in only one location.
Tokenizing operators would also alleviate issues with parsing, reading, and writing (typing) XML entities such as &amp;, &gt; and &lt;. It would also remove the ambiguity between &/&& and |/||, which parsers currently have to handle specially.
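To make the entity problem concrete, here is a small Python sketch using the standard library's ElementTree. The `<add>` element and the field names `A` and `B` are only placeholders for illustration, not lines taken from nif.xml.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the snippet is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical attribute values, written three ways.
raw = '<add name="x" vercond="(A > 0) && (B < 1)" />'
escaped = '<add name="x" vercond="(A &gt; 0) &amp;&amp; (B &lt; 1)" />'
tokenized = '<add name="x" vercond="(A #GT# 0) #AND# (B #LT# 1)" />'

for label, text in (("raw", raw), ("escaped", escaped), ("tokenized", tokenized)):
    print(label, parses(text))
```

The raw form is rejected by the XML parser, the escaped form is accepted but awkward to type and read, and the tokenized form is accepted with no entities at all.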
Example:
<operator token="#ADD#" string="+" />
<operator token="#SUB#" string="-" />
<operator token="#MUL#" string="*" />
<operator token="#DIV#" string="/" />
<operator token="#AND#" string="&amp;&amp;" />
<operator token="#OR#" string="||" />
<operator token="#LT#" string="&lt;" />
<operator token="#GT#" string=">" />
<operator token="#LTE#" string="&lt;=" />
<operator token="#GTE#" string=">=" />
<operator token="#EQ#" string="==" />
<operator token="#NEQ#" string="!=" />
<operator token="#BITAND#" string="&amp;" />
<operator token="#BITOR#" string="|" />
<!-- Note: The strings for expression tokens may need to use the operator tokens. -->
<expression token="#BS202#" string="((Version == 20.2.0.7) &amp;&amp; (User Version 2 > 0))" />
<expression token="#BSSTREAM#" string="(User Version 2 > 0)" />
<expression token="#NISTREAM#" string="(User Version 2 == 0)" />
<expression token="#DIVINITY2#" string="((User Version == 0x20000) || (User Version == 0x30000))" />
<expression token="#SSE#" string="(User Version 2 == 100)" />
<expression token="#FO4#" string="(User Version 2 == 130)" />
Parsers would have two main ways of dealing with tokens: as first-class entities (ignoring the string attr and dealing with the tokens directly), or as second-class entities (using the string contents to replace the token and dealing with the strings as before).
Operators are best dealt with as first-class, otherwise you eliminate the benefits of tokens for both regex and non-regex parsing. Other token types might be best dealt with as basic string replacement.
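The two approaches could look roughly like this in Python. This is only a sketch: the table contents are copied from the example above (with the expression string rewritten to use an operator token), and the function names `expand_second_class` and `split_first_class` are made up for illustration.

```python
import re

# Hypothetical token tables, as a parser might build them from the
# <operator> and <expression> elements in the example.
OPERATORS = {"#AND#": "&&", "#GT#": ">", "#EQ#": "=="}
EXPRESSIONS = {"#BSSTREAM#": "(User Version 2 #GT# 0)"}

TOKEN_RE = re.compile(r"#[A-Z0-9_]+#")

def expand_second_class(text, *tables):
    """Second-class: replace each token by its string until none are left."""
    lookup = {}
    for table in tables:
        lookup.update(table)
    # Repeat, because an expression string may itself contain operator tokens.
    while True:
        expanded = TOKEN_RE.sub(lambda m: lookup[m.group(0)], text)
        if expanded == text:
            return expanded
        text = expanded

def split_first_class(text):
    """First-class: keep tokens as symbols and never look at their strings."""
    parts = re.split(r"(#[A-Z0-9_]+#)", text)
    return [p.strip() for p in parts if p.strip()]

print(expand_second_class("#BSSTREAM# #AND# (Version #EQ# 20.2.0.7)",
                          OPERATORS, EXPRESSIONS))
print(split_first_class("(User Version 2 #GT# 0)"))
```

In the first-class case the parser would dispatch on `#GT#` itself, so it never has to tell `&` from `&&` or unescape anything.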
Other potential usages:
Commonly used default values
Note: #FLT_MIN# and #INV_FLT# have the same string value but provide different context for the usage of the value. One is the negation of #FLT_MAX# regarding start/end time in a sequence, and the other denotes an invalid, uninitialized value.
Real world example: These defaults caused a compiler error in niflib when I accidentally included a trailing comma in one of them. This would not have happened if I had been using tokens.
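As a rough illustration of default tokens, the sketch below assumes a `<default>` tag (one of the example contexts) and assumes the three tokens carry the single-precision float limit as their string. The exact literals and the helper name `resolve_default` are my assumptions, not values taken from nif.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical <default> tokens; tag name and literal values are assumptions.
SPEC = """
<tokens>
    <default token="#FLT_MAX#" string="3.402823466e+38" />
    <default token="#FLT_MIN#" string="-3.402823466e+38" />
    <default token="#INV_FLT#" string="-3.402823466e+38" />
</tokens>
"""

defaults = {el.get("token"): el.get("string")
            for el in ET.fromstring(SPEC).iter("default")}

def resolve_default(value):
    """Return the literal for a default attribute, expanding a token if present."""
    return defaults.get(value, value)

print(resolve_default("#INV_FLT#"))
print(resolve_default("1.0"))
```

The literal is typed once in the token definition, so a typo such as a trailing comma can only occur in one place instead of in every field that uses the value.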
Forthcoming block versioning (will get separate ticket)
<!-- Note: Space-separated is the correct way to make lists in XML -->
<!-- Note: VXX_X_X_X is a unique ID for <version> being added to the spec, will receive ticket. -->
<versionset token="#BETHESDA#" string="V10_0_1_2 V10_1_0_101 V10_1_0_106 V10_2_0_0__10 V20_0_0_4__10 V20_0_0_4__11 V20_0_0_5_OBL V20_2_0_7__11_1 V20_2_0_7__11_2 V20_2_0_7__11_3 V20_2_0_7__11_4 V20_2_0_7__11_5 V20_2_0_7__11_6 V20_2_0_7__11_7 V20_2_0_7__11_8 V20_2_0_7_FO3 V20_2_0_7_SKY V20_2_0_7_SSE V20_2_0_7_FO4" />
<versionset token="#FO3#" string="V20_0_0_4__11 V20_2_0_7__11_1 V20_2_0_7__11_2 V20_2_0_7__11_3 V20_2_0_7__11_4 V20_2_0_7__11_5 V20_2_0_7__11_6 V20_2_0_7__11_7 V20_2_0_7__11_8 V20_2_0_7_FO3" />
<versionset token="#SSE#" string="V20_2_0_7_SSE" />
<versionset token="#FO4#" string="V20_2_0_7_FO4" />
<niobject name="bhkRefObject" versions="#BETHESDA#">
<niobject name="bhkSerializable" versions="#BETHESDA#">
<niobject name="bhkWorldObject" versions="#BETHESDA#">
<niobject name="bhkEntity" versions="#BETHESDA#">
<niobject name="bhkPoseArray" versions="#FO3#" />
<niobject name="bhkPhysicsSystem" versions="#FO4#" />
In brief, since it will be discussed in its own ticket: reintroducing block versioning has run into issues with the granularity of since/until. For custom blocks it is not good enough and would require vercond expressions, which should be avoided at all costs. Instead each version can be listed (<version> now being every distinct format, including user and Bethesda versions), but this gets extremely repetitious and needs a system of abbreviation.
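Expanding a versions attribute on the parser side could be as simple as the following sketch. The table is shortened from the example above, the name `expand_versions` is hypothetical, and allowing tokens and plain version IDs to be mixed in one attribute is my assumption.

```python
# Hypothetical versionset table (shortened from the example above).
VERSIONSETS = {
    "#SSE#": "V20_2_0_7_SSE",
    "#FO4#": "V20_2_0_7_FO4",
}

def expand_versions(attr, versionsets=VERSIONSETS):
    """Expand a space-separated versions attribute into a set of version IDs."""
    ids = set()
    for item in attr.split():
        # A token contributes every ID in its string; anything else is an ID.
        ids.update(versionsets.get(item, item).split())
    return ids

print(sorted(expand_versions("V10_0_1_2 #SSE# #FO4#")))
```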
For Discussion
Delimiter
Is # the best delimiter? There is also @. Neither has any use in logic/arithmetic or occurs commonly in the strings we use in nif.xml. Collisions are especially unlikely since the delimiter goes at each end, which I think is required to prevent one token from containing another, e.g. #LT and #LTE would collide.
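The collision is easy to reproduce with a naive string replacement. In this sketch the two helper functions are hypothetical and only differ in whether the token has a closing delimiter.

```python
# Operator names in definition order; LT is a prefix of LTE.
OPS = {"LT": "<", "LTE": "<="}

def replace_one_sided(text):
    """Tokens written as #NAME: the shorter name matches inside the longer one."""
    for name, string in OPS.items():
        text = text.replace("#" + name, string)
    return text

def replace_two_sided(text):
    """Tokens written as #NAME#: the closing delimiter prevents partial matches."""
    for name, string in OPS.items():
        text = text.replace("#" + name + "#", string)
    return text

print(replace_one_sided("A #LTE B"))
print(replace_two_sided("A #LTE# B"))
```

With a one-sided delimiter the parser would have to sort tokens by length or use word-boundary matching. With a delimiter at each end the replacement order no longer matters.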
Organization
Do we put all token elements nested under a <tokens>? Or do we leave them flat alongside <version>, <basic>, etc.? I feel grouping them is important, as it ensures an XML parser can grab every token at once without needing to know the tag names.
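With grouping, collecting every token needs no list of tag names, as in this sketch. The surrounding document here is a trimmed-down stand-in, and the `<version>` line is a placeholder rather than real nif.xml content.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout with all token elements grouped under <tokens>.
SPEC = """
<niftoolsxml>
    <tokens>
        <operator token="#ADD#" string="+" />
        <expression token="#NISTREAM#" string="(User Version 2 == 0)" />
        <versionset token="#FO4#" string="V20_2_0_7_FO4" />
    </tokens>
    <version id="V20_2_0_7_FO4" />
</niftoolsxml>
"""

root = ET.fromstring(SPEC)

# Every child of <tokens> is a token; the tag name only records its context.
tokens = [(el.tag, el.get("token"), el.get("string"))
          for el in root.find("tokens")]

for tag, token, string in tokens:
    print(tag, token, string)
```

In the flat layout the parser would instead need to know in advance that operator, expression, default and versionset are the token tags.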
Element tag names
Do we use multiple tag names to specify/limit the token usage, i.e. contexts?
Contexts
Separate tag names and contexts would require separate maps/dictionaries on the parser side. There are already 4 example contexts (operator, expression, default, versionset). Currently the example specification does not include any built-in way of specifying which tags and attributes a token's context is limited to, so the association of tag name -> attributes to parse using it would have to be manual.
Reusing token identifiers in differing contexts
Note that I use #FO4# and #SSE# in both <versionset> and <expression>. Given that these tags are used in completely different domains, is this OK? Reuse or not, contexts mean that technically a parser can no longer do a naive search/replace on the entire file in memory before reading the XML in, because regardless of identifier collisions, a search/replace violates the restriction of a token tag to specific element attributes.
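One way the per-context maps and the manual tag -> attribute association might look is sketched below. Which attributes belong to which context is my assumption here, and `resolve` is a hypothetical helper. Operators are left out because they would appear inside expression strings rather than directly in an attribute.

```python
# Hypothetical manual association: token tag -> attributes it may be used in.
CONTEXT_ATTRS = {
    "expression": {"vercond", "cond"},
    "default": {"default"},
    "versionset": {"versions"},
}

# One dictionary per context, so #FO4# can mean different things in each.
TOKENS = {
    "expression": {"#FO4#": "(User Version 2 == 130)"},
    "default": {},
    "versionset": {"#FO4#": "V20_2_0_7_FO4"},
}

def resolve(attr_name, value):
    """Expand tokens in value using only the context tied to attr_name."""
    for tag, attrs in CONTEXT_ATTRS.items():
        if attr_name in attrs:
            for token, string in TOKENS[tag].items():
                value = value.replace(token, string)
            return value
    return value  # attribute has no token context; leave it untouched

print(resolve("vercond", "#FO4#"))
print(resolve("versions", "#FO4#"))
```

A whole-file search/replace has no way to make this distinction, which is why it stops working once identifiers are scoped by context.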
Token string attributes
Expression token strings will likely need to use the operator tokens, because first-class token parsers would no longer know about &&, et al.
Also, if a parser is treating tokens as second-class, order matters, as all <operator> tags need to be read in first. Then, as each <expression> tag is read in, the parser will replace the tokens with the operator strings.
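A second-class parser could sidestep the file-order dependency by collecting operators in a first pass, roughly as sketched here. The expression string is rewritten to use operator tokens, and the `<expression>` is deliberately placed before the `<operator>` elements to show the effect.

```python
import xml.etree.ElementTree as ET

# Hypothetical token block; note the expression comes first in the file.
SPEC = """
<tokens>
    <expression token="#BS202#" string="((Version #EQ# 20.2.0.7) #AND# (User Version 2 #GT# 0))" />
    <operator token="#AND#" string="&amp;&amp;" />
    <operator token="#GT#" string="&gt;" />
    <operator token="#EQ#" string="==" />
</tokens>
"""

root = ET.fromstring(SPEC)

# Pass 1: collect every operator, regardless of where it sits in the file.
operators = {el.get("token"): el.get("string") for el in root.iter("operator")}

# Pass 2: expand operator tokens inside each expression string.
expressions = {}
for el in root.iter("expression"):
    text = el.get("string")
    for token, string in operators.items():
        text = text.replace(token, string)
    expressions[el.get("token")] = text

print(expressions["#BS202#"])
```

A streaming parser that reads elements strictly in document order cannot do this, so for such parsers the spec would have to require operators to come first.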