lutaml / expressir

Ruby parser for the ISO EXPRESS language
3 stars 3 forks source link

Finalize parsing and formatting remark tags #30

Closed zakjan closed 3 years ago

zakjan commented 3 years ago

According to the ISO 10303-11 spec, 7.1.6.3 b) Remark tag

A qualified remark reference shall use the visibility rules defined in 10.2 in the following manner: the reference to the left of the ’.’ shall identify the scope in which to find the reference to the right.

Based on https://github.com/lutaml/expressir/issues/11#issuecomment-730019292, all remarks shall use a qualified remark reference, therefore a scope is required in the remark tag.

It means the following further changes needed in remark parsing and formatting:

1) remark tag without a scope shall not be parsed

Currently remark tags without a scope are parsed and attached to the tree based on lexical scope. These should be considered invalid.

2) remark tag shall be formatted with a scope at all levels

Currently remark tags are not formatted with a scope in a schema scope. These should format with a scope also in a schema scope. (?)

zakjan commented 3 years ago

@TRThurman Could you explain why this line https://github.com/metanorma/annotated-express/blob/master/data/resources/aic_curve_swept_solid/aic_curve_swept_solid.exp#L112 contains a remark tag without a scope curve_swept_solid_shape_representation? Based on your comment linked above, I would understand that the scope must be always specified, therefore it would be aic_curve_swept_solid.curve_swept_solid_shape_representation.

Or is lexical scope (i.e. placing a remark inside a schema declaration in a text file) still somehow taken into account?

ronaldtse commented 3 years ago

@zakjan that anomaly could be due to me not fully understanding the structure back then. In this case please help update that repo to reflect latest agreed practice 👍

ronaldtse commented 3 years ago

Will let @TRThurman clarify. Thanks.

TRThurman commented 3 years ago

@TRThurman https://github.com/TRThurman Could you explain why this line https://github.com/metanorma/annotated-express/blob/master/data/resources/aic_curve_swept_solid/aic_curve_swept_solid.exp#L112 https://github.com/metanorma/annotated-express/blob/master/data/resources/aic_curve_swept_solid/aic_curve_swept_solid.exp#L112 contains a remark tag without a scope curve_swept_solid_shape_representation? Based on your comment linked above, I would understand that the scope must be always specified, therefore it would be aic_curve_swept_solid.curve_swept_solid_shape_representation.

The remark tag is not always qualified. Specifically remark tags for SCHEMA, declarations directly within the scope of a SCHEMA do not need to be qualified with the SCHEMA identifier. I updated the original comment with a correction. Or is lexical scope (i.e. placing a remark inside a schema declaration in a text file) still somehow taken into account?

zakjan commented 3 years ago

Ok, would this be the preferred solution then?

TRThurman commented 3 years ago

Ok, would this be the preferred solution then?

when parsing, lexical scope and visibility rules shall be supported Yes when formatting to canonical output, tagged remarks shall be printed in the end of the file using a qualified reference including schema scope What is the interpretation of 'canonical output'?

ronaldtse commented 3 years ago

What is the interpretation of 'canonical output'?

The eengine --pretty output. We use its output to check against our EXPRESS output, and since eengine does not prettify remark tags, we format the tail/embedded remarks according to similar rules.

TRThurman commented 3 years ago

So essentially you would add the collection of remark tags to the end of the --pretty file. Makes sense. thanks!

zakjan commented 3 years ago

Yeah almost, it's not exactly eengine --pretty file, but our own version of it with the same purpose.

Before I printed the tagged remarks into the lexical scope. Now I'm going to change it to print them to the end, to follow the preferred formatting.

zakjan commented 3 years ago

There are three edge cases: alias statement (10.3.1), query expression (10.3.9) and repeat statement (10.3.10).

They declare implicit variables which are visible only in the inner lexical scope and are not visible from outside. These still needs to be printed to the lexical scope, correct?

TRThurman commented 3 years ago

There are three edge cases: alias statement (10.3.1), query expression (10.3.9) and repeat statement (10.3.10).

They declare implicit variables which are visible only in the inner lexical scope and are not visible from outside. These still needs to be printed to the lexical scope, correct?

Yes, there is no mechanism to reference them.

zakjan commented 3 years ago

This is the updated test output file: https://github.com/lutaml/expressir/blob/formatter3/original/examples/syntax/remark_formatted.exp

Would you mind reviewing it for the last time, please?

TRThurman commented 3 years ago

In some cases, the test schema is unrealistic. Example:

DERIVE remark_derived_attribute : STRING := 'xxx'; INVERSE remark_inverse_attribute : remark_entity FOR remark_attribute;

There might be:

DERIVE remark_derived_attribute : STRING := 'xxx'; another att

INVERSE remark_inverse_attribute : remark_entity FOR remark_attribute; yet another attribute.

otherwise looks ok.

zakjan commented 3 years ago

This was indeed a bug, I handled printing multiple derived or inverse attributes incorrectly. Good catch! Fixed and merged.