ricosjp / ruststep

A STEP toolkit for Rust
Apache License 2.0
135 stars 13 forks source link

Case sensitivity in EXPRESS schema #54

Open termoshtt opened 3 years ago

termoshtt commented 3 years ago

As far as I read ISO-10303-11, user defined identifiers should be snake_case, but some identifier copied into schema/ does contains non snake_case identifier like degree_Celsius

g-rauhoeft commented 3 years ago

ISO-10303-11 actually says the following about case sensitivity:

EXPRESS may be written using upper, lower or mixed case letters

and

The case of letters is significant only within explicit string literals

As such, restricting any identifiers to lower case goes against the specification. I attempted to change it within the letter parser, but it causes a plethora of unit test failures. I'm still attempting to find out why:

pub fn letter(input: &str) -> RawParseResult<char> { satisfy(|c| matches!(c, 'A'..='Z' | 'a'..='z')).parse(input) }

This might be the wrong approach though, as the specification clearly says that we can disregard the case of anything except for string literals. Perhaps we should transform any inputs except string literals to lower case by default.

g-rauhoeft commented 3 years ago

I believe I've found the cause of the test failures if mixed case is allowed:

For blocks without an END tag, such as UNIQUE, the parser swallows the END tag of the parent block, because it doesn't know how to differentiate between the end of the UNIQUE block and its contents. For this reason I suggest creating either an until_tag combinator, that consumes the input until an END_ tag is reached, or an is_not_reserved combinator, that wraps a nom::combinator::cond combinator, checking whether an identifier is actually a keyword reserved by EXPRESS.

I'm currently working on the latter and hope to make a pull request early next week.

g-rauhoeft commented 3 years ago

I've started implementing the changes under: https://github.com/g-rauhoeft/ruststep/tree/case-sensitivity-fix

Please let me know if you feel this is going in the right direction. I haven't created a pull request yet, because some test cases still fail, but a vast majority are successful.

termoshtt commented 3 years ago

https://github.com/ricosjp/ruststep/pull/66#issuecomment-868471563

6.1 The syntax of the specification refers to keywords being in uppercase, but that is only for the syntax of the specification in WSN, not for EXPRESS as such.

This looks true.

As such, restricting any identifiers to lower case goes against the specification.

I agree. We should accept non-lower case identifiers.

or an is_not_reserved combinator,

Detecting reserved keyword in simple_id has been implemented in #87, and improved in #117

For this reason I suggest creating either an untiltag combinator, that consumes the input until an END tag is reached,

The many_till parser in #66 has been merged though #114, but not used in esprc.