nawforce / apex-parser

Salesforce Apex language parser for Java and Javascript.

Name identifier lexed as "NAME" SOSL token #10

Closed aheber-doterra closed 3 years ago

aheber-doterra commented 3 years ago

I'm sure I'm doing this wrong, please educate me as needed.

The lexing pass appears to classify name identifiers as the NAME token type from the SOSL section of the grammar. I'm not sure how to restrict those tokens to certain sections. I have a test that demonstrates this but no obvious fix; I'm not great with ANTLR syntax.

Let me know if I'm misunderstanding that token type. I haven't tested similar tokens but expect they have the same problem.

test("Name handling", () => {
  const lexer = new ApexLexer(
    new CaseInsensitiveInputStream("test.cls", "record.name")
  );
  const tokens = new CommonTokenStream(lexer);
  // Four on-channel tokens: "record", ".", "name", and EOF.
  expect(tokens.getNumberOfOnChannelTokens()).toBe(4);
  const tList = tokens.getTokens();
  expect(tList[0].type).toBe(ApexLexer.Identifier);
  expect(tList[1].type).toBe(ApexLexer.DOT);
  // Fails: "name" is lexed as ApexLexer.NAME rather than Identifier.
  expect(tList[2].type).toBe(ApexLexer.Identifier);
});

The test fails with the error below; 164 in this case is NAME:

    Expected: 230
    Received: 164

      42 |   expect(tList[0].type).toBe(ApexLexer.Identifier);
      43 |   expect(tList[1].type).toBe(ApexLexer.DOT);
    > 44 |   expect(tList[2].type).toBe(ApexLexer.Identifier);
         |                         ^
      45 | });

ApexLexer.ts:

  public static readonly NAME = 164;
  public static readonly Identifier = 230;

An example of where I hit this in real code, accessing the name of a LoggingLevel enum:

  public Log setLogLevel(LoggingLevel level) {
    this.level = level;
    lw.logLevel = level.name();
    return this;
  }

nawforce commented 3 years ago

You will see this problem with most keywords. The lexer classifies them as their own token types even when, in context, they are just being used as identifiers. The grammar contains an 'id' rule that accepts identifier tokens as well as those keyword tokens that are legal as identifiers (not all are). I use this rule in other grammar rules rather than looking for identifier tokens directly.
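To illustrate the idea, here is a minimal self-contained TypeScript sketch. It is not the apex-parser lexer or its grammar: the toy `lex` function, the `isId` helper, the keyword table, and the `DOT` value are all assumptions made up for this example. Only the `NAME` and `Identifier` numbers come from the ApexLexer.ts lines quoted earlier in this thread.

```typescript
// NAME and Identifier match the values quoted from ApexLexer.ts in this thread.
// DOT is a made-up placeholder value for this sketch.
const NAME = 164;
const Identifier = 230;
const DOT = 1000; // hypothetical

interface Token {
  type: number;
  text: string;
}

// Hypothetical keyword table; a real lexer defines many more keywords.
const KEYWORDS = new Map<string, number>([["name", NAME]]);

// Toy context-free lexer: a keyword match always wins over the generic
// Identifier rule, regardless of where the word appears.
function lex(input: string): Token[] {
  const tokens: Token[] = [];
  const pattern = /[A-Za-z_][A-Za-z0-9_]*|\./g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(input)) !== null) {
    const text = match[0];
    if (text === ".") {
      tokens.push({ type: DOT, text });
    } else {
      // Case-insensitive keyword lookup, falling back to Identifier.
      tokens.push({ type: KEYWORDS.get(text.toLowerCase()) ?? Identifier, text });
    }
  }
  return tokens;
}

// Stand-in for an 'id'-style rule: accept Identifier tokens plus the
// keyword tokens that are allowed to act as identifiers (assumed set).
const KEYWORDS_LEGAL_AS_ID = new Set<number>([NAME]);

function isId(token: Token): boolean {
  return token.type === Identifier || KEYWORDS_LEGAL_AS_ID.has(token.type);
}

const sample = lex("record.name");
console.log(sample.map((t) => t.type)); // [230, 1000, 164]
console.log(sample.map(isId)); // [true, false, true]
```

The point of the sketch is that a test comparing token types against `Identifier` alone will reject "name", while a check modelled on the 'id' rule accepts it.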

You can call on this rule directly to work out if a token is an identifier of either type, with something like:

ApexParser parser = new ApexParser(tokens);
ApexParser.IdContext context = parser.id();

The parser is designed to be quite lax about what it considers to be an identifier, essentially following old Java rules. There is code at Identifier.scala that does some additional checks to make sure they are really legal in Apex. I work this way just so I can give better error messages if you make a mistake, rather than you just getting an ANTLR error.
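As a rough sketch of that two-stage approach (accept loosely while parsing, then validate separately so the message can be specific), something like the following TypeScript could work. The `validateIdentifier` function and both of its checks are hypothetical and chosen only for illustration; they are not the rules implemented in Identifier.scala.

```typescript
// Illustrative subset of reserved words; an assumption for this sketch,
// not the list used by apex-parser or Identifier.scala.
const RESERVED_WORDS = new Set<string>(["class", "return"]);

// Second-stage check run after the parser has already accepted the token
// as an identifier. Returns a readable error message, or null if the
// name passes these (hypothetical) checks.
function validateIdentifier(name: string): string | null {
  if (RESERVED_WORDS.has(name.toLowerCase())) {
    return `'${name}' is a reserved word and cannot be used as an identifier`;
  }
  // Assumed rule for illustration: identifiers may not start with '_'.
  if (name.startsWith("_")) {
    return `'${name}' is not a valid identifier: it starts with an underscore`;
  }
  return null;
}

for (const candidate of ["level", "class", "_tmp"]) {
  const problem = validateIdentifier(candidate);
  console.log(problem === null ? `${candidate}: ok` : problem);
}
```

Keeping the validation outside the grammar means a bad name produces a targeted message instead of a generic parse failure.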

aheber-doterra commented 3 years ago

@nawforce thanks for the understanding! That'll help.

I'm playing with building my own lexer right now and was comparing its output to apex-parser as a sanity check. I thought it was odd that I couldn't get the two to line up. (I haven't implemented the SOQL/SOSL keywords yet, which is part of it.)