jetstreamapp / soql-parser-js

Javascript SOQL parser
https://jetstreamapp.github.io/soql-parser-js/
MIT License
77 stars 20 forks source link

NOT (field expression) fails to parse #122

Closed stephen-peck closed 3 years ago

stephen-peck commented 3 years ago

Description

Fails using version 2.5.5

NOT ( ... ) to invert a field expression results in a parser error NoViableAltException

A simple example:

parseQuery("SELECT AnnualRevenue FROM Account WHERE NOT (AnnualRevenue > 0 AND AnnualRevenue < 200000)");

The same query parses ~correctly~ using version 1.2.1 (and the online parser which appears to be using an older version). However, the parsed output is incorrect, treating NOT as a logicalPrefix - when the output is composed, the query is rewritten incorrectly. ...

SELECT AnnualRevenue
FROM Account
WHERE (NOT AnnualRevenue > 0
    AND AnnualRevenue < 200000)

(Edited to clarify behaviour on older version)

stephen-peck commented 3 years ago

Looks like I had an old version of the online parser cached, after a refresh it now fails to parse the same way, with the following output (from NoViableAltException)

Expecting: one of these possible Token sequences:
  1. [AVG, L_PAREN]
  2. [COUNT, L_PAREN]
  3. [COUNT_DISTINCT, L_PAREN]
  4. [MIN, L_PAREN]
  5. [MAX, L_PAREN]
  6. [SUM, L_PAREN]
  7. [DISTANCE, L_PAREN]
  8. [CALENDAR_MONTH, L_PAREN]
  9. [CALENDAR_QUARTER, L_PAREN]
  10. [CALENDAR_YEAR, L_PAREN]
  11. [DAY_IN_MONTH, L_PAREN]
  12. [DAY_IN_WEEK, L_PAREN]
  13. [DAY_IN_YEAR, L_PAREN]
  14. [DAY_ONLY, L_PAREN]
  15. [FISCAL_MONTH, L_PAREN]
  16. [FISCAL_QUARTER, L_PAREN]
  17. [FISCAL_YEAR, L_PAREN]
  18. [HOUR_IN_DAY, L_PAREN]
  19. [WEEK_IN_MONTH, L_PAREN]
  20. [WEEK_IN_YEAR, L_PAREN]
  21. [FORMAT, L_PAREN]
  22. [toLabel, L_PAREN]
  23. [convertTimezone, L_PAREN]
  24. [convertCurrency, L_PAREN]
  25. [GROUPING, L_PAREN]
  26. [Identifier]
but found: '('
paustint commented 3 years ago

@stephen-peck - thank you for filing this issue. I will investigate this and hopefully get resolved soon!

paustint commented 3 years ago

@stephen-peck - I took a look at the code and was able to fix the parsing error, but there are some issues that I have not yet resolved. Even though my (unmerged) changes support parsing your query correctly, it is not able to compose the parsed query back to valid and identical SOQL so some test-cases are still failing.

The way that the data model is structured leaves some ambiguity with the logical prefix NOT. :sob:

I need to figure out a way to determine when NOT should be included in the parens or outside the parens, but there is still a case where NOT might need to be in the middle of the parans, which the data model is not setup to support.

When I wrote the parser I did not find many examples of using NOT and missed some of the basic cases (as you found). I will spend some more time reviewing and seeing what I can come up with and hopefully without a breaking change.

Example queries where the data model would be ambiguous after parsing:

https://github.com/paustint/soql-parser-js/pull/123

stephen-peck commented 3 years ago

Thanks for the quick response (and of course the parser itself!). Have you considered whether NOT could be treated as a logical operator (rather than prefix)? Alongside expr AND/OR expr handling the syntax NOT expr should disambiguate the above but looks like a breaking, or opt-into change. See Salesforce logical operators

I didn't get very far investigating this, sorry, but was looking along the lines of:

@ -230,10 +230,11 @@ export class SoqlParser extends CstParser {
       parenCount = this.getParenCount(parenCount);
       this.OPTION(() => {
         this.OR([
           { ALT: () => this.CONSUME(lexer.And, { LABEL: 'logicalOperator' }) },
           { ALT: () => this.CONSUME(lexer.Or, { LABEL: 'logicalOperator' }) },
+          { ALT: () => this.CONSUME(lexer.Not, { LABEL: 'logicalOperator' }) },
         ]);
       });
       this.SUBRULE(this.expression, { ARGS: [parenCount, allowSubquery, alowAggregateFn, allowLocationFn] });
     },
   );
@@ -485,14 +486,10 @@ export class SoqlParser extends CstParser {
             parenCount.left++;
           }
         });
       });

-      this.OPTION1(() => {
-        this.CONSUME(lexer.Not, { LABEL: 'logicalPrefix' });
-      });
-
paustint commented 3 years ago

@stephen-peck - yeah, I was thinking the same thing when I was investigating and saw the docs. I think it is probably worth a breaking change to fix this. It will impact the representation of the data structure in some cases (like the example you provided in this ticket).

It might take me a little time to get around to this bug - is this a blocker for you?

stephen-peck commented 3 years ago

@paustint thanks, currently I think it's more limiting than blocking - it reduces the complexity of queries that can be parsed, but at least failure is obvious. It would help us to have any fix soon, esp if it's a breaking change.

paustint commented 3 years ago

@stephen-peck - what do you think about this data structure change?

  1. The left object can now possibly be null if a NOT logical operator is used and there are no preceding parenthesis
  2. The left object will be populated with at most one property: openParens preceding the NOT operator

I have this working locally on these two queries, but have not run all unit tests so I don't know if I will run into some other issues. I will also need to review the HAVING clause and potentially make some updates there as well.

Breaking changes:

  1. left can now be null
  2. the operator property on left is now optional (I would really like to come up with better types for the Condition interface to represent all the valid states that this object could be in)
  3. logicalPrefix is deprecated since NOT is represented as an operator

It is possible that I have introduced some other parsing bug, so I will need to do some investigating to make sure that invalid queries are not parsed as being valid.

SELECT Id FROM Account WHERE NOT (Name = '2' OR Name = '3')

{
    "fields": [{ "type": "Field", "field": "Id" }],
    "sObject": "Account",
    "where": {
      "left": null,
      "operator": "NOT",
      "right": {
        "left": {
          "field": "Name",
          "operator": "=",
          "value": "'2'",
          "literalType": "STRING",
          "openParen": 1
        },
        "operator": "OR",
        "right": {
          "left": {
            "field": "Name",
            "operator": "=",
            "value": "'3'",
            "literalType": "STRING",
            "closeParen": 1
          }
        }
      }
    }
  }

SELECT Id FROM Account WHERE ((NOT (Name = '2' OR Name = '3')))

  {
    "fields": [{ "type": "Field", "field": "Id" }],
    "sObject": "Account",
    "where": {
      "left": { "openParen": 2 },
      "operator": "NOT",
      "right": {
        "left": {
          "field": "Name",
          "operator": "=",
          "value": "'2'",
          "literalType": "STRING",
          "openParen": 1
        },
        "operator": "OR",
        "right": {
          "left": {
            "field": "Name",
            "operator": "=",
            "value": "'3'",
            "literalType": "STRING",
            "closeParen": 3
          }
        }
      }
    }
  }
stephen-peck commented 3 years ago

Cheers for the fix, so operators AND, OR have two operands (left, right) and the new NOT operator takes one operand (right). That seems reasonable to me - parenthesis handling means left couldn't be used as the single operand and a condition like NOT xyz naturally 'reads' with xyz as right