lwgray / think

MIT License

Parser Token Issue: Unable to Handle Multi-Word "else if" #3

Closed lwgray closed 1 week ago

lwgray commented 1 week ago

Problem Description

The ThinkPy parser fails to handle "else if" as a compound token. The lexer emits "else" and "if" as two separate tokens, which breaks the parser's handling of conditional chains.

Current Token Handling

# Current token definition
tokens = (
    'IF', 
    'ELSE',  # 'else if' being treated as two separate tokens
    # ...
)

# This creates ambiguity in parsing rules when encountering:
else if condition then {  # Parser sees: ELSE IF ...
    # statements
}

Root Cause

The lexical analyzer (lexer) is designed to handle single-word tokens. When it encounters "else if", it:

  1. First identifies "else" as an ELSE token
  2. Then identifies "if" as an IF token
  3. Creates ambiguity in parsing rules that expect a single token for else-if conditions

This leads to syntax errors because the parser cannot properly handle the sequence of ELSE + IF tokens in this context.
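
The word-by-word behavior described above can be reproduced without the real lexer. The following is a minimal stdlib-only sketch, not the ThinkPy implementation: `RESERVED`, `TOKEN_RE`, and `tokenize` are hypothetical names, and the token set is an assumption limited to what the example line needs.

```python
import re

# Hypothetical reserved-word table mirroring the issue's IF/ELSE tokens.
RESERVED = {"if": "IF", "else": "ELSE", "then": "THEN"}

# One alternative per token kind; whitespace is matched but not emitted.
TOKEN_RE = re.compile(r"""
    (?P<ID>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<OP>>=|<=|==|[<>=])
  | (?P<LBRACE>\{)
  | (?P<RBRACE>\})
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(source):
    """Return (type, value, pos) tuples for each non-whitespace token."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"Illegal character {source[pos]!r} at {pos}")
        kind = match.lastgroup
        value = match.group()
        if kind == "ID":
            # Words are matched one at a time, so a two-word phrase such as
            # "else if" can never be looked up as a single reserved entry.
            kind = RESERVED.get(value, "ID")
        if kind != "WS":
            tokens.append((kind, value, match.start()))
        pos = match.end()
    return tokens


stream = tokenize("else if score >= 80 then {")
print([kind for kind, _, _ in stream])
```

The printed list starts with `ELSE` followed by `IF`, which is the same split seen in the debug output for this issue.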

Verification Steps Taken

  1. Added debug logging to trace token generation:

    def t_debug(self, t: lex.LexToken):
        """Debug token stream"""
        print(f"DEBUG: Token: {t.type}, Value: {t.value}, pos={t.lexpos}")
        return t

  2. Observed token stream for else-if statement:

    DEBUG: Token: ELSE, Value: else, pos=123
    DEBUG: Token: IF, Value: if, pos=128    # Shows separate tokenization

Proposed Fix

Replace "else if" with single-token "elif" following Python's convention:

  1. Add ELIF to token definitions:
    
    tokens = (
        # ...
        'ELIF',  # Single token for else-if
        # ...
    )

    # Add to reserved words
    reserved = {
        # ...
        'elif': 'ELIF',
    }


  2. Update parsing rules to use ELIF token:

    def p_else_if_condition(self, p):
        """
        else_if_condition : ELIF expression THEN LBRACE statement_list RBRACE
        """
        # p[1]=ELIF, p[2]=expression, p[3]=THEN, p[4]=LBRACE, p[5]=statement_list
        p[0] = {
            'type': 'elif',
            'condition': p[2],
            'body': p[5]
        }
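
To illustrate why a single reserved word avoids the split, here is a small stdlib-only sketch of a reserved-word lookup. It is an assumption-laden stand-in for the real lexer: `RESERVED`, `WORD_OR_SYMBOL`, and `token_types` are hypothetical names, and only the `'elif': 'ELIF'` entry comes from this issue.

```python
import re

# Hypothetical reserved-word table; only the 'elif' entry comes from the issue.
RESERVED = {"if": "IF", "else": "ELSE", "elif": "ELIF", "then": "THEN"}

# Whole words first, then numbers, two-character operators, any other symbol.
WORD_OR_SYMBOL = re.compile(r"[A-Za-z_]\w*|\d+|>=|<=|==|\S")


def token_types(source):
    """Classify each lexeme, checking the reserved-word table first."""
    types = []
    for lexeme in WORD_OR_SYMBOL.findall(source):
        if lexeme in RESERVED:
            types.append(RESERVED[lexeme])
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            types.append("ID")
        elif lexeme.isdigit():
            types.append("NUMBER")
        else:
            types.append("SYMBOL")
    return types


print(token_types("} elif score >= 80 then {"))
```

Because the lookup happens on the whole matched word, an identifier such as `elif_count` would still be classified as `ID` in this sketch rather than as `ELIF` plus a remainder.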

Before & After Examples

Before (Failing)

decide {
    if score >= 90 then {
        grade = "A"
    } else if score >= 80 then {  # Fails: treated as separate tokens
        grade = "B"
    }
}

After (Working)

decide {
    if score >= 90 then {
        grade = "A"
    } elif score >= 80 then {  # Works: single token
        grade = "B"
    }
}

Why Enhanced Error Handling Was Also Needed

The original error messages didn't reveal that the issue was related to token parsing:

ThinkPy Parser Error: Syntax error

Added debugging and error context to help identify similar issues in the future:

ThinkPy Error: Syntax error at token ELSE
Line: 8
Column: 18
Context: Near token: 'else if'
Source code:
   7:                 grade = "A"
-> 8:             else if score >= 80 then {
   9:                 grade = "B"
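
A report of this shape can be built from nothing more than the source text and the failing token's offset. The sketch below is one possible way to do it, not the code that was added to ThinkPy: `find_column` and `format_error` are hypothetical helpers, and the `context` parameter (lines shown on each side) is an assumption.

```python
def find_column(source, lexpos):
    """Return the 1-based column of lexpos within its own line."""
    # rfind returns -1 when there is no earlier newline, giving line_start 0.
    line_start = source.rfind("\n", 0, lexpos) + 1
    return lexpos - line_start + 1


def format_error(source, lexpos, token_type, context=1):
    """Build a message with line, column, and a marked source excerpt."""
    lines = source.split("\n")
    line_no = source.count("\n", 0, lexpos) + 1
    report = [
        f"ThinkPy Error: Syntax error at token {token_type}",
        f"Line: {line_no}",
        f"Column: {find_column(source, lexpos)}",
        "Source code:",
    ]
    first = max(1, line_no - context)
    last = min(len(lines), line_no + context)
    for number in range(first, last + 1):
        # Mark the offending line with an arrow, pad the others to align.
        marker = "->" if number == line_no else "  "
        report.append(f"{marker} {number}: {lines[number - 1]}")
    return "\n".join(report)


source = 'grade = "A"\n} else if score >= 80 then {\ngrade = "B"'
print(format_error(source, source.index("else"), "ELSE"))
```

With a PLY-style token, `lexpos` would come from the token object; here it is computed with `str.index` purely for the demonstration.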

Acceptance Criteria

  1. Parser successfully recognizes "elif" as a single token
  2. Compound conditions parse correctly with "elif"
  3. Error messages clearly indicate token-related issues when they occur
  4. Documentation updated to specify "elif" usage

Migration Impact

Labels

Complexity Level Justification

This issue is rated as Knight level complexity because it requires:

While not requiring complete system architecture knowledge (Master level), it does need significant expertise in:
