Closed: lojikil closed this 1 year ago
I did lie, however; I do call parse_block:
https://github.com/lojikil/coastML/blob/master/carpet/parse.py#L1264
And naturally it's more complex than just something with the _ base case:
>>> import carpet
>>> l = carpet.Lex("_ { () } esac")
>>> l.next()
TokenIdent(_)
>>> l.next()
TokenBlockStart()
>>> l.next()
TokenUnit()
>>> l.next()
TokenBlockEnd()
>>> l.next()
TokenKeyword(esac)
>>> l.next()
TokenEOF()
Ah, here we go: a whitespace-consumer bug? Or something with Unit specifically?
>>> f = """
... | _ {
... ()
... }
... esac
... """
>>> l = carpet.Lex(f)
>>> l.next()
TokenOperator(|)
>>> l.next()
TokenIdent(_)
>>> l.next()
TokenBlockStart()
>>> l.next()
TokenUnit()
>>> l.next()
TokenUnit()
Seems fairly specific to Unit, at least at first blush:
>>> ff = """
... | _ {
... 10
... }
... esac
... """
>>> ll = carpet.Lex(ff)
>>> ll.next()
TokenOperator(|)
>>> ll.next()
TokenIdent(_)
>>> ll.next()
TokenBlockStart()
>>> ll.next()
TokenInt(10)
>>> ll.next()
TokenBlockEnd()
(this is the first time I'm doing my debugging for a language fully in public; usually I just write obscure asciidoc notes to myself and push the fix later hahah)
and the fix is trivial:
index 219f30b..2412708 100644
--- a/carpet/parse.py
+++ b/carpet/parse.py
@@ -511,7 +511,7 @@ class Lex:
                 return TokenOperator(self.src[o:no], self.line, self.offset)
         elif self.src[o] == '(':
             if self.src[o + 1] == ')':
-                self.offset += 3
+                self.offset = o + 2
                 return TokenUnit(self.line, self.offset)
             self.offset = o + 1
             return TokenCallStart(self.line, self.offset)
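The issue was that I was originally advancing from the raw offset (self.offset += 3), but the raw offset is not necessarily where the token actually started, since we consume whitespace first; o is the position after that whitespace, so the new offset has to be computed from o. I should hunt for other code that uses the same += instead of o + N.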
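To make the failure mode concrete, here is a minimal sketch of the same offset mistake in a toy lexer. Everything in it (MiniLex, tokens, the string token names, the buggy flag) is hypothetical and written for illustration; it is not carpet's Lex, and it only assumes the general shape visible in the diff: skip whitespace into a local o, then update self.offset.

```python
class MiniLex:
    """Toy lexer for illustration only; not carpet's actual Lex class."""

    def __init__(self, src, buggy=False):
        self.src = src
        self.offset = 0      # position where the previous token ended
        self.buggy = buggy   # hypothetical switch to reproduce the old arithmetic

    def next(self):
        o = self.offset
        # Consume leading whitespace; afterwards o is the true token start.
        while o < len(self.src) and self.src[o].isspace():
            o += 1
        if o >= len(self.src):
            self.offset = o
            return "EOF"
        ch = self.src[o]
        if ch == "(" and self.src[o + 1:o + 2] == ")":
            if self.buggy:
                # Advances from the stale, pre-whitespace offset.
                self.offset += 3
            else:
                # Advances from where the token actually started.
                self.offset = o + 2
            return "Unit"
        if ch == "{":
            self.offset = o + 1
            return "BlockStart"
        if ch == "}":
            self.offset = o + 1
            return "BlockEnd"
        raise ValueError(f"unexpected character {ch!r} at {o}")


def tokens(src, buggy=False):
    """Collect token names up to and including EOF."""
    lexer = MiniLex(src, buggy)
    out = []
    while True:
        tok = lexer.next()
        out.append(tok)
        if tok == "EOF":
            return out


one_space = "{ () }"      # exactly one whitespace character before ()
indented = "{\n  ()\n}"   # newline plus indentation before ()

print(tokens(one_space, buggy=True))  # the old arithmetic happens to work here
print(tokens(indented))               # fixed arithmetic
print(tokens(indented, buggy=True))   # Unit is lexed twice
```

With exactly one whitespace character before (), += 3 happens to land on the same position as o + 2, which would explain why the single-line case lexed fine and only the multi-line input produced a second Unit.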
Interesting edge case; I was testing some compiler work by writing a cons-cell ADT for use as a fallback in languages that don't have an underlying list mechanism and ran into a weird error:
carpet.parse.CoastalParseError: ("Incorrect top-level form <class 'carpet.parse.TokenCallEnd'>", 27)
Breaking the code down a bit, we get to the heart of what failed:
It's interesting because the minimal test case does exactly what you would expect:
I believe this is because I don't actually call parse_block for case forms, but rather a custom block reader; still, the lexer should be the same. Will look into this shortly.