twilco / beancount

Rust tooling surrounding beancount, a text-based double-entry bookkeeping language.

Support tokens that have no whitespace between them #14

Closed twilco closed 5 years ago

twilco commented 5 years ago

In #5 we discovered that we are not properly parsing tokens with no space between them. For example, this works just fine in bean-check:

2019-02-19*"Foo""Bar"

Our parser rejects the same input with an error. Let's fix that.

I looked into using Pest's implicit whitespace to solve this problem, but since so many of our rules are whitespace-sensitive (required indentation in postings, key-value lists, etc.), atomicity would pervade to any rule that uses these rules. Making a rule atomic means we have to specify the whitespace manually, which nullifies the benefit we get from implicit whitespace.

This is my first foray into Pest, so maybe I'm missing something here. Explore implicit whitespacing as a solution to this problem, and otherwise use our existing manual whitespacing scheme to support tokens that have no spaces between them.
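For reference, the two options could be sketched roughly as below. This is only an illustration under stated assumptions: every rule here (`date`, `txn_flag`, `string`, `ws`, `txn_header`, `txn_header_manual`) is a hypothetical, simplified stand-in, not the project's actual grammar.

```pest
// Hypothetical, simplified rules for illustration; not the project's actual grammar.

// Option 1: implicit whitespace. Defining WHITESPACE makes every `~` in a
// normal (non-atomic) rule accept zero or more WHITESPACE matches.
WHITESPACE = _{ " " | "\t" }

date     = @{ ASCII_DIGIT{4} ~ "-" ~ ASCII_DIGIT{2} ~ "-" ~ ASCII_DIGIT{2} }
txn_flag =  { "*" | "!" }
string   = @{ "\"" ~ (!"\"" ~ ANY)* ~ "\"" }

// Intended to accept both `2019-02-19*"Foo""Bar"` and `2019-02-19 * "Foo" "Bar"`.
txn_header = { date ~ txn_flag ~ string ~ string? }

// Option 2: manual whitespace. The rule is atomic, so nothing is skipped
// implicitly; whitespace between tokens is spelled out and made optional.
ws = _{ " " | "\t" }
txn_header_manual = @{ date ~ ws* ~ txn_flag ~ ws* ~ string ~ (ws* ~ string)? }
```

In both variants the whitespace between tokens is optional rather than required, which is what the bean-check example above needs.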

mbudde commented 5 years ago

> atomicity would pervade to any rule that uses these rules

I think non-atomic rules should help with that. If I'm counting correctly, there are only four rules in the beancount parser that have explicit whitespace (the INDENT token): empty_line, posting, key_value_line and posting_or_kv_list. I think the last three can be combined into one. For instance, the posting_or_kv_list and posting rules could look something like this (simplified):

posting_or_kv_list = @{
    (indent ~ (key_value_line | posting | tag_links))*
}
posting = !{ txn_flag? ~ account ~ incomplete_amount ~ cost_spec? ~ ... ~ eol }

That is, lines that must begin with indentation are matched by an atomic rule that wraps a non-atomic inner rule, instead of making, e.g., posting itself atomic.
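A slightly fuller sketch of the same idea, with the pieces it relies on written out. The `WHITESPACE` definition and the reduced `indent`, `account`, `number`, `commodity`, `key`, `string` and `eol` rules are assumptions made up for this example; the real grammar has more alternatives and different definitions.

```pest
// Hypothetical, simplified rules for illustration; not the project's actual grammar.

// Implicit whitespace, skipped between tokens in normal and non-atomic rules.
WHITESPACE = _{ " " | "\t" }

eol    = _{ NEWLINE }
indent = @{ (" " | "\t")+ }

account   = @{ ASCII_ALPHA_UPPER ~ (ASCII_ALPHANUMERIC | ":" | "-")* }
number    = @{ "-"? ~ ASCII_DIGIT+ ~ ("." ~ ASCII_DIGIT+)? }
commodity = @{ ASCII_ALPHA_UPPER+ }
key       = @{ ASCII_ALPHA_LOWER ~ (ASCII_ALPHANUMERIC | "-" | "_")* }
string    = @{ "\"" ~ (!"\"" ~ ANY)* ~ "\"" }

// Atomic outer rule: no implicit whitespace here, so the leading
// indentation has to be matched explicitly by `indent`.
posting_or_kv_list = @{ (indent ~ (key_value_line | posting))* }

// Non-atomic inner rules: `!` turns implicit whitespace back on, so the
// tokens on each line may be separated by any amount of whitespace or none.
posting        = !{ account ~ number ~ commodity ~ eol }
key_value_line = !{ key ~ ":" ~ string ~ eol }
```

The point of this layout is that only the one rule that cares about indentation is atomic; everything inside a line keeps the benefit of implicit whitespace.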