Closed by simonmichael 1 year ago
What is the reference for parsing Ledger files? I see this file in the ledger repo: https://github.com/ledger/ledger/blob/next/doc/grammar.y, but there is some weirdness around the lot_date_opt identifier, which refers to date instead of lot_date where lot_date follows.
I'm trying to implement a parser in Go. I'd like it to read "standard" ledger files and build a full AST, both for modifying files and, eventually, for interpreting them to list registers and balances.
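As a possible starting point for such a Go parser, here is a minimal sketch that recognises only transaction date/description and posting account/amount. All type and function names (Transaction, Posting, ParseJournal, splitAccountAmount) are hypothetical and not taken from any existing project. It assumes the common layout where an entry starts with an unindented date, postings are indented, and the account is separated from the amount by two or more spaces or a tab. Amounts are kept as raw text, and directives, inline comments, prices and lots are not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Posting is one indented line of a transaction. Amount is kept as raw
// text; parsing commodities and prices is out of scope for this sketch.
type Posting struct {
	Account string
	Amount  string // empty when the amount is elided
}

// Transaction is a dated entry with a description and its postings.
type Transaction struct {
	Date        string
	Description string
	Postings    []Posting
}

// splitAccountAmount separates an account name from its amount at the
// first run of two spaces or the first tab, whichever comes first.
func splitAccountAmount(s string) (account, amount string) {
	i := strings.Index(s, "  ")
	if t := strings.IndexByte(s, '\t'); t >= 0 && (i < 0 || t < i) {
		i = t
	}
	if i < 0 {
		return s, ""
	}
	return s[:i], strings.TrimSpace(s[i:])
}

// ParseJournal collects transactions from text. Unindented lines that
// start with a digit open a transaction; indented lines inside one are
// postings. Everything else is skipped.
func ParseJournal(text string) []Transaction {
	var txns []Transaction
	inTxn := false
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		trimmed := strings.TrimSpace(line)
		if trimmed == "" {
			inTxn = false // a blank line ends the current transaction
			continue
		}
		if strings.HasPrefix(trimmed, ";") {
			continue // comment line
		}
		indented := line[0] == ' ' || line[0] == '\t'
		switch {
		case !indented && line[0] >= '0' && line[0] <= '9':
			date, desc, _ := strings.Cut(trimmed, " ")
			txns = append(txns, Transaction{Date: date, Description: strings.TrimSpace(desc)})
			inTxn = true
		case indented && inTxn:
			acct, amt := splitAccountAmount(trimmed)
			last := &txns[len(txns)-1]
			last.Postings = append(last.Postings, Posting{Account: acct, Amount: amt})
		default:
			inTxn = false // unsupported construct, e.g. a directive
		}
	}
	return txns
}

func main() {
	src := "2024-01-15 Grocery store\n    Expenses:Food    $42.50\n    Assets:Checking\n"
	for _, t := range ParseJournal(src) {
		fmt.Println(t.Date, "|", t.Description)
		for _, p := range t.Postings {
			fmt.Printf("  %q %q\n", p.Account, p.Amount)
		}
	}
}
```

Keeping the amount as an unparsed string makes it easier to write a file back out unchanged, which matters if the AST is also meant for modifying files.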
I don't really know what the best reference for Ledger's format is. I usually start with the manual but test everything. By the way, have you seen https://github.com/howeyc/ledger ?
There's also http://plaintextaccounting.org/quickref/ which is not authoritative but has some info and some links.
Back to the topic at hand:
hledger now uses only the ledger4 parser (ledger-parse) for files whose suffix is .ledger or .l. (Just as the hledger journal parser is used for files with suffix .journal, .j or (new) .hledger). As before, files with an unrecognised suffix or no suffix are parsed by each reader in turn until one succeeds.
Update: at present the ledger4 parser is never used automatically; you must prepend a ledger: prefix to the file path to activate it.
hledger -f ledger:t.ledger print --debug=1
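The selection behaviour described above (a reader-name prefix forces one reader; otherwise each reader is tried in turn until one succeeds) can be illustrated with a short Go sketch. This is not hledger's code, which is written in Haskell; the Reader type, chooseAndParse, splitReaderPrefix and the toy readers in demoReaders are all hypothetical names made up for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Reader is a hypothetical stand-in for a file-format parser.
type Reader struct {
	Name  string
	Parse func(text string) error
}

// splitReaderPrefix splits "ledger:t.ledger" into ("ledger", "t.ledger").
// If the text before the first colon is not a known reader name, the
// whole argument is treated as a path and the name is empty.
func splitReaderPrefix(arg string, readers []Reader) (name, path string) {
	if prefix, rest, ok := strings.Cut(arg, ":"); ok {
		for _, r := range readers {
			if r.Name == prefix {
				return prefix, rest
			}
		}
	}
	return "", arg
}

// chooseAndParse returns the name of the reader that accepted text.
// With a prefix only that reader is tried; without one, readers are
// tried in order until one succeeds.
func chooseAndParse(arg, text string, readers []Reader) (string, error) {
	name, _ := splitReaderPrefix(arg, readers)
	err := errors.New("no reader available")
	for _, r := range readers {
		if name != "" && r.Name != name {
			continue
		}
		if e := r.Parse(text); e != nil {
			err = fmt.Errorf("%s reader: %w", r.Name, e)
			continue
		}
		return r.Name, nil
	}
	return "", err
}

// demoReaders builds two toy readers: "journal" rejects any text that
// contains the marker UNSUPPORTED, and "ledger" accepts everything.
func demoReaders() []Reader {
	return []Reader{
		{Name: "journal", Parse: func(text string) error {
			if strings.Contains(text, "UNSUPPORTED") {
				return errors.New("unsupported construct")
			}
			return nil
		}},
		{Name: "ledger", Parse: func(string) error { return nil }},
	}
}

func main() {
	readers := demoReaders()
	for _, arg := range []string{"t.dat", "ledger:t.ledger"} {
		used, err := chooseAndParse(arg, "2024-01-15 Example", readers)
		fmt.Println(arg, "->", used, err)
	}
}
```

Trying readers in a fixed order means the first one that accepts a file wins, so with no prefix the result depends on how permissive the earlier readers are.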
Is the code here https://github.com/ledger/ledger4/blob/master/ledger-parse/Ledger/Parser/Text.hs all that is needed to parse ledger files?! Or is there an implementation somewhere else?
Yes, or more precisely, our copy of it. But also no, because it has not been battle-tested: for example, it doesn't parse amounts and prices, nor does it "cook" the raw constructs, i.e. apply the usual ledger-ish parsing semantics to them. You can see some of that happening in https://github.com/simonmichael/hledger/blob/master/hledger-lib/Hledger/Read/LedgerReader.hs.
This parser never advanced past the prototype stage, and was later removed from hledger. The current strategy is to make the main journal parser more capable. Closing.
I have been intending to integrate the parser from @ledger/ledger4 as an additional reader, aiming to improve our support for modern ledger file format and facilitate h/ledger interop & more John W. hacking.
I worked on it today as part of the Ledger hackathon and have pushed basic integration to master. hledger now uses only the ledger4 parser (ledger-parse) for files whose suffix is .ledger or .l. (Just as the hledger journal parser is used for files with suffix .journal, .j or (new) .hledger). As before, files with an unrecognised suffix or no suffix are parsed by each reader in turn until one succeeds. The integration is quite basic; as yet only transaction date/description and posting account/amount are recognised. It should be relatively easy to add to this skeleton to support more of the syntax, and help is welcome.

In theory, this makes us now able to parse more ledger files than before. Actually, some ledger files are parsed better by hledger's journal parser than the ledger4 parser (e.g. https://github.com/ledger/ledger4/issues/6). So I'm not sure which is the better short-term strategy for supporting more ledger files: add the missing constructs to the hledger parser, as we've done in the past, or bring the ledger4 parser up to par. My feeling is that separate specialised parsers can be more useful in the long term, but that I personally should focus on the hledger parser while others develop the ledger4 parser further.