Ignore unknown syntax in postings

alensiljak commented 5 years ago

As an intermediate step towards #1015, and various parts thereof, would it be possible to ignore the syntax for features not currently supported, yet use the parts which are supported in trades? I'm thinking out loud and wondering what effects this may have.

My end goal here is to be able to read a data file https://gitlab.com/snippets/1856416 without errors. Hledger would be able to parse this and ignore the lot syntax, yet correctly assign values to an appropriate account. This would enable me to use hledger with my real-world data file, instead of just on test scenarios. A warning could potentially be issued if such syntax is encountered (once per execution, max.) as an alert.

From what I've read so far, hledger should be able to correctly process the given data file as there is nothing strange about the transactions. Except, of course, that the whole lots functionality is missing, which is ok. Simply operating on numbers and commodities would also be fine for most reports.

adept commented 5 years ago

What's the behavior you propose for "hledger print"? (taking into account all its possible options)

alensiljak commented 5 years ago

@adept, good question. That would depend on what hledger does currently and I'm not familiar with the source code, admittedly.

If it recreates the transaction output from internal objects, then the unknown syntax elements would be lost. That loss would have to be accepted and documented. It is not the preferable option, though, since print is great for reordering transactions for ledger, and losing the lot syntax would make it practically useless for that. On the other hand, ledger could be used for this purpose until full support is implemented. I take it that this would be a temporary situation.
If it outputs the original transactions, then the original syntax is preserved. This is the ideal solution. I would like to keep any original syntax. We should distinguish between supported but invalid syntax and unsupported (but recognized?) syntax. The first would raise a warning/error; the second, as mentioned, could trigger a single warning.

The print command should output a compatible file. Technically, even if it contains unrecognized syntax, the file is still compatible. In general, it would be good to preserve anything the user has entered, especially bearing in mind that these specific features would eventually be supported by hledger.

My general idea is that "data is data". Some see it as a list of instructions, more like source code, but I prefer to look at my journal file as pure data. Hope this answers your question somewhat.

simonmichael commented 5 years ago

@MisterY, generally agreed.

I think being too lax can cause problems, eg unpredictability or vague parse errors, so eg we don't ignore any unrecognised line like Ledger does, even though that sounds quite handy.

But for common Ledger syntax that we don't yet handle, our usual policy is we should accept it and ignore it. There are still quite a few cases where we don't implement that. This would be a good relatively easy starter task, if you're interested.

print isn't required to preserve everything from the input, just to print a valid journal. Currently it strips out directives and inter-transaction comments, eg. We'd like to make it optionally more data-preserving, but that's a separate issue.

simonmichael commented 5 years ago

See also: #258, #299.

simonmichael commented 5 years ago

PS: related docs start at https://hledger.org/dev/hledger.html#costs , please let me know of experiences/issues.

alensiljak commented 5 years ago

Basically, the roadblock is the following: Given a file with trades, running hledger b will result in

   |
16 |     Assets:Shares    -5 ETF {10 EUR} [2019-01-10] @ 15 EUR
   |                              ^
unexpected '1'
expecting '='

or, if I remove the cost basis info of {10 EUR},

   |
16 |     Assets:Shares    -5 ETF [2019-01-10] @ 15 EUR
   |                             ^
unexpected '['
expecting ';', '=', '@', '{', end of input, or newline

It would be great if the unknown syntax {10 EUR} and [2019-01-10] were simply ignored when it can't be processed, and the remaining -5 ETF @ 15 EUR were handled as if the other parts were not there. This would allow compatibility with ledger files, although not with ledger features, and would allow using them directly with hledger, requiring no modifications.
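Until the parser accepts this syntax, the requested behaviour can be approximated by preprocessing the journal outside hledger. The following is only a sketch of that idea in Python, not how hledger's parser works or would implement it. The names `strip_lot_syntax` and `strip_journal` are hypothetical, and it assumes postings separate the account from the amount with two or more spaces and that the only annotations to drop are the forms shown in this thread: {10 EUR}, {{0,3 EUR}}, {} and [2019-01-10]. It emits at most one warning per run.

```python
import re
import sys

# Lot annotations seen in this thread: {10 EUR}, {{0,3 EUR}}, {}, [2019-01-10].
LOT_ANNOTATION = re.compile(r"\s*(\{\{[^}]*\}\}|\{[^}]*\}|\[[^\]]*\])")
# Assumed posting shape: indentation, account name, 2+ spaces, amount part.
POSTING = re.compile(r"^(\s+\S.*?\s{2,})(.*)$")


def strip_lot_syntax(line):
    """Return (cleaned_line, stripped) for a single journal line."""
    if line.lstrip().startswith(";"):
        return line, False  # comment line: leave untouched
    match = POSTING.match(line)
    if not match:
        return line, False  # not a posting with an amount
    head, rest = match.groups()
    # Only touch the amount part; anything after ';' is a posting comment.
    amount, sep, comment = rest.partition(";")
    cleaned = LOT_ANNOTATION.sub("", amount)
    return head + cleaned + sep + comment, cleaned != amount


def strip_journal(lines, warn=lambda msg: print(msg, file=sys.stderr)):
    """Strip lot annotations from all lines, warning at most once."""
    warned = False
    result = []
    for line in lines:
        cleaned, stripped = strip_lot_syntax(line)
        if stripped and not warned:
            warn("warning: lot syntax was ignored")
            warned = True
        result.append(cleaned)
    return result
```

Restricting the substitution to the amount part means a bracketed account name such as `[Assets:Shares]` and bracketed dates inside posting comments are left alone. The obvious cost of this approach is the one discussed above: the stripped information is gone from whatever hledger prints.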

Going a step further, just "understanding" the {10 EUR} and [2019-01-10] syntax would be the first step in adding the investment features that will require this syntax. Let me know if I missed something. Cheers!

simonmichael commented 5 years ago

@MisterY if you get around to it before I do, amountp and below is where the parser needs tweaking.

alensiljak commented 5 years ago

Thanks! I've just managed to set up the Windows Subsystem for Linux on my day-machine and to compile hledger in it! Looking to try out some Haskell hacking in the coming days.

lestephane commented 4 years ago

I also would like the following two cases to work

Error output:
hledger: -:2649:58:
     |
2649 |     Personal:Assets:Savings:Revolut:XBT  0,00008341 XBT {} 
     |                                                          ^
unexpected '}'
expecting '='

hledger: -:2649:58:
     |
2649 |     Personal:Assets:Savings:Revolut:XBT  0,00008341 XBT {{0,3 EUR}} 
     |                                                          ^
unexpected '{'
expecting '='

I'd like these to work so that I can offload the generation of capital gains transactions to beancount and re-include those transactions in hledger through hledger-flow:

hledger-flow
  -> construct 1-in/yyyy/csv
    -> (in construct) hledger > tmp.x.journal
    -> (in construct) bean-report tmp.x.journal hledger > 3-journal/yyyy/x.journal

This enables me to use inventories without waiting for hledger to have them (#488 #1029 #624 #1022 #1015), or at least, that is the idea.

For those who don't know, in beancount {} can be used on the credit leg of a buy transaction to indicate that the lot is to be tracked as an inventory item (the cost is derived from the other leg in the same transaction).

And using the {} on the debit leg of a sell transaction indicates to dispose of as many inventory items/lots as necessary according to the booking method (FIFO in my case) to have the transaction balance out, adding a gain and rounding transaction leg as necessary. Magic.
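To make the FIFO idea concrete, here is a minimal Python sketch of reducing lots oldest-first and computing the realized gain. It only illustrates the booking method described above; it is not beancount's implementation, it ignores rounding legs, and the function name `sell_fifo` and the example numbers are made up.

```python
from collections import deque
from decimal import Decimal


def sell_fifo(lots, units, price):
    """Sell `units` at `price` from `lots`, a list of (units, unit_cost)
    ordered oldest first. Returns (remaining_lots, realized_gain)."""
    remaining = deque(lots)
    to_sell = units
    gain = Decimal("0")
    while to_sell > 0:
        if not remaining:
            raise ValueError("not enough units held to cover the sale")
        held, cost = remaining.popleft()
        used = min(held, to_sell)
        gain += used * (price - cost)
        if held > used:
            # Partially consumed lot stays at the front of the queue.
            remaining.appendleft((held - used, cost))
        to_sell -= used
    return list(remaining), gain


# Two lots bought at 10 and 12 EUR; selling 7 units at 15 EUR.
lots = [(Decimal("5"), Decimal("10")), (Decimal("5"), Decimal("12"))]
remaining, gain = sell_fifo(lots, Decimal("7"), Decimal("15"))
# remaining == [(Decimal("3"), Decimal("12"))], gain == Decimal("31")
```

The sale consumes the whole first lot (5 units, gain 25) and 2 units of the second (gain 6), which is why no lot price has to be typed on the selling posting.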

This is the only sane way I've found to track gains without having to type every single lot price myself. I have lots of small transactions from using the Revolut spare change saving feature targeting vaults of different currencies.

Whatever happens, I'm already doing the appending of '{}' in construct using awk, so my wish is not high priority.

simonmichael commented 4 years ago

Sounds good! Would anyone like to work on tweaking our parser to ignore these?

alensiljak commented 1 year ago

> Sounds good! Would anyone like to work on tweaking our parser to ignore these?

I'm very interested but need to learn some Haskell first.

simonmichael / hledger

Ignore unknown syntax in postings #1021