jfly opened this issue 2 years ago.
Thanks for the report, and sorry that it got overlooked until now and that I haven't spent time on it yet. Any help is welcome.
Indeed this seems unclear. I thought we did support semicolons in descriptions, as they are quite common in the wild in my experience.
We do support semicolons in account names, at least according to https://hledger.org/1.26/hledger.html#account-comments. (I'm not really sure why, except that Ledger does, we probably always did, and I didn't want to make a breaking change.)
I'd be happy to help, but honestly I'm not clear on what we want to do.
Would you be open to adding some validation somewhere in hledger that prevents semicolons in descriptions?
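As a rough illustration of the kind of check being proposed, here is a hypothetical Python sketch. `check_description` is not part of hledger and says nothing about where such a check would live in its code; it just rejects a description that contains `;`, since in journal syntax that character would start a comment.

```python
def check_description(description: str) -> None:
    """Hypothetical check: reject a transaction description containing ';'.

    Not hledger code; it only illustrates the proposed validation.
    """
    if ";" in description:
        col = description.index(";") + 1  # 1-based column of the first ';'
        raise ValueError(
            f"description {description!r} contains ';' at column {col}; "
            "in journal syntax this would start a comment"
        )


# Usage: a plain description passes, one with a semicolon is rejected.
check_description("groceries")
try:
    check_description("a description; with a semicolon")
except ValueError as err:
    print(err)
```

Failing loudly like this matches the preference expressed in the report: an error at import or add time, rather than a journal that silently means something different.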
I would probably first clarify the status quo:
Turns out I'm pretty busy right now, so I don't think I'll have time to look into this deeper anytime soon, sorry. Next time I have some available open source time, I will look into this if someone else hasn't already.
I wanted to leave one quick comment though.
- clarify whether this is a CSV-only or a more general issue
I did mention above that hledger add also has a similar problem: if you try to add a description with a semicolon in it, it gets split up into a description and a comment. So this doesn't feel CSV-specific to me. Does that answer your question?
First off: thanks for a wonderful tool with tons of documentation! I'm just getting my feet wet with PTA, and this tool and its community resources have been great.
I'm using hledger to import CSV statements from my bank. Some of the descriptions in that CSV file contain semicolons. For example:
Note that in this example, the description is a string containing a semicolon, "a description; with a semicolon", and there's a comment that has no special characters, "this is a comment". If I try to parse this with hledger, I get this:
It's not obvious to the human eye unless you're very familiar with looking at these journal files, but this journal file represents something different: the description lost its semicolon and everything after it, and the comment got prefixed with the rest of the description. This is easier to see if you parse this with hledger and output it as JSON:
I'm not sure what that newline at the end of tcomment is about, but hopefully it's clear how the description and comment got mangled:
- description: "a description; with a semicolon" -> "a description"
- comment: "this is a comment" -> "with a semicolon ; this is a comment" (and maybe it got a newline added as well? I'm guessing that's an unrelated JSON-specific quirk)
What I'm unclear on is whether it should even be possible to preserve the original description and comment. I've read hledger's description of the journal format, and Ledger's as well. I don't see any mechanism for quoting the description or escaping semicolons, so I think the grammar just doesn't allow for it, but I haven't read any source code to confirm.
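To make the mangling concrete, here is a simplified Python model of the round trip. It is not hledger's parser: it assumes the transaction header is written as description, then " ; ", then the comment (hledger's exact spacing may differ), and that on reading, everything after the first `;` is taken as the comment. `render_header` and `parse_header` are hypothetical names. Because the model, like the journal grammar as described above, has no quoting or escaping, the semicolon inside the description cannot survive.

```python
def render_header(description: str, comment: str) -> str:
    """Write a transaction header's text part (the part after the date).

    Assumption: description, then ' ; ', then the comment. The exact spacing
    hledger uses may differ; only the position of the ';' matters here.
    """
    return f"{description} ; {comment}" if comment else description


def parse_header(text: str) -> tuple[str, str]:
    """Simplified model of reading it back: the first ';' starts the comment."""
    description, _, comment = text.partition(";")
    return description.strip(), comment.strip()


original = ("a description; with a semicolon", "this is a comment")
round_tripped = parse_header(render_header(*original))
print(round_tripped)  # ('a description', 'with a semicolon ; this is a comment')
```

Under these assumptions the output reproduces the two mappings listed above: the description is truncated at its own semicolon, and its tail is prepended to the comment.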
If I'm correct that some characters (such as ;) are not allowed, then I think hledger is missing some useful validation somewhere: I'd much rather not be allowed to import the original CSV and be forced to deal with the semicolons myself (my plan right now is to preprocess my bank's CSVs to remove or replace any semicolons in the descriptions).
For the record, I checked, and hledger add seems to suffer from a similar bug/lack of validation: it lets you type in a description with a semicolon, but the resulting journal treats that as a comment:
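The preprocessing workaround mentioned in the report (replacing semicolons in the descriptions before import) could be sketched with Python's standard csv module. The column name "description", the replacement character, and the sample row are assumptions; adjust them to your bank's export. `replace_semicolons` is a hypothetical helper, not part of hledger.

```python
import csv
import io


def replace_semicolons(csv_text: str, column: str, replacement: str = ",") -> str:
    """Replace ';' in one column of a CSV with `replacement`.

    Hypothetical helper: `column` is the header name of the bank CSV's
    description field, which depends on your bank's format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row.get(column):  # skip rows where the column is missing or empty
            row[column] = row[column].replace(";", replacement)
        writer.writerow(row)
    return out.getvalue()


sample = 'date,description,amount\n2022-01-01,"a description; with a semicolon",-10\n'
print(replace_semicolons(sample, "description"))
```

Using the csv module rather than a plain text substitution means only the chosen column is touched, and any field that now contains a comma is re-quoted correctly. The cleaned text can then be saved and imported with the usual rules file.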