simonmichael / hledger

Robust, fast, intuitive plain text accounting tool with CLI, TUI and web interfaces.
https://hledger.org
GNU General Public License v3.0
2.91k stars 315 forks source link

Trim quoted values when importing csv? #1051

Closed benwhalley closed 5 years ago

benwhalley commented 5 years ago

I am importing some CSV from Amex and noticed that their csv amount field is double-quoted and includes a space in front of the value. So, for example: " -140.00".

I noticed that this causes an error when trying to invert the amount in the csv rules file. So if I add the line: amount -%amount GBP it then throws and error on import.

In full, the output is:

the CSV record is:       "14/04/2019", "Reference: XXX", " 6.99", "PAYPAL *ELIDA76 LTD 4029357733", " Process Date 14/04/2019", ""
the amount rule is:      -%amount GBP
the currency rule is:    unspecified
the default-currency is: unspecified
the parse error is:      1:2:
  |
1 | - 6.99 GBP
  |  ^
unexpected space
expecting amount

Deleting the space in the input csv file removes the error.

Is there a way to force hledger to trim incoming values?

simonmichael commented 5 years ago

It does some of that already, but I guess it would be good to do a bit more here. Thanks for the example. Would you be interested in having a look at the Hledger.Read.CsvReader code ? The strip function would do the job.

benwhalley commented 5 years ago

Hmmm. Haskell a bit of a mind bender. After fighting with types and fmap for a bit have lazily used this for the moment:

cat amex.csv | sed 's/,\" /,"/g' |  hledger import --rules-file rules.txt  -
simonmichael commented 5 years ago

Cool. It definitely is at first. If you feel like taking another whack at it, help is available in #hledger. I'll recommend stack build hledger once, then make ghcid and make ghci in side windows, then try making a trivial change in CsvReader.hs and see ghcid update, then try reloading and running in ghci (:r <ENTER>, :main -f examples/sample.csv print <ENTER>), then try to see where to put a strip call. Though, maybe you tried that and found the failure is in the cassava CSV parsing lib, which would be a bit more difficult. I suppose then our options would be 1. preprocess the text like you've done above, but in parseCsv, and/or 2. see if we can get this supported in cassava.

simonmichael commented 5 years ago

Though, thinking about it a little more.. space is significant in CSV, so cassava is right to return it to us. The problem is our field assignment combining a minus sign and a CSV value, which is a fragile way to generate a parseable number. Interpolating a CSV value happens in renderTemplate. If we called strip there, this should work ? However maybe it's not appropriate to strip every interpolated CSV field this way..

And separately, I have become confused. My local repro has been:

2000-01-01,a, " 1"
fields date, description, amount
$ hledger -f a.csv print   # using latest hledger master, built with cassava-0.5.1.0
hledger: user error (/Users/simon/src/PLAINTEXTACCOUNTING/hledger/a.csv:1:15:
  |
1 | 2000-01-01,a, " 1"
  |               ^
unexpected '"'
expecting ',', end of input, end of line, or unescaped character
)

which has no interpolation rule, and fails earlier than yours, while cassava is parsing the CSV. I don't understand how yours gets as far as it does.

simonmichael commented 5 years ago

Oh yea, I see my error. , "1" causes cassava to fail. ," 1" causes your hledger rule to fail. I must be mixing up two issues.

simonmichael commented 5 years ago

Yes, I was thinking of #1037. Back to your issue: I confirm that adding strip like so fixes it:

    replace ('%':pat) = maybe pat (\i -> strip $ atDef "" record (i-1)) mindex

Now, is it acceptably intuitive/predictable/principled that %-interpolated values would always have whitespace stripped..

simonmichael commented 5 years ago

Actually there's a separate code path when assigning an amount field (amount/amount-in/amount-out). So we could strip whitespace only in that case, and leave it as-is otherwise. Which is best ?

simonmichael commented 5 years ago

Sorry for the stream-of-thought. Rather, it would be:

"Any CSV value interpolated in a amount, amount-in or amount-out field assignment will have any outer whitespace removed. In all other field assignments, outer whitespace in CSV values is left intact."

Which sounds a bit messy.

simonmichael commented 5 years ago

I just went ahead and committed "always strip outer whitespace when interpolating". This removes the snag you hit and will hopefully be harmless otherwise. I'll mention it on list.

benwhalley commented 5 years ago

Hey - thanks. And for the tip on ghci too - that makes debugging etc much easier! I'll try and have a go if something else comes up. I've been interested in Haskell for a while.

simonmichael commented 5 years ago

For the record: RFC 4180 2.4 says "Spaces are considered part of a field and should not be ignored", so we're going slightly against the CSV spec here. Perhaps better if we would strip spaces only from amounts, but we're not aware of CSV fields' meanings when interpolating.