redstreet / beancount_reds_importers

Simple ingesting tools for Beancount (plain text, double entry accounting software). More importantly, a framework to allow you to easily write your own importers.
GNU General Public License v3.0
111 stars 38 forks source link

Add an importer for Vanguard 529 CSV data #83

Open farktronix opened 1 year ago

farktronix commented 1 year ago

This PR has three major parts, broken out into three separate commits:

  1. An importer for Vanguard 529 data in CSV multitable format
  2. Add an option in csvreader.convert_columns for adding .00 decimal precision to any numbers without decimals
  3. Add a unit test for the Schwab balance data importer

1. The Vanguard 529 Importer

Vanguard only provides CSV data downloads for their 529 (child education savings plan) accounts. These files (which are frustratingly named ofxdownload but contain CSV data) have two CSV tables inside of them:

Fund Account Number,Fund Name,Price,Shares,Total Value
535672845-01,Vanguard Target Enrollment 2040/2041 Portfolio,$9.45,348.5265,$3293.58

Account Number,Trade Date,Process Date,Transaction Type,Transaction Description,Investment Name,Share Price,Shares,Gross Amount,Net Amount
535672845-01,01/31/2023,01/31/2023,Contribution AIP,Contribution AIP,Vanguard Target Enrollment 2040/2041 Portfolio,$9.03,27.6855,$250,$250
535672845-01,01/03/2023,01/03/2023,Contribution AIP,Contribution AIP,Vanguard Target Enrollment 2040/2041 Portfolio,$8.42,29.6912,$250,$250

This is a pattern seen before in importers, but one difference here is that the Vanguard data doesn't provide section headers before the tables. To address this, I added a section_titles_are_headers flag that the multitable reader looks for which modifies how many rows it skips when gathering up the next table into a section.

I wanted to share the code from csvreader that processes columns to clean up currency and date formatting so I broke out a new function called process_table in csvreader to clean up the columns, which now gets called from the multitable reader as well.

2. Adding decimal precision to currency columns

Once I got my importer up and running I ran into this frustrating error with the imported data:

529.bean:5:       Transaction does not balance: (-0.000096 USD)

   2023-01-03 * "Contribution AIP" "[VTE2040] Vanguard Target Enrollment 2040/2041 Portfolio"
     Assets:Investments:Vanguard:529  29.6912 VTE2040 {8.42 USD, 2023-01-03}
     Assets:Transfers:Vanguard:529      -250 USD

After reading way more than I ever wanted to know about beancount and precision tolerances I found out that this was due to Vanguard reporting the net amount as $250, which beancount takes to imply infinite decimal precision.

I know it's possible to change this in my beancount file with inferred_tolerance_default but I think this is a situation that will confuse anyone else who uses this importer. I believe beancount made the wrong design decision here on handling numbers without a decimal (the implied precision should be 0.5 in this case) but we're past the point of being able to fix that.

My fix for handling this is to add a new option to the csv reader add_currency_precision. If this exists, then any currency field without a decimal in the number will have .00 appended to it. The default behavior of the Vanguard 529 importer is to add precision, since Vanguard doesn't provide precision on round numbers (and 529 contributions are usually a nice round number that someone contributes every month with an automatic withdrawal), but I left the rest of the importers unchanged. This can also be overridden by the user-provided config by setting 'add_currency_precision' : False.

3. Unit tests for the Schwab balance data importer

Because I had tweaked the multitable base class code I wanted to make sure I hadn't broken any other importers. It looks like the only other importer that used this code is schwab_csv_balances.py.

I don't have a Schwab account, so I manually created a csv file for this unit test based on what the importer is expecting. I don't know if this is what the actual data looks like, so if anyone has access to a Schwab account and can provide an example of one of these files I would be happy to scrub the personal data and use it for the unit test instead of my hand crafted file.

redstreet commented 1 year ago

Great, thank you for this! I'll get back on this in the next few days. Very helpful description, thank you! And I like the thoughtful defaults and configurability for (2).

Quick question: there seem to be some moves for prior Vanguard files from the /test/.* dir to its parent, was this intentional. I could be viewing this wrongly since I'm viewing this on a mobile and it's hard to tell.

farktronix commented 1 year ago

No rush on this review- take your time!

There was an existing unit test for the Vanguard importer, but it took up the whole top level tests folder, so I created two subfolders (vanguard and vanguard_529) and moved the files into those subfolders. You might be seeing files move from tests into vanguard but the full path is beancount_reds_importers/vanguard/tests/vanguard. Maybe Github is just showing the last path component on mobile.

I just double checked the files on disk and all of the test files are under tests and their corresponding vanguard and vanguard_529 subfolders.

redstreet commented 1 year ago

Looks good overall! Left a few comments. Would be happy to merge this once we work through those. Please let me know your thoughts. Thanks!