madhat2r / plaid2text

Python Scripts to export Plaid transactions and transform them into Ledger or Beancount syntax formatted files.
GNU General Public License v3.0
89 stars 19 forks source link

+TITLE: Plaid2Text Documentation

+HTML_HEAD_EXTRA:

To download transactions, we use [[http://www.plaid.com][Plaid]]. This program will help you setup your accounts and download transactions from the Plaid API. I have tried to make this as simple as possible to setup.

Also, once downloaded, your transactions will be stored in a Mongo database. It is actually the transactions pulled from the database that we run though our syntax renderers. This is required to help keep track of which transaction we have already processed (as well as have records to reconstruct our files should the need arise).

The main inspiration for the workings of the export part of the script came from the excellent [[https://github.com/quentinsf/icsv2ledger][icsv2ledger]]. I borrowed heavily from Quentin's excellent script in making this tool, and in some places I have shamelessly stolen his code altogether.

I have only tested this on Linux. I have no desire to run this on Windows, but feel free to give it a shot, it may work. The same goes for Mac, although sometime in the future I may test on OSX.

Note: python dependencies must be installed prior to running the script. All can be installed from =pip= or of course whatever means you wish to use.

First, clone the git repo: =git clone git@github.com:madhat2r/plaid2text.git=

or if preferring HTTPS: =git clone https://github.com/madhat2r/plaid2text.git=

Then change into folder: =cd plaid2text=

Then use =pip= to install: =pip install .=

If you plan on making modifications to the program, then you may want to install it as editable. (this is my preferred way) =pip install -e .=

The keys we are interested in are:

Once you have obtained your keys then use =plaid2text= to create your configuration file and save your keys into it. You can do this by simply invoking plaid2text without arguments. =plaid2text= will prompt you for your keys and store them in your config file.

A note about Plaid. Plaid is a paid service, but developers have access to the developer API without paying. The developer API has all the features of the production API. I have been using this for a few months now on my 6 accounts and everything is still working just fine. I did contact them about what the cost would be (and told them my use case), and was informed that a paid version comes to 0.25 USD per account, per month. That is still a heck-of-a-better deal than Quickbooks online in my opinion. I can get my 6 accounts for 1.50 USD per month, but like I mentioned, I have yet run into any caps on my developer account, so that may be all I ever use.

In order to use =plaid2text= you must have already followed the instructions in the Plaid section. Once you have your initial config in place, then let's get started in creating your first account.

** Creating a Plaid Account In order to get transactions from Plaid, you must create an account. In order to create an account, you must authenticate yourself to your institution via your username and password, and also most institutions require some form of multi factor authentication, usually in the form of security questions, or codes sent to registered phone/email for the account.

The =--create-account= flag will create a new account using the plaid-account argument as the new nickname. This semi-automate the process of creating and authenticating an account with your instituion for your Plaid account. Once you run =plaid2text accountName --create-account= you will be promted to open an html file located in your configuration folder (default is =\~/.config/plaid2text/auth.html=). Click on the button labeled =Open Link - Institution Select= and you will be prompted to authenticate with the instituion. Once you do, you will see your public_token displayed on the page. Enter that back into =plaid2text=.

=plaid2text= will then display a list of accounts associated with that institution and their corresponding account_id's. Paste the desired account_id and your account is ready to be used with =plaid2text=.

Note: wait at least 15 minutes before the first download of your transactions, this give Plaid time to collect the information from your institution. Plaid says it will have them within 240 seconds, but I think it's better to give it time.

Also, different institutions keep your historical data for different lengths of time.

+BEGIN_SRC

plaid_account: (mandantory) this is the nickname you assigned when creating account outfile: output filename or stdout in your chosen snytax (ledger,beancount)

+END_SRC

Note: the outfile will be overwritten each time this is run so be careful that you do not erase your current journal file, or any other file of importance.

Also, note that when there are both config setting and command line options, the command line options take precedence over config file settings.

+BEGIN_SRC

--accounts-file FILE file which holds a list of account names (LEDGER ONLY) (default : ~/.config/plaid2text/accounts) --all-transactions pull all transactions even those who have been previously marked as processed (default: False) --clear-screen, -C clear screen for every transaction (default: False) --cleared-character {,!} character to clear a transaction (default: ) --create-account Create a new Plaid account using the plaid-account argument as the new nickname (Example: chase_savings) --currency STR the currency of amounts (default: USD ) --default-expense STR expense account used as default destination (default: Expenses:Unknown) --download-transactions, -d download transactions into Mongo for given plaid account --from-date STR specify a the starting date for transactions to be pulled; use in conjunction with --to-date to specify rangeDate format: YYYY-MM-DD --headers-file FILE file which contains contents to be written to the top of the output file (default: ~/.config/plaid2text/headers) --journal-file FILE, -j FILE journal file where to read payees/accounts Tip: you can use includes to pull in your other journal files (default journal file: ~/.config/plaid2text/journal) --mapping-file FILE file which holds the mappings (default: ~/.config/plaid2text/mapping) --dbtype {mongodb,sqlite} The database type to use for storing transactions. --mongo-db STR The name of the Mongo database (default: plaid2text) --mongo-db-uri STR The URI for your MongoDB in the MongoDB URI format (default: mongodb://localhost:27017) --sqlite-db FILE The path to the SQLite DB to use, if --dbtype is sqlite --no-mark-processed, -n Do not mark pulled transactions. When given, the pulled transactions will still be listed as new transactions upon the next run. (default: False) --output-date-format STR date format for output file (default: YYYY/MM/DD) --output-format {beancount,ledger}, -o {beancount,ledger} what format to use for the output file. (default format: beancount) --posting-account STR, -a STR posting account used as source (default: Assets:Bank:Checking) --quiet, -q do not prompt if account can be deduced from mappings (default: False) --tags, -t prompt for transaction tags (default: False) --template-file FILE file which holds the template (default: ~/.config/plaid2text/template) --to-date STR specify the ending date for transactions to be pulled; use in conjunction with --from-date to specify rangeDate format: YYYY-MM-DD -h, --help show this help message and exit

+END_SRC

** Options

~--accounts-file~ is a file that you can store predefined account definitions for Ledger in the form of =account Expenses:Unknown=. This file is parsed for the account names and all lines that do not start with account will be ignored.

This is LEDGER specific setting.

~--all-transactions~ will pull all transactions regardless if they are marked as already pulled. By default only transactions that have not been pulled to text are returned.

~--clear-screen, -C~ clears the screen before every transaction prompt. Default is ~False~.

~--cleared-character {,!}~ is the character mark a transactions as cleared or not. Default is ==

~--create-account~ is used to create a new account. See creating account section above for more.

~--currency STR~ is the currency used for transactions. Default is =USD=.

~--default-expense STR~ is the default account for which to post transactions to. Default =Expenses:Unknown=

~--download-transactions, -d~ fetches new transactions from Plaid into Mongo for given account.

Use: =plaid2text acct_nickname -d=

~--from-date STR~ specify a the starting date for transactions to be pulled.

Use in conjunction with ~--to-date~ to specify range

Date format: =YYYY-MM-DD= or =YYYY/MM/DD=

~--headers-file FILE~ file which contains contents to be written to the top of the output file. For example, I store my beancount files as OrgMode files, so I have my headers file setup to insert instructions at the top for =Emacs=, to help ease my editing of them once they are exported to text. And also I include my main beancount file which has all my accounts listed, this also allows for easy running of =bean-check= to verify the newly exported file.

+BEGIN_SRC

;; -- mode: org; mode: beancount; --
include "/path/to/somewhere/main.beancount"

+END_SRC

Default: =~/.config/plaid2text/headers=

~--journal-file FILE, -j FILE~ journal file where to read payees/accounts. This could be your main ledger file or your main beancount file.

Tip: you can use includes to pull in your other journal files

Default journal file: =~/.config/plaid2text/journal=

~--mapping-file FILE~ file which holds the mappings for matching transactions to accounts/payees as well as some default tags, if you want.

You can have a separate mappings file per account.

default: =~/.config/plaid2text/mapping=

~--mongo-db STR~ name of the Mongo database that stores downloaded transactions.

Default: ~plaid2text~

~--mongo-db-uri STR~ The URI for your MongoDB in the MongoDB URI format

Default: ~mongodb://localhost:27017~

~--no-mark-processed, -n~ will not mark pulled transactions as pulled. When passed, the pulled transactions will still be listed as new transactions upon the next run.

Default: ~False~

~--output-date-format STR~ date format for output file

Default: ~YYYY/MM/DD~

~--output-format {beancount,ledger}, -o {beancount,ledger}~ what syntax to use for the output file.

Default output format: beancount

~--posting-account STR, -a STR~ posting account used as source

Default: ~Assets:Bank:Checking~

~--quiet, -q~ do not prompt if account can be deduced from mappings

Default: ~False~

~--tags, -t~ causes the program to prompt for transaction tags

Default: ~False~

~--template-file FILE~
file which holds the text template used in the output file for formatting transactions.

Default: =~/.config/plaid2text/template=

~--to-date STR~ specify the ending date for transactions to be pulled.

use in conjunction with ~--from-date~ to specify range

Date format: ~YYYY-MM-DD~ or ~YYYY/MM/DD~

The following will set up a new account with nickname =chase_checking=

~plaid2text chase_checking --create-account~

The following will download all new transactions for the account =chase_checking=.

*NOTE*: when downloading for the first time, be sure to wait at least 15min
after setting up the account. This gives Plaid time to pull your
transactions from the institution.

~plaid2text chase_checking --downlad-transactions~

The following will pull all new transactions for account =chase_checking= and output them to =/tmp/onetime.ldg= Ledger syntax after prompting you for the correct information for every transaction and marking all pulled transaction in the database as pulled.

~plaid2text chase_checking /tmp/onetime.ldg --output-format ledger~

The following will pull all transactions starting from the given date for the =chase_checking= account and will not mark them as pulled in the database, and will output beancount syntax to stdout.

~plaid2text chase_checking --all-transactions --from-date 2015/04/15 --no-mark-processed~

During your first run, when your mappings file has not yet been established, you will have to manually (via prompts) establish the correct accounts and payees. But once you have done so, your mappings file will have the correct information for transactions in the future, and given that most of us are creatures of habit and make purchases from the same places, you will only occasionally have to account for new entries.

Now let's walk through a transaction for you can get an idea of what to expect. Keeping with out sample account =chase_checking=, we will pull the latest transactions, and also prompt for tags (=--tags=) and suppress prompting for known transactions via our mappings (=--quiet=), we will also be using =beancount= output syntax (=--output-format=).

~plaid2text chase_checking /tmp/onetime.bnc --quiet --tags --output-format beancount~

When the above command is run, you will be presented with a prompt for the first non-matched transaction. The first prompt is for the payee. You will notice that the default answers are in =[]=, so if you just hit enter, that will be the value. When looking at the transaction prompt, you will see it starts with a date followed by the name that Plaid assigns this transaction, in this case Plaid got it correct, this will not always be the case. The next area shows the amount of the transaction.

[[file:img/netflix_payee.png]]

Following the payee prompt is the "Account" prompt. Enter the correct associated account, then hit enter.

[[file:img/netflix_account.png]]

Then we are prompted for tags (because we passed ~--tags~). Tags work a little differently, you will be prompted over and over for tags until you just hit enter without typing another value. If you make a mistake in entering your tag, you may prefix the tag with =-= (minus) to remove it. For instance say you accidentally typed =mvoie= and hit enter, when the prompt comes back you see your mistake in the default area and want to correct it. So now you type =-mvoie= and hit enter, and you will notice that the tag has been removed.

[[file:img/netflix_tags.png]]

When you hit enter the final time on tags, the program will move on to the next transaction needing your input.

Again, all of these prompts use =TAB= completion, and the more information you give the program, via your config files, the better the completion becomes.

+BEGIN_SRC

[DEFAULT]
posting_account = Assets:Bank:Checking
default_expense = Expenses:Unknown
encoding = utf-8
currency = USD
mongo_db = plaid2text
mongo_db_uri: mongodb://localhost:27017
quiet = False
tags = False
output_date_format = %%Y/%%m/%%d
clear_screen = False
cleared_character = *
output_format = beancount

[PLAID]
client_id = xxxxxxxa66710877xxxxxxxx
secret = xxxxxxxxx8c9a0cd27xxxxxxxxxxxx

[chase_checking]
access_token = access-development-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
account = xxxxxxxxxxxxxPzJ3nAkFxxxxxxxxxxxxxxxx
item_id = xxxxxxxxxxxxxxxcc4f53xxxxxxxxxxxxxxxxx
currency = USD
posting_account = Assets:Bank:Chase:Checking
mapping_file = ~/.config/plaid2text/chase_checking/mapping_bc
headers_file = ~/.config/plaid2text/chase_checking/headers_bc
accounts_file = ~/somewhere/main.beancount
journal_file = ~/somewhere/beancount/main.beancount
template_file = ~/.config/plaid2text/chase_checking/template_bc

+END_SRC

** Template File The template file is what transforms your transactions into the desired text based accounting syntax. You have access to all the fields that plaid returns to use in your templates. But be aware that not all fields are returned with every transaction, and you might have to modify the source to handle this, should you choose to use them in your template. Below is a list of all fields available. The =A= column indicates if field is always available.

| Field | Types | A | |-------------------------------+---------+---| | _account | String | y | | _id | String | y | | amount | Number | y | | name | String | y | | date | Date | y | | meta | Object | y | | meta.location | Object | y | | pending | Boolean | y | | score | Object | y | | score.location | Object | y | | score.name | Number | y | | type | Object | y | | type.primary | String | y | | meta.location.state | String | n | | score.location.state | Number | n | | category | Array | n | | category_id | String | n | | meta.location.city | String | n | | score.location.city | Number | n | | meta.location.coordinates | Object | n | | meta.location.coordinates.lat | Number | n | | meta.location.coordinates.lon | Number | n | | score.location.address | Number | n | | score.location.zip | Number | n | | meta.location.address | String | n | | meta.location.zip | String | n | | meta.location.store_number | String | n | | meta.payment_processor | String | n | | meta.ppd_id | String | n | | _pendingTransaction | String | n | | meta.reference_number | String | n | | meta.payee | String | n | | meta.payment_method | String | n | |-------------------------------+---------+---|

In addition to the above fields =plaid2text= also provides the following:

| Field | type | |---------------------+--------| | posting_account | String | | associated_accounts | String | | payee | String | | tags | String | |---------------------+--------|

*** Brief Field Description (only of main use fields)

NOTE: The ~tags~ field is prefixed with a space, when tags are present, this allows us to loose the trailing space that would otherwise exist in situations where there were no tags, and the configured template supports them.

Example of trailing space template:

+BEGIN_SRC

{transaction_date} {cleared_character} "{payee}" "" {tags}

+END_SRC

Using the above template would result in a trailing space when no tags are present.

Example proper template:

+BEGIN_SRC

{transaction_date} {cleared_character} "{payee}" ""{tags}

+END_SRC

This template will add prefix the ~tags~ with a space only if they are present, otherwise it returns an empty string. This line will not have a trailing space.

* Defaults ** beancount

+BEGIN_SRC

{transaction_date} {cleared_character} "{payee}" ""{tags} plaid_name: "{name}" plaid_id: "{_id}" {associated_account:<60} {amount} {currency} {posting_account}

+END_SRC

**** ledger

+BEGIN_SRC

{transaction_date} {cleared_character} {payee}{tags} ; plaid_name: {name} ; _id: {_id} {associated_account:<60} {currency} {amount} {posting_account:<60}

+END_SRC

** Headers File

The headers file is used to add some text to the top of the output file. This can be anything you like. I use mine for adding some header info for =Emacs= to read for it sets the correct mode for me when I edit the file.

I also use the ~include~ directive to pull in my main file, to aide in running =bean-check=.

** Mappings File The mappings file is simply a =CSV= formatted file, that contains four fields. When exporting transactions, this file will try to establish the proper accounts and payees for each transaction based on the fields in the file. It also handles adding some default tags.

This file is created for you, if you do not have one defined in the settings. Also, it is appended to every time you are exporting transactions with the new matches, that way next time you export you will not have to enter the information again if you use =--quiet=.

*** Fields

  1. text to match against the Plaid =name= field. This can be either plain text or a regex. If the field starts and ends with =/= it is assumed to be a regex. Note: all the regexes will be matched /case insensitive/.
  2. the name you wish to use for the =payee=
  3. the associated expense or other account (i.e. ~Expenses:Unknown~)
  4. tags to be used for this transaction. This should be in the form of a string. For ledger the format would be: ~:tag1:tag2:etc:~ and for beancount: ~#tag1 #tag2 #etc~

*** Important Point The matching algorithm will always use the latest match when processing entries. So if for example you have a regex setup that matches //best buy// at the top of the mappings file and another that has //buy// later in the file, the last match wins.

*** Sample Mappings File Some of the listings will contain ledger formatted tags while other will be beancount, you of course will only have the type that you need, do not mix them.

+BEGIN_SRC

/Amazon/,"Amazon",Expenses:Unknown:Amazon, #sort-out /PAYPAL INST XFER/,"PayPal",Expenses:Unknown:PayPal, :sort-out: /.NETFLIX./,"Netflix",Expenses:Bills:Subscriptions:Netflix /.DROPBOX./,"Dropbox",Expenses:Bills:Subscriptions:Dropbox /Amazon Video/,"Amazon Video",Expenses:Entertainment:Movies The Doughnut Palace,"The Doughnut Palace",Expenses:Food:FastFood 54th Street,"54th Street",Expenses:Food:Restaurant BJ'S RESTAURANTS,"BJ's Restaurant",Expenses:Food:Restaurant

+END_SRC

Also notice the sorting of the entries so that =Amazon Video= gets categorized properly. If it were above the =Amazon= entry, it would use the setting from there instead, as the last entry always wins.

** Download New Transactions When I get ready to work on my books, I start by downloading the newest transactions for the account I am working on.

~plaid2text chase_checking -d~

This will download all the newest transactions from my accounts into the MongoDB.

  You can of course setup a cron job to do this nightly, but I find it fits
  into my workflow just doing it manually.

** Export New Transactions I export new transaction (all the ones that haven't previously been pulled), into a temporary file, where I can do some manual checking and editing.

~plaid2text chase_checking /tmp/onetime.beancount --quiet~

Using the =--quiet= switch, the program will only prompt me for information on the transactions that it cannot deduce based on the mappings file. You can of course leave that switch off if you want to be able to change the defaults from the mapping file.

Also, if you want to do a test run, without marking the transactions as pulled use the =--no-mark-pulled= switch.

IMPORTANT I want to stress that the outfile is OVERWRITTEN or created every time this command is run. So be careful. :)

** Copy Transactions When I am satisfied that all is well with my temp file. I copy the new entries into my actual journal file.

** License This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You can obtain a copy of the license here: [[http://www.gnu.org/licenses/][GNU General Public License]]