redstreet / reds-ramblings-comments


personal-finance/automatically-categorizing-postings/ #15

Open utterances-bot opened 2 years ago

utterances-bot commented 2 years ago

Automatically Categorizing Postings — Red's Rants

Investment accounts have a deterministic, finite set of transaction types, and are easy to build correct and complete [[Transac...

https://reds-rants.netlify.app/personal-finance/automatically-categorizing-postings/

richban commented 2 years ago

I couldn't find out how much data smart_importer needs to be able to predict the postings. I have only 3 months of data, but that was not enough. What's your estimate?

redstreet commented 2 years ago

It uses machine learning. So in theory, even a single transaction of training data would suffice, but accuracy would be poor.

For credit cards and bank accounts, a single month of training data is usually adequate, because many transactions are cyclic, repeating monthly.

What specific trouble are you seeing? Mis-classified transactions? Does your training data have similar transactions to the misclassified ones?

redstreet commented 2 years ago

@richban writes:

Works like a charm! :+1:

However, I am kinda confused about these lines:

grep price ${INGEST_OUTPUT}/noaccount.bc >> ${BEAN_ROOT}/prices/prices.bc || echo ''
echo 'price' >> ${INGEST_OUTPUT}/noaccount.bc
echo '' >> ${INGEST_OUTPUT}/noaccount.bc
sed -i '/price/d' ${INGEST_OUTPUT}/noaccount.bc
sed -i '/^$/d' ${INGEST_OUTPUT}/noaccount.bc
[[ ! -s ${INGEST_OUTPUT}/noaccount.bc ]] && rm -fv ${INGEST_OUTPUT}/noaccount.bc

What are you trying to achieve here? What's the purpose of the prices.bc file?

redstreet commented 2 years ago

The above is to simply gather all price entries from imports into a single prices.bc file. I like all my prices in a single file. Completely optional.
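For readers who want the same effect in a single step, here is a minimal sketch of the idea, not the script from the gist. The function name `collect_prices` is hypothetical, and the sketch assumes that price entries are Beancount `price` directives that start with a date, so a narration that merely contains the word "price" is left alone (the original `grep price` would match it too).

```shell
# Hypothetical helper (not part of the gist): move Beancount price directives
# out of an import output file ($1) and append them to a prices file ($2).
collect_prices() {
    cp_src=$1
    cp_dest=$2
    # Assumed shape of a price directive: "YYYY-MM-DD price COMMODITY AMOUNT CURRENCY"
    cp_pattern='^[0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]]+price[[:space:]]'

    # Nothing to do if the import produced no such file.
    [ -f "$cp_src" ] || return 0

    # Append the price directives to the shared prices file.
    grep -E "$cp_pattern" "$cp_src" >> "$cp_dest" || true

    # Keep every other non-blank line.
    cp_tmp=$(mktemp)
    grep -Ev "$cp_pattern" "$cp_src" | grep -v '^[[:space:]]*$' > "$cp_tmp" || true

    if [ -s "$cp_tmp" ]; then
        # Something other than prices remains: write it back in place.
        cat "$cp_tmp" > "$cp_src"
        rm -f "$cp_tmp"
    else
        # Only prices (or nothing) were in the file: remove it.
        rm -f "$cp_tmp" "$cp_src"
    fi
}
```

With the variables from the snippet above, a call might look like `collect_prices "${INGEST_OUTPUT}/noaccount.bc" "${BEAN_ROOT}/prices/prices.bc"`.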

redstreet commented 2 years ago

@richban writes:

Based on https://reds-rants.netlify.app/personal-finance/automatically-categorizing-postings/

should this line: https://gist.github.com/redstreet/6f1addb87c667826fb79b509d5d88a51#file-process_all_files-zsh-L35

correspond to:

BEAN_SRC=$(bean-identify my.import $file | grep "^Account:" | sed 's/Account: *//' | sed 's#:#.#g')
BEAN_SRC="${INGEST_ROOT}/../source/${BEAN_SRC}.bc"
bean-extract my.import -f $BEAN_SRC $file

redstreet commented 2 years ago

@richban you're right, looks like those lines got left out. Let me update the gist. Thanks!
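As an illustration of what those lines do, here is a small sketch that turns the account reported by bean-identify into the path of the existing ledger file for that account. The helper name `account_to_source_path` is hypothetical, and the sketch assumes, as the snippet above does, that bean-identify prints a line of the form `Account: Assets:Bank:Checking`.

```shell
# Hypothetical helper: read bean-identify output on stdin and print the path
# of the per-account ledger file under the directory given as $1.
# Example mapping: "Account: Assets:Bank:Checking" -> "<dir>/Assets.Bank.Checking.bc"
account_to_source_path() {
    atsp_dir=$1
    # Take the first "Account:" line, strip the label, and turn ':' into '.'
    atsp_account=$(sed -n 's/^Account: *//p' | head -n 1 | tr ':' '.')
    # Fail if bean-identify reported no account for this file.
    [ -n "$atsp_account" ] || return 1
    printf '%s/%s.bc\n' "$atsp_dir" "$atsp_account"
}

# Possible usage inside the per-file loop (variable names as in the thread):
#   BEAN_SRC=$(bean-identify my.import "$file" | account_to_source_path "${INGEST_ROOT}/../source")
#   bean-extract my.import -f "$BEAN_SRC" "$file"
```

Passing the existing file via `-f` is what gives smart_importer its training data, so a wrong or missing path here would leave it with nothing to learn from.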

richban commented 2 years ago

The above is to simply gather all price entries from imports into a single prices.bc file. I like all my prices in a single file. Completely optional.

Okay now it makes sense: https://beancount.github.io/docs/beancount_language_syntax.html#prices

richban commented 2 years ago

It uses machine learning. So in theory, even a single transaction of training data would suffice, but accuracy would be poor.

For credit cards and bank accounts, a single month of training data is usually adequate, because many transactions are cyclic, repeating monthly.

What specific trouble are you seeing? Mis-classified transactions? Does your training data have similar transactions to the misclassified ones?

Exactly: misclassified transactions (it only classifies for one type of account). But perhaps that's because the descriptions in my bean files were entered manually, so they differ from the descriptions in the source CSV (the statement from the bank). At least, that would explain it if the ML classifier uses the description for training.

redstreet commented 2 years ago

That's a good point and likely the reason you're seeing poor classification. I personally leave the original transaction memos untouched. This serves the additional purpose of less import work.

I forget if smart importer includes the payee field as training data. You might be able to use that without disrupting the training.

You could also try appending to the existing description string as opposed to changing it.

richban commented 2 years ago

The thing is, I started using beancount back in Sep 21, and my process was to type every posting manually (terrible choice; it took me around 2 hours to input one week of data). That's why the memos/descriptions in my ledger differ from the memos I get when I export them.

redstreet commented 2 years ago

Got it. You could reimport and overwrite those old transactions with your new importer if you don't want to wait another month to start using smart-importer 🙂.

richban commented 2 years ago

Yes, I think I will just reimport everything from the day I opened my bank accounts, since I now have the importers implemented.

awtimmering commented 2 months ago

Applying smart_importer output switches the order, i.e. it "inserts" the counter account before the asset account, which then leads to output being written to e.g. "Expenses.*.bc" files. Ever seen this behavior?

scanta2 commented 2 months ago

@awtimmering I see the same behavior

scanta2 commented 2 months ago

And the problem happens with and without 'filing_account'.

redstreet commented 2 months ago

Interesting, and no I haven't seen this. I wonder if something in the latest version of either Beancount or smart importer is the cause. What versions are you both using? Does this happen consistently and always?

Regardless, reds-importers should ensure this doesn't happen. There should already be code somewhere to mark the account that decides the output file; we should activate it so that it works for all cases. Let me look at it, perhaps later today or tomorrow.

Would either of you mind filing a bug?

Edit, bug: https://github.com/redstreet/beancount_reds_importers/issues/97