utterances-bot opened 2 years ago
I couldn't find out how much data smart_importer needs to be able to predict postings. I have only 3 months of data, but that was just not enough. What's your estimate?
It uses machine learning, so in theory even a single transaction of training data would suffice, but accuracy would be poor.
For credit cards and bank accounts, a single month of training data is usually adequate, because many transactions are cyclic, repeating monthly.
What specific trouble are you seeing? Mis-classified transactions? Does your training data have similar transactions to the misclassified ones?
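One rough way to answer that last question is to count how often each narration recurs in the training ledger. A description that never appears there gives smart_importer nothing similar to learn from. Below is a minimal POSIX `sh` sketch (it should also run under bash or zsh). The function name `narration_counts` is hypothetical, and it assumes one-line transaction headers such as `2022-01-05 * "Payee" "Narration"`, taking the narration to be the last quoted string on the line.

```shell
# Hypothetical helper (not part of smart_importer): print each transaction
# narration in a Beancount file with its count, most frequent first.
# Assumes headers like: 2022-01-05 * "Payee" "Narration"
# The greedy ".*" makes the capture the LAST quoted string on the line,
# so the payee is skipped when both payee and narration are present.
narration_counts() {
  sed -nE 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]]+[*!][[:space:]].*"([^"]*)".*$/\1/p' "$1" \
    | sort | uniq -c | sort -rn
}
```

Running `narration_counts` on your training file and looking for the descriptions of the misclassified transactions shows quickly whether similar transactions exist at all.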
@richban writes:
Works like a charm! :+1:
However, I am kinda confused about these lines:
grep price ${INGEST_OUTPUT}/noaccount.bc >> ${BEAN_ROOT}/prices/prices.bc || echo ''
echo 'price' >> ${INGEST_OUTPUT}/noaccount.bc
echo '' >> ${INGEST_OUTPUT}/noaccount.bc
sed -i '/price/d' ${INGEST_OUTPUT}/noaccount.bc
sed -i '/^$/d' ${INGEST_OUTPUT}/noaccount.bc
[[ ! -s ${INGEST_OUTPUT}/noaccount.bc ]] && rm -fv ${INGEST_OUTPUT}/noaccount.bc
What are you trying to achieve here? What's the purpose of prices.bc?
The above simply gathers all price entries from imports into a single prices.bc file. I like all my prices in a single file. Completely optional.
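For reference, here is a sketch of the same idea as a standalone function. This is not the gist's code. The name `move_prices` is hypothetical, it is POSIX `sh` rather than zsh, and it matches only lines that begin with a date followed by the `price` keyword, which is the shape of a Beancount price directive (for example `2022-01-31 price VTI 212.50 USD`).

```shell
# Hypothetical helper (not the gist's code): move Beancount price directives
# from an imported file into a central prices file, then delete the imported
# file if nothing else remains in it.
move_prices() {
  local src="$1" prices="$2" pattern tmp
  # Nothing to do if the import produced no file
  [ -f "$src" ] || return 0
  # Only price directives: a date, whitespace, then the "price" keyword
  pattern='^[0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]]+price[[:space:]]'
  # grep exits 1 when nothing matches; that is not an error here
  grep -E "$pattern" "$src" >> "$prices" || true
  # Keep every other non-blank line, rewriting via a temp file
  tmp="$(mktemp)"
  grep -Ev "$pattern" "$src" | grep -v '^[[:space:]]*$' > "$tmp" || true
  cat "$tmp" > "$src"
  rm -f "$tmp"
  # Remove the imported file if it is now empty
  if [ ! -s "$src" ]; then
    rm -f "$src"
  fi
}
```

The date-anchored pattern is a deliberate difference from a plain `grep price`: a transaction whose narration happens to contain "price" stays where it is instead of being moved into prices.bc. Like the original, it drops blank lines from whatever remains. It also avoids `sed -i`, whose syntax differs between GNU and BSD sed.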
@richban writes:
Based on https://reds-rants.netlify.app/personal-finance/automatically-categorizing-postings/
should this line: https://gist.github.com/redstreet/6f1addb87c667826fb79b509d5d88a51#file-process_all_files-zsh-L35
correspond to:
BEAN_SRC=$(bean-identify my.import $file | grep "^Account:" | sed 's/Account: *//' | sed 's#:#.#g')
BEAN_SRC="${INGEST_ROOT}/../source/${BEAN_SRC}.bc"
bean-extract my.import -f $BEAN_SRC $file
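These lines use `bean-identify` to find the importer's account, map that account to its existing ledger file, and pass the file to `bean-extract` with `-f`, so that smart_importer has that account's existing entries to train on. A minimal sketch of the same step split into two functions follows. The function names are hypothetical and the code is POSIX `sh`. It assumes the `my.import` config and the `${INGEST_ROOT}/../source/` layout from the lines above, plus Beancount v2's `bean-identify` and `bean-extract`.

```shell
# Hypothetical helper: map an account name to its per-account ledger file,
# following the layout above:
#   Assets:Banks:Checking -> <root>/../source/Assets.Banks.Checking.bc
account_to_source() {
  printf '%s/../source/%s.bc\n' "$2" "$(printf '%s' "$1" | tr ':' '.')"
}

# Hypothetical per-file step; needs bean-identify/bean-extract on PATH and
# INGEST_ROOT set. "sed -n ...p" keeps only the "Account:" line's value.
extract_with_history() {
  local file="$1" account
  account="$(bean-identify my.import "$file" | sed -n 's/^Account: *//p')"
  bean-extract my.import -f "$(account_to_source "$account" "$INGEST_ROOT")" "$file"
}
```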
@richban you're right, looks like those lines got left out. Let me update the gist. Thanks!
> The above is to simply gather all price entries from imports into a single prices.bc file. I like all my prices in a single file. Completely optional.
Okay now it makes sense: https://beancount.github.io/docs/beancount_language_syntax.html#prices
> What specific trouble are you seeing? Mis-classified transactions? Does your training data have similar transactions to the misclassified ones?
Exactly: misclassified transactions (it only classifies transactions for one type of account). But perhaps that's because the descriptions in my bean files were entered manually, so they don't match the source CSV files (the statements from the bank). At least, that would be the cause if the ML classifier uses the description for training.
That's a good point and likely the reason you're seeing poor classification. I personally leave the original transaction memos untouched, which has the added benefit of less import work.
I forget whether smart_importer uses the payee field as training data. If it doesn't, you might be able to put your own descriptions there without disrupting the training.
You could also try appending to the existing description string as opposed to changing it.
The thing is, I started using Beancount back in Sep 21, and my process was to type every posting manually (a terrible choice: it took me around 2 hours to enter one week of data). That's why my memos/descriptions differ from the memos in the statements I export.
Got it. You could reimport and overwrite those old transactions with your new importer if you don't want to wait another month to start using smart-importer 🙂.
Yes, I think I'll just reimport everything from the day I opened my bank accounts, since I now have the importers implemented.
Applying smart_importer switches the posting order, i.e. it "inserts" the counter-account posting before the asset account posting, which then causes the output to be written to e.g. "Expenses.*.bc" files. Ever seen this behavior?
@awtimmering I see the same behavior
And the problem happens with and without 'filing_account'.
Interesting, and no, I haven't seen this. I wonder if something in the latest version of either Beancount or smart_importer is the cause. What versions are you both using? Does this happen consistently?
Regardless, reds-importers should ensure this doesn't happen. There should already be code somewhere that marks the account which decides the output file; we should enable it for all cases. Let me look at it later today or tomorrow.
Would either of you mind filing a bug?
Edit: bug report: https://github.com/redstreet/beancount_reds_importers/issues/97
Automatically Categorizing Postings — Red's Rants
https://reds-rants.netlify.app/personal-finance/automatically-categorizing-postings/