jbms / beancount-import

Web UI for semi-automatically importing external data into beancount
GNU General Public License v2.0

[REQUEST] Better documentation on "ofx: n occurrences more than expected:" #153

Closed dppdppd closed 2 years ago

dppdppd commented 2 years ago

I've just started using beancount, beancount-import, and finance-dl. The documentation for beancount-import is very thorough, except I can't find any information on this particular issue:

I have about 5000 entries in the invalid tab, all of which report "1 occurrences more than expected."

Here is an example:

Liabilities:CC:Chase:Amzn  -15.64 USD
  date: 2021-11-01
  ofx_fitid: "2021110124692161305100057272828"
  ofx_name: "Amazon.com*8V9FN9F83"
ofx: 1 occurrences more than expected:
transactions.bean:23347

And here is the resulting txn:

2021-11-01 * "STMTTRN - Amazon.com*8V9FN9F83"
  Liabilities:CC:Chase:Amzn  -15.64 USD
    date: 2021-11-01
    ofx_fitid: "2021110124692161305100057272828"
    ofx_name: "Amazon.com*8V9FN9F83"
    ofx_type: "STMTTRN"
  Assets:Amazon               15.64 USD

I'd appreciate more documentation about this issue, as I don't understand the system well enough to infer what the underlying problem is.

If it is documented and I missed it, I'd appreciate it if someone could point it out to me.

jbms commented 2 years ago

After importing those entries, did you run beancount-import again without providing it access to the OFX files that contained those entries?

beancount-import expects that when re-run, it is provided as input all of the previously imported data files as well, so that it can validate existing entries in the journal against them.

dppdppd commented 2 years ago

I ran the following test: I wiped transactions.bean, imported the first candidate, and it immediately showed up on the invalid tab.

Zburatorul commented 2 years ago

> After importing those entries, did you run beancount-import again without providing it access to the OFX files that contained those entries?
>
> beancount-import expects that when re-run, it is provided as input all of the previously imported data files as well, so that it can validate existing entries in the journal against them.

I have had such error messages for ages, and for me they certainly appear on transactions for which I have kept the original imported data.

Zburatorul commented 2 years ago

I found one situation in which the error appears: if you have multiple accounts of the same LinkBasedSource type. For example, if you have two PayPal accounts. Then, on this line all entries in the ledger are used and one such entry contains some particular link X. But then on this line the links in the ledger will be compared to the links derived from files contained in the data folder of one source at a time. One of them is guaranteed to not contain link X.

This can be avoided by having a single source of type beancount_import.source.paypal, but then they'd share the same assets_account and import incorrectly.

dppdppd commented 2 years ago

I have 3 separate ofx dicts defined.

jbms commented 2 years ago

@Zburatorul For the LinkBasedSource issue, the solution is to use a different link prefix for each source so that they can be distinguished.
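To illustrate why distinct link prefixes help, here is a minimal sketch of the check as described in this thread. It is a simplified model, not the actual LinkBasedSource code: the function `invalid_links` and the link names are hypothetical, and the assumption is that each source claims every journal link starting with its prefix and flags those it cannot find in its own data.

```python
def invalid_links(journal_links, source_links, link_prefix):
    """Return journal links this source claims (by prefix) but cannot find in its own data.

    Simplified model of the validation described above; not the library's code.
    """
    claimed = {link for link in journal_links if link.startswith(link_prefix)}
    return claimed - set(source_links)


# Two PayPal accounts sharing one prefix: source A claims B's link and flags it.
shared_journal = {"paypal.A1", "paypal.B1"}
shared_a = invalid_links(shared_journal, {"paypal.A1"}, "paypal.")

# Hypothetical distinct prefixes: each source only validates its own links.
distinct_journal = {"paypal_a.A1", "paypal_b.B1"}
distinct_a = invalid_links(distinct_journal, {"paypal_a.A1"}, "paypal_a.")
distinct_b = invalid_links(distinct_journal, {"paypal_b.B1"}, "paypal_b.")

print(shared_a, distinct_a, distinct_b)
```

With a shared prefix, the first source reports the second account's link as invalid; with distinct prefixes, neither source reports anything.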

@dppdppd I attempted to reproduce the ofx issue using some of the sample data already in the repository but was not able to. If you can provide a way to reproduce it, I can take a look and hopefully get to the bottom of it.

Zburatorul commented 2 years ago

> I have 3 separate ofx dicts defined.

You should inspect the code in ofx.py and see under what conditions InvalidSourceReference objects are created.

Zburatorul commented 2 years ago

> @Zburatorul For the LinkBasedSource issue, the solution is to use a different link prefix for each source so that they can be distinguished.

Ah yes, thanks.

dppdppd commented 2 years ago

> I found one situation in which the error appears: if you have multiple accounts of the same LinkBasedSource type. For example, if you have two PayPal accounts. Then, on this line all entries in the ledger are used and one such entry contains some particular link X. But then on this line the links in the ledger will be compared to the links derived from files contained in the data folder of one source at a time. One of them is guaranteed to not contain link X.
>
> This can be avoided by having a single source of type beancount_import.source.paypal, but then they'd share the same assets_account and import incorrectly.

This was the issue. I had 3 distinct ofx sources.

The problem goes away if I combine all 3 file lists together and pass them to a single ofx source.

I reloaded my original transactions.bean with thousands of txns and there are 0 invalid entries.

I'm assuming that the account data will still correctly match up with the accounts in each ofx, but I haven't tested past this point.

jbms commented 2 years ago

Yes, for the ofx source it is expected that you have just a single source with all of your input files.
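As a rough sketch of what combining the file lists into one source might look like in a beancount-import configuration script: the helper `single_ofx_source` and the file paths are hypothetical, and the `module` and `ofx_filenames` keys are given as recalled from the comments in ofx.py, so check them against that file before relying on them.

```python
def single_ofx_source(*file_lists):
    """Merge several lists of OFX paths into one ofx source spec (hypothetical helper)."""
    # De-duplicate and sort so the resulting configuration is stable between runs.
    filenames = sorted({name for files in file_lists for name in files})
    return dict(
        module="beancount_import.source.ofx",  # key names assumed from ofx.py comments
        ofx_filenames=filenames,
    )


# Hypothetical per-institution file lists that were previously three separate sources.
chase_files = ["data/chase/2021-11.ofx"]
amex_files = ["data/amex/2021-11.ofx"]
bank_files = ["data/bank/2021-11.ofx"]

# One ofx entry in data_sources instead of three.
data_sources = [single_ofx_source(chase_files, amex_files, bank_files)]
print(data_sources)
```

In a real configuration the lists would more likely come from `glob.glob` over each institution's download directory.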

dppdppd commented 2 years ago

Makes sense. I now see that the ofx.py comments suggest putting multiple institutions in the same source. If it doesn't mention it elsewhere already, you might consider adding a note that having more than one ofx source will result in issues.

Thanks!

catanzaromj commented 2 years ago

I'm having the same issue, but with two different sources. I'm converting my accounting over from Mint to beancount, so I used the mint source with beancount to create transactions, and all went great. Moving forward, I'm using the generic importer source to include new transactions. For any account I include with the generic importer (e.g. a specific credit card), the transactions previously imported for that card now get flagged as invalid with precisely the same error ("1 occurrences more than expected."). The solution discussed above, combining the sources into the same dict, is not applicable. Any thoughts?

moritzj29 commented 2 years ago

I'm commenting on this closed issue since it's the only related information I could find on the topic.

Similarly to @catanzaromj, I have two import sources for the same account:

Background

I moved everything to beancount and I import the csv files from my bank, but they only provide the data for a few months back. In order to also include older data, I exported the transactions from the Excel sheet I used to track my finances before beancount and imported that data into beancount as well. So far so good.

Problem

Now I get the 1 occurrences more than expected warning, since the old transactions are not found by my currently used importer.

My workaround

After importing all the data, I remove the importer for the old data from my beancount-import data_sources. I still get the warning for all old transactions, since my current importers do not find the data. I then remove the source_desc metadata field from all the old transactions, which makes beancount-import think the transactions are just 'uncleared' (search and replace with regex is your friend here). The new importer does not try to find them anymore.
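The search-and-replace step could be scripted roughly as follows. This is a sketch under assumptions: the function name `strip_source_desc` is made up, it assumes each transaction starts with an unindented `YYYY-MM-DD` line and that `source_desc` sits on its own indented metadata line, and it has not been tested against a real journal, so work on a copy.

```python
import re

# A directive line starts with an unindented ISO date; metadata lines are indented.
TXN_HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2})\s")
SOURCE_DESC = re.compile(r"^\s+source_desc:\s")


def strip_source_desc(journal_text, cutoff_date):
    """Drop source_desc metadata lines from entries dated before cutoff_date."""
    kept = []
    current_date = None
    for line in journal_text.splitlines(keepends=True):
        header = TXN_HEADER.match(line)
        if header:
            current_date = header.group(1)
        elif not line.strip():
            current_date = None  # a blank line ends the current entry
        # ISO dates compare correctly as plain strings.
        is_old = current_date is not None and current_date < cutoff_date
        if is_old and SOURCE_DESC.match(line):
            continue
        kept.append(line)
    return "".join(kept)


sample = (
    '2019-03-01 * "Old"\n'
    "  Assets:Bank  -5.00 USD\n"
    '    source_desc: "old row"\n'
    "  Expenses:Misc  5.00 USD\n"
    "\n"
    '2022-03-01 * "New"\n'
    "  Assets:Bank  -7.00 USD\n"
    '    source_desc: "new row"\n'
    "  Expenses:Misc  7.00 USD\n"
)
cleaned = strip_source_desc(sample, "2020-01-01")
print(cleaned)
```

Only entries dated before the cutoff lose their `source_desc` line; newer entries keep it, so the current importer can still match them.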

To manually mark the transactions as cleared, beancount-import offers various options, described at https://github.com/jbms/beancount-import#checking-for-uncleared-postings. The cleared_before: <date> metadata field for the account is very handy.

After that, no errors/warnings anymore and all data in my beancount journal :)

Of course, this only works if you change from one data source to another at a certain point in time. It does not allow having multiple data sources for the same account at the same time, as discussed above in this issue.

I hope somebody else will find this useful...