Closed wadabum closed 4 years ago
There isn't any existing way to do this, but some sort of convergence with beancount.ingest is something I'm very interested in exploring. However, there are some challenges to overcome:
beancount.ingest is based on generating partial transactions (that don't necessarily balance) with missing postings. beancount-import requires full transactions, with Expenses:FIXME postings to indicate unknown accounts (a single Expenses:FIXME posting is allowed to correspond to multiple unknown postings). This is simple to handle, though --- just generate Expenses:FIXME postings to balance the transaction.
With just step 1, you could have a partial integration with beancount-import, but to get the automatic tracking of which transactions have already been imported, and automatic classification of unknown accounts, the beancount-import data source needs to provide more information than is available in the beancount.ingest model.
How about exporting this required info in meta? I am currently exporting transaction ref/ cheque No. and jumbled description into meta. ML models that can use this to predict postings and use your awesome UI for manual corrections would be great. currently exploring smart_importer but it doesn't have a way to fix interactively like this project. I too want this to work.
If you specified which metadata fields indicate a unique transaction id, and which metadata fields to use for automatic classification, then I think it would work.
Do you have any plans for defining API for the UI part?
I am currently using smart_importer and I think having an interactive UI makes a lot of difference. Having a clearly defined API would mean making UI compatible with all the future ML models. My statements have a lot of numbers and plain text and comments and I would surely love to try some simple NLP models like Spacy's classifier.
I personally loved the idea of being able to make some manual entries and later merge them effortlessly with the statement.
I guess you want to use the beancount-import UI with the account prediction provided by smart_importer and ingesters defined using the benacount.ingest framework?
I don't have plans to try to make that work myself, but if you're interested in making that happen, it does sound like it could be very useful, and I'm happy to offer advice and would be interested in merging it into beancount-import, for example.
The datasource API provided by beancount-import is already sufficient, I think, for integrating with beancount.ingest. Currently there is no API for plugging in a different account prediction mechanism, but that could be added.
Or maybe it would make sense for beancount-import to merge into smart_importer entirely, but not sure if the smart_importer authors are interested.
Correct me if I'm wrong, but it sounds like the stumbling block of this program (ingester->smart_importer->beancount-import UI) is that the prediction/ML part needs to be factored out in beancount-import? Is there any other issue related to matching existing and currently imported transactions? It looks to me like smart_importer is capable of detecting duplication.
Is it possible then to get the smart_importer to do all the deduplication and account prediction, while simply turning off this functionality in beancount-import?
Your posts from two years ago seem relevant: https://www.mail-archive.com/beancount@googlegroups.com/msg01628.html https://www.mail-archive.com/beancount@googlegroups.com/msg01630.html
Currently beancount-import relies on an account name starting with Expenses:FIXME to indicate an "unknown" account, which has 3 effects:
Individual beancount-import datasources are already responsible for doing deduplication, so it is no problem for smart_importer to do that. To let smart_importer perform the automatic prediction, while still preserving effects (1) and (3), beancount-import could be modified to support an extra metadata key on postings that would indicate the predicted account while still allowing the account to be considered "unknown".
This would be a pretty simple change to make, e.g. in _get_unknown_account_predictions in reconcile.py.
@dumbPy, are you interested in trying to get your smart_importer to work with beancount-import along the lines of #60 ? You'd have to make it add two meta field to each predicted posting.
@jbms Individual beancount-import datasources are already responsible for doing deduplication
Is this still accurate?
If I understand correctly, that means an ImporterProtocol
subclass needs to confirm each transaction it generates isn't a duplicate of an existing entry in the transactions.beancount
file?
The generic importer handles deduplication based on (account, date, units, narration)
, so individual ImporterProtocol subclasses should not need to do that.
Hmm is there an option (aside from subclassing the source) to avoid requiring the narration match? Usually I tidy these up in the import UI, so what's in the transactions file is more concise and/or better formatted than the raw importer output
There isn't currently any such option, but I agree that matching on narration is problematic. Potentially you could match on just date, account, and amount alone. But it would be better to store the original narration as a metadata field, so that it can be used for training and for deduplication
Yes currently I'm storing the original in the source_desc
metadata field. I think it'd make more sense to use that as part of the deduplication key than the final narration
So, I´ve spend hours writing a bean-extract compatible parser using
beancount.ingest.importer.ImporterProtocol
Can one somehow string those things together?
something along the lines of the usual:
or however that´s gonna map out... ;-)
(think that could proxy-close #18 as well...)