Future for Importers with V3

Servinjesus1 commented 5 months ago

Martin has released "v3" of beancount, which is just the refactored pythonic bits, as the default version. With this, things like bean-extract and bean-identify are finally sunset, replaced with beangulp. What are your thoughts on adapting your importers to this updated framework?

For starters, beancount.ingest can probably become beangulp, although I can't find the regression pytest. Then, shell scripts have to become python scripts, I guess. Probably other work to ensure your importer frameworks remain compatible with the modern beangulp importer.

As someone who was just starting to get the hang of your scripts and was writing my first importer when this change happened, it seems overwhelming to have to adapt to a new framework, so I'd greatly appreciate some expert advice on how to use the new framework :)

redstreet commented 5 months ago

Hello there, Beangulp is at its core exactly the same as bean-identify/extract/etc., and some 90-98% of the code remains unchanged. It should be pretty easy to convert beancount_reds_importers to use beangulp. I started to do this in 2022, but the beangulp maintainer was not willing to release a PyPI then, so I couldn't. I'll certainly do it (or perhaps someone else will send a PR), but am unlikely have the time to get to it for at least a few weeks, perhaps longer.

Until then, I suggest using Beancount v2. There's no significant difference between that and v3 for most users, and upgrading to v3+beancount_reds_importers when it uses beangulp would be trivial.

Does that help?

patbakdev commented 5 months ago

I have been using Red's importers with beangulp since 2022 without any problem. I moved over when he tried to move over. I have used the following approach.

This code was pulled from beangulp I think:

class Adapter(beangulp.Importer):
    @property
    def name(self):
        return self.importer.name()

    def identify(self, filepath):
        return self.importer.identify(beangulp.cache.get_file(filepath))

    def account(self, filepath):
        return self.importer.file_account(beangulp.cache.get_file(filepath))

    def file_account(self, file):
        return self.importer.file_account(beangulp.cache.get_file(file))

    def date(self, filepath):
        return self.importer.file_date(beangulp.cache.get_file(filepath))

    def filename(self, filepath):
        filename = self.importer.file_name(beangulp.cache.get_file(filepath))
        # The current implementation of the archive command does not
        # modify the filename returned by the importer. This preserves
        # backward compatibility with the old implementation of the
        # command.
        return misc_utils.idify(filename) if filename else None

    def extract(self, filepath, existing_entries):
        p = inspect.signature(self.importer.extract).parameters
        if len(p) > 1:
            return self.importer.extract(beangulp.cache.get_file(filepath), existing_entries)
        return self.importer.extract(beangulp.cache.get_file(filepath))

and then I create wrappers like this:

""" My BECU Importer Wrapper """
from beancount_reds_importers.importers import becu
import Adapter

class Importer(Adapter):
    def __init__(self, config):
        self.importer = _Importer(config)

class _Importer(becu.Importer):
    def custom_init(self):
        super().custom_init()

I call it from my own tool like this:

from beangulp import Ingest

...

@click.group("import")
@click.version_option()
@click.pass_context
def importer(ctx):
    """
    Import Data (Beangulp)
    """
    loader = importlib.machinery.SourceFileLoader("config", os.path.join(os.getcwd(), "Import.cfg"))
    spec = importlib.util.spec_from_loader("config", loader)
    config = importlib.util.module_from_spec(spec)
    loader.exec_module(config)

    hooks = [process_entries]

    ctx.obj = Ingest(config.importers, hooks)

It will be nice to pull all that out eventually. If you are in a hurry you could do the same ... or just wait.

redstreet commented 5 months ago

Oh very cool, thank you for sharing, @patbakdev!

I haven't looked through it all, but if you want to turn it into a PR at some point, that'd be most welcome.

patbakdev commented 4 months ago

Not sure what you want to do here. This code was just about making v2 importers work with beangulp by adapting the interface until you were ready to move to beangulp for those that don't want to wait. But I suspect you will want to move the importers to use the beangulp interface instead. I guess the question is whether or not you want to have some backwards compatibility with V2. In which case we might be able to invert this code to work the other way around. But that seems like a design decision. Personally I would just freeze V2 in a branch and move to V3 instead of having bankv2 and bankv3 importers. Maybe there is a smart way to do this so nobody has to update anything (at one point I think I read that beangulp was backwards compatible).

redstreet commented 4 months ago

Sorry I wasn't being clear. You're right: the plan is to freeze v2 and move on to v3. However, it's going to be several weeks or longer before I have any time to do it. Until then, I was thinking of temoprarily providing an adapter such as yours for folks wanting to adopt Beancount v3 right away.

On further thought, providing an adapter "officially" may not be worth the effort it takes to test it, provide examples, and such. Your code is already in this thread, and that should be helpful for anyone wanting it. Thoughts welcome if I'm missing anything. Thank you for engaging in this conversation!

patbakdev commented 4 months ago

Ok. Yes, we are on the same page then. I'm also subscribed to the thread in case anyone runs into trouble adapting the code to their needs in the mean time.

jodoox commented 4 months ago

Not sure how to implement the adapter workaround for a custom importer relying on the csv reader: from beancount_reds_importers.libreader import csvreader fails (as expected for v3) with ModuleNotFoundError: No module named 'beancount.ingest'. I don't see how this could work without patching the csvreader file?

EDIT: all good, got it- I thought beangulp was v3 only

patbakdev commented 4 months ago

Glad you figured it out. All of my banking/investment importers are still using OFX so I don't have any experience with CSV. Although I am not sure why that would matter. I did attempt to switch from OFX to CSV for Fidelity once upon a time by creating an importer and if I look through that (dead) code it looks pretty much the same.

from beancount_reds_importers.libreader import csvreader
from beancount_reds_importers.libtransactionbuilder import investments
from MY_CODE import Adapter

class Importer(Adapter):
    def __init__(self, config):
        self.importer = _Importer(config)

class _Importer(csvreader.Importer, investments.Importer):
    IMPORTER_NAME = 'Fidelity Brokerage CSV'
   ...

So I would think one would just need to do the same for an existing importer.

class _Importer(fidelity_csv.Importer, investments.Importer):
    def custom_init(self):
        super().custom_init()

Anyways, hope that helps.

blalor commented 1 month ago

I took a quick look at converting this project to beangulp and immediately tripped on the removal of the regression testing plugin at https://github.com/beancount/beangulp/commit/d4d921233c179e6c3bed6e95eed132f6d58d2c07

@redstreet have you played with this at all? If not, @dnicolodi do you have any advice on converting importers that did use that framework?

redstreet commented 1 month ago

@blalor I haven't had a chance to look yet. Thanks for the pointer. That commit message says there's a simpler testing framework now. Is that the one you're looking for advice to convert to? If so, I haven't taken a look at it yet.

blalor commented 1 month ago

@blalor I haven't had a chance to look yet. Thanks for the pointer. That commit message says there's a simpler testing framework now. Is that the one you're looking for advice to convert to? If so, I haven't taken a look at it yet

Yeah, that’s what I’m referring to. The tests seem to have been moved into a shell script that calls a test command on the importer instance itself. Feels awkward because it’s not part of pytest, which is why I’m asking for guidance. 😁

dnicolodi commented 1 month ago

I'm not sure I understand the question. There was a piece of code that tried to define some helpers to use pytest to test importers. The code was ugly, hard to use, and made obsolete by new pytest features. Thus it has been removed.

Those who want to keep using pytest for testing their importers, they can continue to do so: they just need to hook up their tests with pytest in the way they prefer. Which functionality of the code that has been removed do you miss?

We found that most importers are easier to test with something much simpler than pytest and that is the simple test subcommand that the importers framework implements. This simply compares the output of the importer with an expected output and provides some not too ugly report. This has the advantage that it does not require to write any code. Somewhere I have some code that allows unittest to discover tests provided by this simple mechanism. I think something similar can be written for pytest but I haven't looked into it.

blalor commented 1 month ago

I'm not sure I understand the question. There was a piece of code that tried to define some helpers to use pytest to test importers. The code was ugly, hard to use, and made obsolete by new pytest features. Thus it has been removed.

Which new pytest features are you referring to?

We found that most importers are easier to test with something much simpler than pytest and that is the simple test subcommand that the importers framework implements. This simply compares the output of the importer with an expected output and provides some not too ugly report. This has the advantage that it does not require to write any code. Somewhere I have some code that allows unittest to discover tests provided by this simple mechanism. I think something similar can be written for pytest but I haven't looked into it.

This is an example of the old regression test fixture: https://github.com/redstreet/beancount_reds_importers/blob/07d972d95e535b0705d52e7ce36a25c40d4babaf/beancount_reds_importers/importers/dcu/tests/dcu_csv_test.py#L5-L20

Is there an example of using the new framework with pytest? If not, do you have any pointers?

redstreet / beancount_reds_importers

Future for Importers with V3 #103