redstreet / beancount_reds_importers

Simple ingesting tools for Beancount (plain text, double entry accounting software). More importantly, a framework to allow you to easily write your own importers.
GNU General Public License v3.0

pdfreader and bamboohr paycheck importer #94

Closed a0js closed 7 months ago

a0js commented 8 months ago

Purpose

Many paycheck downloads only come in the form of pdf. To support this use case, and potentially other use cases, we need to add support for pdf parsing. As a proof of concept, I also added the bambooHR paystub importer.

Approach

Outstanding Issues

Resolves #93

redstreet commented 8 months ago

Looks great, and I'd be happy to take this in. Thanks for the helpful comments and the debug option too.

I completely understand the challenge in adding any kind of examples at all to importers since by definition they're all personal data. But I'm wondering if you could perhaps add an example pdf, even one you made up in any word processing program, that is importable by this? I just want to ensure we have some way down the line to test the code to ensure it's still alive.

It doesn't have to use the bamboohr importer (though that'd be great too). Perhaps a simple single table made in MS-Word or the like, and a simple dummy importer for it?

redstreet commented 8 months ago

For dependencies, the following should work:

pip3 install pigar
pigar generate -c '>='

ranebrown commented 8 months ago

I think the dependencies would just need added here

a0js commented 8 months ago

Apologies for not getting back to this. I'm hoping to make the changes and add a test before the end of the week.

a0js commented 8 months ago

I added a generic pdf paycheck importer that really isn't meant to be used but to show how to make a paycheck importer using the pdfreader. Tests are included and passing.

redstreet commented 8 months ago

This looks great, thank you, and thanks for the test! Looks like the package build fails. If you could please take a look and fix that, we can get this merged in.

a0js commented 8 months ago

I think the dependencies would just need added here

I forgot to do this. I'll update it tomorrow!

a0js commented 7 months ago

I wasn't able to get to this before I went off grid for a week. I'll try and update in the next few days.

a0js commented 7 months ago

@redstreet Not sure why, but I could not get pigar to find all the dependencies necessary, so I just manually updated the requirements.txt file. That should fix the tests.

redstreet commented 7 months ago

Thanks, no idea why pigar fails, but updating requirements.txt is fine.

The formatting is still failing, see above. If you could fix this, we can get this in. https://github.com/redstreet/beancount_reds_importers/blob/main/CONTRIBUTING.md shows what to run to fix formatting.

a0js commented 7 months ago

Not sure why, but when I ran the formatting commands locally I got 64 changes instead of 3 as observed in the github action output. I'm working on a mac and for some reason the default ruff settings are different. I ended up running a docker image that copied the github action environment and ran the format commands inside docker and that got the correct formatting changes.

Thanks for your patience, by the way.

redstreet commented 7 months ago

@a0js, curious, is there a button to run the formatting workflow that appears for you in this PR? I made a commit yesterday to enable this and am wondering if it works.

redstreet commented 7 months ago

Not sure why, but when I ran the formatting commands locally I got 64 changes instead of 3 as observed in the github action output. I'm working on a mac and for some reason the default ruff settings are different. I ended up running a docker image that copied the github action environment and ran the format commands inside docker and that got the correct formatting changes.

Thanks for reporting. Hmm, not sure why you're seeing this. Ruff uses settings from the pyproject.toml that's in the repo. So if you're running from the repo root and that file is present, it should come from there. Perhaps try a ruff --version, and a pip install ruff --upgrade if needed?
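
For anyone hitting the same mismatch, one way to check which pyproject.toml a local run would likely pick up is sketched below. This is only an illustration, not part of this repo or of ruff: the helper name find_ruff_pyproject is made up, and it assumes ruff's documented behavior of using the closest pyproject.toml that contains a [tool.ruff] table. It does a naive text scan rather than a real TOML parse, and it ignores ruff.toml and .ruff.toml files.

```python
from pathlib import Path
from typing import Optional


def find_ruff_pyproject(start: Path) -> Optional[Path]:
    """Return the closest pyproject.toml at or above `start` that mentions
    a [tool.ruff...] table, or None if there is none.

    Hypothetical helper for illustration only; ruff does its own discovery.
    """
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / "pyproject.toml"
        if not candidate.is_file():
            continue
        text = candidate.read_text(encoding="utf-8")
        # Naive scan for a table header such as [tool.ruff] or [tool.ruff.lint].
        if any(line.strip().startswith("[tool.ruff") for line in text.splitlines()):
            return candidate
    return None


if __name__ == "__main__":
    # Run from the directory where the formatting commands are executed.
    print(find_ruff_pyproject(Path.cwd()))
```

If this prints a path outside the repo, or None, the local run is probably not using the repo's settings, which could be one explanation for a different number of formatting changes.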

Thanks for your patience, by the way.

Of course, no worries at all, thank you for sticking with this PR and getting it in, much appreciated!

a0js commented 7 months ago

@a0js, curious, is there a button to run the formatting workflow that appears for you in this PR? I made a commit yesterday to enable this and am wondering if it works.

I don't see anything on the PR page, but I could be looking in the wrong place. Where should it be?

redstreet commented 7 months ago

I don't see anything on the PR page, but I could be looking in the wrong place. Where should it be?

I'd expect it to be on the bottom of the PR page. I'm sure it'd be an obvious green button, so if you haven't seen it, it must've not worked. Anyway, it doesn't matter much for this PR, thanks for checking!

Do let me know if you need help with any of the outstanding things.

a0js commented 7 months ago

I can't merge this myself as I don't have write access. If you think this is good to go, can you merge it in?

redstreet commented 7 months ago

I think it was failing checks, but the checks are not running now for some reason. Let me take a look.

redstreet commented 7 months ago

Checks are passing for me locally. I don't know why github wouldn't run them. Either way, merged!

Thank you again for the contribution, and for working to get this PR through! IMO, table extraction from pdfs is a solid contribution for beancount_reds_importers as it's still fairly common to find that pdfs are the only option (no csv/ofx). So this is great!

a0js commented 7 months ago

You're most welcome! I'm glad I could add something to this awesome project. Let me know if there are some other features I can help with.

a0js commented 6 months ago

Sorry to comment on the old PR, but I was just curious how often you cut releases and when this one might be published?

redstreet commented 6 months ago

Np at all. I usually put it through personal use of at least a few weeks before I publish, so bugs have a chance to surface. Let me take a look this evening to see if I can cut a release.

redstreet commented 6 months ago

Released 0.9.0, featuring this PR :-)