oaubert / banquepostale-pdf-to-tsv

Convert PDF bank statements from La Banque Postale to usable text/tsv data
GNU General Public License v3.0

Design principles #3

Open lalebarde opened 9 months ago

lalebarde commented 9 months ago

Hi, could you please explain the design principles, to make it easier to extend the tool to other banks? I need LCL. By the way, if I naively run your program on an LCL PDF, I get the following error:

Syntax Warning: FoFiType1::parse a line has more than 255 characters, we don't support this

oaubert commented 9 months ago

There is not much that is generic, apart maybe from the Record class that holds the parsed records. The program relies heavily on regexp-based parsing of the output of pdftotext -layout applied to the bank's PDF, and there is no standard layout here. So the "design principle" could be: apply pdftotext to your PDF, write regexps that identify the relevant parts (start/end of the accounting information, data lines with possible continuation lines, etc.), and then fine-tune them until they match all your files (the layout may vary over time, even within the same bank).
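As a rough illustration of that recipe, here is a minimal Python sketch. It is not the project's code: the patterns START_RE, END_RE and LINE_RE, the Entry class (a hypothetical stand-in for the project's Record class) and the to_tsv helper are all assumptions, and the regexps would have to be adapted to whatever pdftotext -layout actually produces for your bank's statements.

```python
import csv
import io
import re
import subprocess
from dataclasses import dataclass, field

# HYPOTHETICAL patterns: adapt them to the pdftotext -layout output of your bank.
START_RE = re.compile(r"^\s*Date\s+Opération", re.IGNORECASE)  # table header line
END_RE = re.compile(r"^\s*Nouveau solde", re.IGNORECASE)       # line closing the table
# A data line: "DD/MM", a label, at least two spaces, then an amount like "1 234,56".
LINE_RE = re.compile(
    r"^\s*(?P<date>\d{2}/\d{2})\s+(?P<label>.+?)\s{2,}"
    r"(?P<amount>\d{1,3}(?: \d{3})*,\d{2})\s*$"
)


@dataclass
class Entry:
    """Hypothetical stand-in for the project's Record class."""
    date: str
    label: str
    amount: float
    extra: list[str] = field(default_factory=list)  # continuation lines


def pdf_to_text(path: str) -> str:
    """Run `pdftotext -layout` (poppler-utils); "-" sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse(text: str) -> list[Entry]:
    """Collect data lines between START_RE and END_RE."""
    entries: list[Entry] = []
    in_table = False
    for line in text.splitlines():
        if START_RE.search(line):
            in_table = True
            continue
        if END_RE.search(line):
            in_table = False
            continue
        if not in_table or not line.strip():
            continue
        m = LINE_RE.match(line)
        if m:
            # French number format: "1 234,56" -> 1234.56
            amount = float(m["amount"].replace(" ", "").replace(",", "."))
            entries.append(Entry(m["date"], m["label"].strip(), amount))
        elif entries:
            # Not a new data line: treat it as a continuation of the previous label.
            entries[-1].extra.append(line.strip())
    return entries


def to_tsv(entries: list[Entry]) -> str:
    """Serialize entries as TSV, joining continuation lines into the label."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", "label", "amount"])
    for e in entries:
        writer.writerow([e.date, " ".join([e.label, *e.extra]), f"{e.amount:.2f}"])
    return out.getvalue()
```

Typical use would be print(to_tsv(parse(pdf_to_text("statement.pdf")))). Attaching unmatched lines to the previous entry is one simple way to handle labels that pdftotext wraps onto several lines; real statements may need further rules (for instance, page headers repeated inside the table).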

Bartel-C8 commented 9 months ago

@lalebarde, sorry to hijack your issue here, but the 0MQ community tried to reach you quite some time ago, to no avail... (I assume your free.fr mail address is no longer in use.)

It's about granting permission to relicense your proxy_steerable code in the repo. It was removed here: https://github.com/zeromq/libzmq/pull/4554/commits/13bc1de42149ad8dfca7847ffa56b331dcd6a379 . It is currently blocking a new libzmq release...

https://lists.zeromq.org/pipermail/zeromq-dev/2023-July/033851.html

Thanks for your consideration (on behalf of the community).

Original question was posed here: https://github.com/zeromq/libzmq/pull/891