Closed: sebnapi closed this issue 7 months ago
Why a new importer? The existing importer can be very easily extended to include the English language. Just add the keywords to the existing regex. Then create the TestCases for verification. You can find the test cases here.
For example:
private void addBuySellTransaction()
{
DocumentType type = new DocumentType("Stock-Exchange Transaction: (Buy|Sell)");
this.addDocumentTyp(type);
...
Change to
private void addBuySellTransaction()
{
DocumentType type = new DocumentType("(B.rsentransaktion|Stock\\-Exchange Transaction): (Kauf|Verkauf|Buy|Sell)");
this.addDocumentTyp(type);
...
Or
// Zu Ihren Lasten USD 2'900.60
// Zu Ihren Gunsten CHF 8'198.70
// To your debit EUR 2'049.80
.section("currency", "amount")
.match("^To your (debit|credit) (?<currency>[\\w]{3}) (?<amount>[\\.'\\d]+)$")
.assign((t, v) -> {
t.setAmount(asAmount(v.get("amount")));
t.setCurrencyCode(v.get("currency"));
})
to
// Zu Ihren Lasten USD 2'900.60
// Zu Ihren Gunsten CHF 8'198.70
// To your debit EUR 2'049.80
.section("currency", "amount")
.match("^(Zu Ihren|To your) (Lasten|Gunsten|debit|credit) (?<currency>[\\w]{3}) (?<amount>[\\.'\\d]+)$")
.assign((t, v) -> {
t.setAmount(asAmount(v.get("amount")));
t.setCurrencyCode(asCurrencyCode(v.get("currency")));
})
As you can see, only single lines are expanded.
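Before writing the test cases, one quick way to check the extended patterns is to run them against sample lines in both languages. The following is only a sketch: the class and helper names are made up for illustration, the buy/sell sample lines are invented, and it assumes the document type pattern merely has to occur somewhere in a line (how DocumentType applies it is not shown in this thread).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the project.
class BilingualRegexCheck {
    // Pattern strings copied from the extended snippets above.
    static final Pattern DOCUMENT_TYPE = Pattern.compile(
            "(B.rsentransaktion|Stock\\-Exchange Transaction): (Kauf|Verkauf|Buy|Sell)");
    static final Pattern AMOUNT_LINE = Pattern.compile(
            "^(Zu Ihren|To your) (Lasten|Gunsten|debit|credit) (?<currency>[\\w]{3}) (?<amount>[\\.'\\d]+)$");

    // Assumption: the document type only has to be found somewhere in the line.
    static boolean isBuySell(String line) {
        return DOCUMENT_TYPE.matcher(line).find();
    }

    // Returns "<currency> <amount>" for a matching line, or null otherwise.
    static String parseAmountLine(String line) {
        Matcher m = AMOUNT_LINE.matcher(line);
        return m.matches() ? m.group("currency") + " " + m.group("amount") : null;
    }

    public static void main(String[] args) {
        // The escape below is o-umlaut; the "." in the pattern matches it.
        System.out.println(isBuySell("B\u00f6rsentransaktion: Kauf"));
        System.out.println(isBuySell("Stock-Exchange Transaction: Sell"));
        // Sample lines taken from the comments in the snippet above.
        System.out.println(parseAmountLine("Zu Ihren Lasten USD 2'900.60"));
        System.out.println(parseAmountLine("Zu Ihren Gunsten CHF 8'198.70"));
        System.out.println(parseAmountLine("To your debit EUR 2'049.80"));
    }
}
```

If a line in either language fails here, the alternation in the regex is the first place to look.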
Please remember to escape all special characters, such as \.[]{}()<>*+-=!?^$|.
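To illustrate why escaping matters, here is a small sketch using only java.util.regex. The sample line and the class name are invented for the example. Pattern.quote is a standard alternative to escaping by hand when an entire fragment should be taken literally.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the project.
class RegexEscapeCheck {
    // True if the regex matches the whole line.
    static boolean matchesWholeLine(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    public static void main(String[] args) {
        String line = "Total (net) 1.50"; // invented sample line

        // Unescaped: "(net)" is a capturing group, so the literal
        // parentheses in the line are not matched.
        System.out.println(matchesWholeLine("Total (net) 1.50", line));

        // Escaped by hand: parentheses and dot are taken literally.
        System.out.println(matchesWholeLine("Total \\(net\\) 1\\.50", line));

        // Pattern.quote escapes the whole fragment at once.
        System.out.println(matchesWholeLine(Pattern.quote(line), line));

        // An unescaped dot matches any character, not just ".".
        System.out.println(matchesWholeLine("1.50", "1x50"));
    }
}
```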
Alternatively, you can create a PDF debug file and we will take care of it. You can see how it works in the video tutorial.
Video tutorial: Extract PDF documents for debugging
Alex :-)
This is certainly possible, but as a software engineer myself I wouldn't go that way. The regular expressions get harder to maintain with each language added. I would use class inheritance instead and keep the regular expressions in variables (what makes the most sense depends on your experience with the other extractors: instance variables, static variables, or inheritable methods returning strings or regexes):
I will use pseudo code to demonstrate the idea, because Java is too verbose and I don't have an IDE installed currently:
class SwissquotePDFExtractorEn(AbstractPDFExtractor):
    protected String regex_buy_sell_transaction = "^Stock-Exchange Transaction: (Buy|Sell) .*$"
    protected String regex_dividends_transaction = "^(Dividend|Capital Gain) Our reference:(.*)$"
    ... (implement methods using variables)
Then only override the variables and you are done:
class SwissquotePDFExtractorDe(SwissquotePDFExtractorEn):
    protected String regex_buy_sell_transaction = "^B.rsentransaktion: (Kauf|Verkauf) .*$"
    protected String regex_dividends_transaction = "^(Dividende|Kapitalgewinn) Unsere Referenz:(.*)$"
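For reference, a rough Java version of this idea might look like the sketch below. One caveat: in Java, a field redeclared in a subclass hides the superclass field rather than overriding it, so methods inherited from the base class would still read the base value. The sketch therefore uses the "inheritable methods returning strings" variant. All class and method names here are hypothetical stand-ins and do not reflect the real AbstractPDFExtractor API, and the sample lines in main are invented.

```java
import java.util.regex.Pattern;

// Hypothetical demo entry point.
class LocalizedExtractorDemo {
    public static void main(String[] args) {
        LocalizedExtractorSketch en = new SwissquoteEnSketch();
        LocalizedExtractorSketch de = new SwissquoteDeSketch();
        System.out.println(en.isBuySell("Stock-Exchange Transaction: Buy Our reference: 1"));
        System.out.println(de.isBuySell("B\u00f6rsentransaktion: Kauf Unsere Referenz: 1"));
        System.out.println(en.isBuySell("B\u00f6rsentransaktion: Kauf Unsere Referenz: 1"));
    }
}

// Stand-in for the shared extraction logic; it only knows the hooks.
abstract class LocalizedExtractorSketch {
    protected abstract String buySellRegex();
    protected abstract String dividendRegex();

    boolean isBuySell(String line) {
        return Pattern.matches(buySellRegex(), line);
    }

    boolean isDividend(String line) {
        return Pattern.matches(dividendRegex(), line);
    }
}

// English regexes, taken from the pseudo code above.
class SwissquoteEnSketch extends LocalizedExtractorSketch {
    @Override
    protected String buySellRegex() {
        return "^Stock-Exchange Transaction: (Buy|Sell) .*$";
    }

    @Override
    protected String dividendRegex() {
        return "^(Dividend|Capital Gain) Our reference:(.*)$";
    }
}

// German variant: only the regex hooks are overridden.
class SwissquoteDeSketch extends SwissquoteEnSketch {
    @Override
    protected String buySellRegex() {
        return "^B.rsentransaktion: (Kauf|Verkauf) .*$";
    }

    @Override
    protected String dividendRegex() {
        return "^(Dividende|Kapitalgewinn) Unsere Referenz:(.*)$";
    }
}
```

Because the shared methods call the hooks through virtual dispatch, the German subclass picks up its own regexes without duplicating any parsing logic.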
Then I would use a factory to determine which extractor to use. Alternatively, and probably easier in your current situation, build a composite PDF extractor that passes the document through two or more extractors, so that you don't need menu items for every broker-language combination.
class CompositePDFExtractor(AbstractPDFExtractor):
    CompositePDFExtractor(AbstractPDFExtractor ...extractors):
        ...
    extract() throws NotApplicable
        for(AbstractPDFExtractor extractor: extractors):
            extractor.extract()
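A minimal Java sketch of the composite idea could look like this. The interface, class names, and the string-based result are all hypothetical simplifications and not the project's actual extractor API. Instead of throwing NotApplicable, this version signals "not applicable" with an empty Optional and returns the result of the first extractor that applies, which is one possible design choice among several.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical demo entry point with two toy extractors.
class CompositeDemo {
    static ExtractorSketch build() {
        ExtractorSketch en = text -> text.contains("Buy")
                ? Optional.of("EN buy") : Optional.<String>empty();
        ExtractorSketch de = text -> text.contains("Kauf")
                ? Optional.of("DE buy") : Optional.<String>empty();
        return new CompositeExtractorSketch(en, de);
    }

    public static void main(String[] args) {
        ExtractorSketch composite = build();
        System.out.println(composite.extract("Stock-Exchange Transaction: Buy"));
        System.out.println(composite.extract("B\u00f6rsentransaktion: Kauf"));
        System.out.println(composite.extract("unrelated text"));
    }
}

// Simplified stand-in for an extractor: empty result means "not applicable".
interface ExtractorSketch {
    Optional<String> extract(String documentText);
}

// Tries each delegate in order and returns the first applicable result.
class CompositeExtractorSketch implements ExtractorSketch {
    private final List<ExtractorSketch> delegates;

    CompositeExtractorSketch(ExtractorSketch... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public Optional<String> extract(String documentText) {
        for (ExtractorSketch delegate : delegates) {
            Optional<String> result = delegate.extract(documentText);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }
}
```

With this shape, a single menu entry per broker could hand the document to the composite, which then tries each language-specific extractor in turn.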
I still need help running the program. What needs to be run after clean verify? After this I only end up with the folders "name.abuchen.portfolio.[...]".
Is your feature request related to a problem? Please describe. The PDF import for Swissquote is only in German.
Describe the solution you'd like I'd like to import Swissquote PDF documents in English.
Additional context
I don't know how you would approach this problem. The regular expressions seemed reasonable, but they are in German. So, to keep it simple, I have just created a second Swissquote extractor, SwissquotePDFExtractorEn.java, and added it to PDFImportAssistant.java. I tried to run it, but after running
mvn -f portfolio-app/pom.xml clean verify
I can't find a working jar; I'm on a Mac. Two regexes are not translated because I didn't come across them. Could you help me with that?