grindsa / dkb-robo

library to access the internet banking area of "Deutsche Kreditbank" to get account information and transactions
GNU General Public License v3.0

Filename issues when downloading documents with scan_postbox #32

Closed megamorf closed 2 years ago

megamorf commented 2 years ago

Hey, thanks for this very useful library. I used it for the very first time today and one of my primary use cases is to grab new documents from the postbox.

I encountered the following issues with the downloaded documents:

Encoding problems

Steuerbescheinigungen

 Length Name
 ------ ----
1436671 Ertr%c3%a4gnisaufstellung_2021_redacted.pdf

Vertragsinformationen

 Length Name
 ------ ----
 985260 %c3%84nderungsangebot_zum_01.01.2023_-_Vermieterpaket.pdf
4720661 %c3%84nderungsangebot_zum_01.01.2023.pdf
  12718 Mitteilung_%c3%bcber_steigende_Sollzinss%c3%a4tze_ab_01.10.2022.pdf

Wertpapierdokumente

Length Name
------ ----
 56873 allgemeine_Anschreiben_Kapitalma%c3%9fnahmen_vom_05.09.2022_zu_Depot_redacted_-_51567359221656A2QK20.pdf
 56804 allgemeine_Anschreiben_Kapitalma%c3%9fnahmen_vom_28.09.2022_zu_Depot_redacted_-_51572762221324A2QK20.pdf
 48144 Kauf_-_WKN_A2PKXG_-_Auftragsbest%c3%a4tigung_vom_17.06.2022_zu_Depot_redacted_-_Ordernr._6487281200.pdf
 48143 Kauf_-_WKN_A2PKXG_-_Auftragsbest%c3%a4tigung_vom_17.06.2022_zu_Depot_redacted_-_Ordernr._6487377900.pdf
 47735 Kauf_-_WKN_A2PKXG_-_Streichungsbest%c3%a4tigung_vom_31.08.2022_zu_Depot_redacted_-_Ordernr._6487281200.pdf
 47737 Kauf_-_WKN_A2PKXG_-_Streichungsbest%c3%a4tigung_vom_31.08.2022_zu_Depot_redacted_-_Ordernr._6487377900.pdf
 48301 Kauf_-_WKN_A2QK20_-_Auftragsbest%c3%a4tigung_vom_27.06.2022_zu_Depot_redacted_-_Ordernr._6585736100.pdf
 48443 Kauf_-_WKN_A2QK20_-_Auftragsbest%c3%a4tigung_vom_28.06.2022_zu_Depot_redacted_-_Ordernr._6585736100.pdf
 48754 Kauf_-_WKN_A2QK20_-_Ausf%c3%bchrungsanzeige_vom_29.06.2022_zu_Depot_redacted_-_Ordernr._6585736101.pdf
 58119 Kosteninformation_f%c3%bcr_das_Jahr_2022_zu_Depot_redacted.pdf
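
For reference, the `%c3%a4`-style sequences in the names above are percent-encoded UTF-8 bytes. A minimal sketch of decoding them with Python's standard library follows; the helper name `decode_filename` is hypothetical and is not part of dkb-robo, and this is not necessarily how the library handles it:

```python
from urllib.parse import unquote


def decode_filename(name: str) -> str:
    """Decode percent-encoded UTF-8 sequences (e.g. %c3%a4) in a filename.

    Hypothetical helper for illustration only; not dkb-robo's API.
    """
    return unquote(name, encoding="utf-8")


if __name__ == "__main__":
    for encoded in (
        "Ertr%c3%a4gnisaufstellung_2021_redacted.pdf",
        "%c3%84nderungsangebot_zum_01.01.2023.pdf",
        "Mitteilung_%c3%bcber_steigende_Sollzinss%c3%a4tze_ab_01.10.2022.pdf",
    ):
        print(decode_filename(encoded))
```

Names that contain no percent-encoded sequences pass through unchanged, so the helper can be applied to every downloaded file.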

Zero length files / missing file extension

Length Name
------ ----
     0 Kosteninformation_zu_Wertpapier_ROMEO_POWER_INC._REG._SHARES_CL.A_DL_-_0001_vom_27.06.2022__21
     0 Kosteninformation_zu_Wertpapier_VANGUARD_FTSE_ALL-WORLD_U.ETF_REG._SHS_USD_ACC._ON_vom_17.06.2022__01

Edit: the problematic files have the following names in the postbox and this is the corresponding download link:

Name: Kosteninformation zu Wertpapier ROMEO POWER INC. REG. SHARES CL.A DL -,0001 vom 27.06.2022, 21:23 zu Depot redacted
Link: https://www.dkb.de/DkbTransactionBanking/content/mailbox/MessageList.xhtml?$event=getMailboxAttachment&filename=Kosteninformation+zu+Wertpapier+ROMEO+POWER+INC.+REG.+SHARES+CL.A++DL+-%2C0001+vom+27.06.2022%2C+21%3A23+zu+Depot+redacted&row=15
Name: Kosteninformation zu Wertpapier VANGUARD FTSE ALL-WORLD U.ETF REG. SHS USD ACC. ON vom 17.06.2022, 01:06 zu Depot redacted
Link: https://www.dkb.de/DkbTransactionBanking/content/mailbox/MessageList.xhtml?$event=getMailboxAttachment&filename=Kosteninformation+zu+Wertpapier+VANGUARD+FTSE+ALL-WORLD+U.ETF+REG.+SHS+USD+ACC.+ON+vom+17.06.2022%2C+01%3A06+zu+Depot+redacted&row=19
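
Both truncated names in the listing above stop exactly where the postbox name contains a colon (`21:23`, `01:06`). Whether the colon is the actual cause is not confirmed in this thread. The following is only a sketch of how a filesystem-safe name could be derived from the `filename` query parameter of such a link; the function name `filename_from_link`, the set of replaced characters, and the default `.pdf` extension are all assumptions, and this is not dkb-robo's implementation:

```python
import re
from urllib.parse import parse_qs, urlsplit


def filename_from_link(link: str, extension: str = ".pdf") -> str:
    """Build a filesystem-safe filename from a link's `filename` parameter.

    Hypothetical helper for illustration only; not dkb-robo's API.
    """
    # parse_qs decodes both '+' (space) and %XX escapes such as %2C and %3A.
    raw = parse_qs(urlsplit(link).query)["filename"][0]
    # Replace characters that are commonly problematic in filenames
    # (assumed set: path separators, colon, comma, wildcard and quote characters).
    safe = re.sub(r'[\\/:*?"<>|,]', "_", raw)
    # Replace each run of whitespace with one underscore.
    safe = re.sub(r"\s+", "_", safe.strip())
    # Append the extension if the name does not already end with it.
    if not safe.lower().endswith(extension):
        safe += extension
    return safe


if __name__ == "__main__":
    # Shortened example link with a placeholder host, not a real DKB URL.
    link = (
        "https://example.invalid/MessageList.xhtml?$event=getMailboxAttachment"
        "&filename=Doc+vom+17.06.2022%2C+01%3A06+zu+Depot&row=19"
    )
    print(filename_from_link(link))
```

With this approach the colon never reaches the filesystem and the result always carries an extension.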

Let me know how I can help you get these two issues resolved :-)

grindsa commented 2 years ago

Hi, I pushed a fix into the master branch which should address the umlaut encoding. The "zero length" issue is a bit trickier.

megamorf commented 2 years ago

I reran the download based on master 7b49fa61893d0dc6084b3aa2898f457cca0de9c5 and it not only fixed the umlaut encoding problem but also seems to have fixed the zero length document problem.

The only difference this time is that I had to run dkb.scan_postbox(PATH, download_all=True) since all the documents from the previous run had been marked as read and there's no way to mark them as unread again.

grindsa commented 2 years ago

Thank you for your response. Good that the zero-length file problem seems to be addressed as well (even if I do not fully understand why). The change made it into v.18, so I am closing this issue.