jbms / beancount-import

Web UI for semi-automatically importing external data into beancount
GNU General Public License v2.0

Import of OFX CHECKNUM tag fails for a non-numeric value #44

Open gpaulissen opened 4 years ago

gpaulissen commented 4 years ago

When processing my OFX file it fails with this error:

Traceback (most recent call last):
  File "C:\Users\gjpau\AppData\Local\Programs\Python\Python38\lib\site-packages\beancount\core\number.py", line 96, in D
    return Decimal(_CLEAN_NUMBER_RE.sub('', strord))
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  ...
  File "c:\users\gjpau\documents\github\beancount-import\beancount_import\source\ofx.py", line 1394, in prepare
    state.get_accounts_and_entries()
  File "c:\users\gjpau\documents\github\beancount-import\beancount_import\source\ofx.py", line 1249, in get_accounts_and_entries
    statement.get_entries(self)
  File "c:\users\gjpau\documents\github\beancount-import\beancount_import\source\ofx.py", line 843, in get_entries
    posting_meta[CHECK_KEY] = D(stripped_checknum)
  File "C:\Users\gjpau\AppData\Local\Programs\Python\Python38\lib\site-packages\beancount\core\number.py", line 104, in D
    raise ValueError("Impossible to create Decimal instance from {!s}: {}".format(
ValueError: Impossible to create Decimal instance from 8OOMBV: [<class 'decimal.ConversionSyntax'>]

c:\users\gjpau\appdata\local\programs\python\python38\lib\site-packages\beancount\core\number.py(104)D()
-> raise ValueError("Impossible to create Decimal instance from {!s}: {}".format(

When I comment out all the CHECKNUM tags, the import succeeds.

The failing tag is: 08OOMBV
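
The failure can probably be reproduced without beancount at all. The sketch below uses only the standard library `decimal` module and assumes the importer strips leading zeros before the conversion (which would explain why the error message shows 8OOMBV rather than 08OOMBV); the variable names are illustrative only.

```python
from decimal import Decimal, InvalidOperation

checknum = "08OOMBV"             # value of the failing CHECKNUM tag
stripped = checknum.lstrip("0")  # assumption: mimics the importer's zero stripping

try:
    Decimal(stripped)
    parsed = True
except InvalidOperation:
    # Decimal only accepts numeric strings, so a value with letters is rejected.
    parsed = False

print(stripped, parsed)
```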

I have read the latest OFX specification, which gives the check number the format A-12. However, the A just denotes a character field, so any Unicode character is allowed, not only digits.


Check (or other reference) number, A-12

Character fields are identified with a data type of “A-n”, where n is the maximum number of allowed Unicode characters. Note: n refers to the number of characters in the resultant string. Each multi-byte or encoded character counts as a single character. UTF-8 encodes “high” Latin-1 characters (decimal 128-255) using two bytes, and double-byte characters using three bytes. In addition, XML encodes ampersands, less-than symbols, greater-than symbols, and spaces (where required) using multicharacter escape strings (see section 2.3.1.1). Therefore, an element of type A-40 may require more than 40 bytes in a UTF-8-encoded XML stream.

jbms commented 4 years ago

I was using the decimal conversion to strip zero padding, but it looks like the code needs to be modified to treat check numbers as strings instead, while still stripping leading zeros.
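
A minimal sketch of that idea, not the actual patch: `normalize_checknum` is a hypothetical helper name, and keeping a single "0" for an all-zero value is an assumption chosen to stay close to what the decimal conversion produced.

```python
def normalize_checknum(raw: str) -> str:
    """Hypothetical helper: keep CHECKNUM as a string, dropping zero padding."""
    trimmed = raw.strip()
    stripped = trimmed.lstrip("0")
    if stripped:
        return stripped
    # An all-zero value such as "0000" strips to "", so keep a single "0";
    # an empty input stays empty.
    return "0" if trimmed else ""


print(normalize_checknum("08OOMBV"))
print(normalize_checknum("000123"))
```

With something like this, the value would be stored as a string under `CHECK_KEY` instead of being passed through `D(...)`, so non-numeric reference numbers no longer raise.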

Are you able to prepare a patch?

gpaulissen commented 4 years ago

Hello,

It has been a long time, but I have good news: I have a patch for this and a lot more. I will keep you posted over the next few days, and I will create issues for the things I have added.

Kind regards,

Gert-Jan Paulissen