This file can't be parsed, whether it is opened in text mode or in binary mode. Note the "č" character in NAME. I'm on Linux, the default locale is UTF-8, and the file was saved as UTF-8. Default reading, as suggested by the docs:
Python 3.10.4 (main, Apr 2 2022, 09:04:19) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ofxparse
>>> file = open('/tmp/moj.ofx') # passing encoding="utf-8" doesn't change anything, as expected
>>> ofx = ofxparse.OfxParser.parse(file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/ofxparse/ofxparse.py", line 396, in parse
    ofx_file = OfxPreprocessedFile(file_handle)
  File "/usr/lib/python3/dist-packages/ofxparse/ofxparse.py", line 155, in __init__
    super(OfxPreprocessedFile, self).__init__(fh)
  File "/usr/lib/python3/dist-packages/ofxparse/ofxparse.py", line 79, in __init__
    self.fh = six.BytesIO(six.b(self.fh.read()))
  File "/usr/lib/python3/dist-packages/six.py", line 644, in b
    return s.encode("latin-1")
UnicodeEncodeError: 'latin-1' codec can't encode character '\u010d' in position 751: ordinal not in range(256)
Opening the file in binary mode avoids that error but fails with a different one:
>>> file = open('/tmp/moj.ofx', mode="rb")
>>> import ofxparse
>>> ofx = ofxparse.OfxParser.parse(file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/ofxparse/ofxparse.py", line 396, in parse
    ofx_file = OfxPreprocessedFile(file_handle)
  File "/usr/lib/python3/dist-packages/ofxparse/ofxparse.py", line 160, in __init__
    ofx_string = self.fh.read()
  File "/usr/lib/python3.10/codecs.py", line 504, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 751: ordinal not in range(128)
The parser insists on treating the data as ASCII or Latin-1. From a quick glance I don't see any of the tests using non-ASCII input, so this has likely been broken from the start.
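Both errors can be reproduced without ofxparse, using only the standard library. This is a minimal sketch of what the two tracebacks point at, not the library's actual code path: the helper names and the sample string are made up for illustration. The first helper mirrors the `s.encode("latin-1")` call shown in the six.py frame; the second mirrors an ASCII decode of the UTF-8 bytes.

```python
from typing import Optional

# Hypothetical sample text; only the "č" (U+010D) matters here.
SAMPLE = "<NAME>Plačilo"


def offending_char(text: str) -> Optional[str]:
    """Return the first character that latin-1 cannot encode, or None."""
    try:
        text.encode("latin-1")  # same call as in the six.b() frame
    except UnicodeEncodeError as exc:
        return exc.object[exc.start]
    return None


def offending_byte(raw: bytes) -> Optional[int]:
    """Return the first byte that ASCII cannot decode, or None."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc.object[exc.start]  # indexing bytes yields an int
    return None


if __name__ == "__main__":
    # Text mode: "č" has no latin-1 representation.
    print(repr(offending_char(SAMPLE)))
    # Binary mode: "č" is stored as two UTF-8 bytes, the first being 0xc4.
    print(hex(offending_byte(SAMPLE.encode("utf-8"))))
```

The character and byte reported here match the `'\u010d'` and `0xc4` in the tracebacks above, which is consistent with the file itself being valid UTF-8 and the failure being in the codec chosen while parsing.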