jseutter / ofxparse

Ofx file format parser for Python
http://sites.google.com/site/ofxparse/
MIT License

Dies on attempt to parse ofx 2.11 documents with <?xml declaration #94

Closed (talwrii closed 8 years ago)

talwrii commented 8 years ago

I suspect this is Beautiful Soup being rubbish.

Here is my version of Beautiful Soup:

python -c 'import BeautifulSoup; print BeautifulSoup.__version__'
3.2.1

Here's a patch that "fixes" the issue, though I don't fully understand what's going on. It also shows that Beautiful Soup is being fed the whole document.

--- a/ofxparse/ofxparse.py
+++ b/ofxparse/ofxparse.py
@@ -191,8 +191,12 @@ class OfxPreprocessedFile(OfxFile):
                 tag_name = re.findall(r'(?i)<([a-z0-9_\.]+)>', token)[0]
                 if tag_name.upper() not in closing_tags:
                     last_open_tag = tag_name
-            new_fh.write(token)
+
+            if not is_processing_tag:
+                new_fh.write(token)
+
         new_fh.seek(0)
+        print new_fh.getvalue()
         self.fh = new_fh
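
The idea behind the patch can be tried in isolation. The following is only a minimal sketch, not ofxparse's actual preprocessing code: the tokenizer regex, the helper name `strip_processing_instructions`, and the definition of `is_processing_tag` (a token that starts with `<?` and ends with `?>`) are all assumptions, since the hunk above does not show how that flag is set.

```python
import re
from io import StringIO

# Assumption: tokens are split on tag boundaries, similar in spirit to the
# loop touched by the patch. This is not ofxparse's real tokenizer.
TOKEN_RE = re.compile(r"(<[^>]+>)")


def strip_processing_instructions(text):
    """Return text with every <?...?> token dropped (hypothetical helper)."""
    out = StringIO()
    for token in TOKEN_RE.split(text):
        # Assumed meaning of `is_processing_tag` in the patch.
        is_processing_tag = token.startswith("<?") and token.endswith("?>")
        if not is_processing_tag:
            out.write(token)
    return out.getvalue()


sample = (
    '<?xml version="1.0" encoding="US-ASCII"?>\n'
    '<?OFX OFXHEADER="200" VERSION="200"?>\n'
    "<OFX><CODE>0</CODE></OFX>\n"
)
cleaned = strip_processing_instructions(sample)
print(cleaned)
```

With the `<?xml ...?>` and `<?OFX ...?>` tokens dropped, only the `<OFX>` element and the surrounding whitespace reach whatever parser comes next.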

Here is a sanitized document that exhibits the behaviour:

<?xml version="1.0" encoding="US-ASCII"?>
<?OFX OFXHEADER="200" VERSION="200" SECURITY="NONE" OLDFILEUID="NONE" NEWFILEUID="NONE"?>
<!-- Converted from: QIF -->
<!-- Date format was: DD/MM/YY -->
<OFX>
  <SIGNONMSGSRSV1>
    <SONRS>
      <STATUS>
        <CODE>0</CODE>
        <SEVERITY>INFO</SEVERITY>
        <MESSAGE>SUCCESS</MESSAGE>
      </STATUS>
      <DTSERVER>20151230</DTSERVER>
      <LANGUAGE>ENG</LANGUAGE>
      <FI>
        <ORG>UNKNOWN</ORG>
        <FID>UNKNOWN</FID>
      </FI>
    </SONRS>
  </SIGNONMSGSRSV1>
  <CREDITCARDMSGSRSV1>
    <CCSTMTTRNRS>
      <TRNUID>0</TRNUID>
      <STATUS>
        <CODE>0</CODE>
        <SEVERITY>INFO</SEVERITY>
        <MESSAGE>SUCCESS</MESSAGE>
      </STATUS>
      <CCSTMTRS>
        <CURDEF>USD</CURDEF>
        <CCACCTFROM>
          <ACCTID>UNKNOWN</ACCTID>
        </CCACCTFROM>
        <BANKTRANLIST>
          <DTSTART>20151203</DTSTART>
          <DTEND>20151230</DTEND>
          <STMTTRN>
            <TRNTYPE>DEBIT</TRNTYPE>
            <DTPOSTED>20151230</DTPOSTED>
            <TRNAMT>-3.45</TRNAMT>
            <FITID>UNKNOWN-CREDITCARD-20151230-3--3.45</FITID>
            <NAME>TESCO-STORES 2610</NAME>
          </STMTTRN>
        </BANKTRANLIST>
        <LEDGERBAL>
          <BALAMT>UNKNOWN</BALAMT>
          <DTASOF>20151230</DTASOF>
        </LEDGERBAL>
        <AVAILBAL>
          <BALAMT>UNKNOWN</BALAMT>
          <DTASOF>20151230</DTASOF>
        </AVAILBAL>
      </CCSTMTRS>
    </CCSTMTTRNRS>
  </CREDITCARDMSGSRSV1>
</OFX>
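
As a cross-check, a trimmed copy of this document can be fed to Python's standard-library XML parser. This is only a sketch to show that the `<?xml ...?>` declaration and the `<?OFX ...?>` processing instruction are legal XML on their own; it does not go through ofxparse or Beautiful Soup, and the trimmed document is an abbreviation made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above. Passed as bytes so the parser honours
# the encoding named in the XML declaration.
doc = b"""<?xml version="1.0" encoding="US-ASCII"?>
<?OFX OFXHEADER="200" VERSION="200" SECURITY="NONE" OLDFILEUID="NONE" NEWFILEUID="NONE"?>
<!-- Converted from: QIF -->
<OFX>
  <CREDITCARDMSGSRSV1>
    <CCSTMTTRNRS>
      <CCSTMTRS>
        <BANKTRANLIST>
          <STMTTRN>
            <TRNTYPE>DEBIT</TRNTYPE>
            <TRNAMT>-3.45</TRNAMT>
            <NAME>TESCO-STORES 2610</NAME>
          </STMTTRN>
        </BANKTRANLIST>
      </CCSTMTRS>
    </CCSTMTTRNRS>
  </CREDITCARDMSGSRSV1>
</OFX>
"""

root = ET.fromstring(doc)
# Collect the transaction amounts to confirm the tree was built.
amounts = [el.text for el in root.iter("TRNAMT")]
print(root.tag, amounts)
```

If this parses cleanly, the document itself is well-formed and the failure is more likely to sit in the preprocessing step or the Beautiful Soup 3 path.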

jaraco commented 8 years ago

I'm pretty sure this is a duplicate of #92.