manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Double iXBRL filings #64

Open mrx23dot opened 2 years ago

mrx23dot commented 2 years ago

The filing has two iXBRL entries, but only the secondary iXBRL document carries data. Index: https://www.sec.gov/Archives/edgar/data/0000944745/000156459021013168/0001564590-21-013168-index.htm

If I try to parse the main one: https://www.sec.gov/Archives/edgar/data/944745/000156459021013168/civb-20201231.htm

It says:

    inst = XbrlParser(cache).parse_instance(url)
  File "C:\python36\lib\site-packages\xbrl\instance.py", line 653, in parse_instance
    return parse_ixbrl_url(url, self.cache)
  File "C:\python36\lib\site-packages\xbrl\instance.py", line 363, in parse_ixbrl_url
    return parse_ixbrl(instance_path, cache, instance_url)
  File "C:\python36\lib\site-packages\xbrl\instance.py", line 404, in parse_ixbrl
    if xbrl_resources is None: raise InstanceParseException('Could not find xbrl resources in file')
xbrl.InstanceParseException: Could not find xbrl resources in file

As pointed out, the SEC's extracted XML already merges the two files, but unfortunately it is not included in the SEC zip file.

Shouldn't the library find the secondary iXBRL document when I provide the main one? This is the only reference to the secondary file in the main one:

<p style="margin-bottom:8pt;margin-top:0pt;margin-left:0pt;;text-indent:0pt;;font-size:9.5pt;font-family:Times New Roman;font-weight:normal;font-style:normal;text-transform:none;font-variant: normal;"><a href="civb-ex131_7.htm">
<span style="text-decoration:none;">Statement regarding earnings per share</span>
</a>

Another example: https://www.sec.gov/Archives/edgar/data/0000021076/000002107621000016/0000021076-21-000016-index.htm
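
To illustrate what "finding the secondary iXBRL" from the main document could look like, here is a rough sketch that collects the `.htm` links from the main document's HTML and resolves them against its URL. This is not part of py-xbrl: `HtmLinkCollector` and `candidate_exhibit_urls` are hypothetical names, and the assumption that every linked `.htm` file is a candidate is a guess. A real implementation would still need to check which candidates actually contain XBRL resources.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HtmLinkCollector(HTMLParser):
    """Collect href values of <a> tags that point to .htm/.html files."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any #fragment when checking the file extension.
        if href and href.lower().split("#")[0].endswith((".htm", ".html")):
            self.hrefs.append(href)


def candidate_exhibit_urls(main_url, main_html):
    """Return absolute URLs of .htm documents linked from the main document.

    Hypothetical helper: the result is only a list of candidates, in
    document order, without duplicates and without the main URL itself.
    """
    collector = HtmLinkCollector()
    collector.feed(main_html)
    candidates = []
    for href in collector.hrefs:
        absolute = urljoin(main_url, href.split("#")[0])
        if absolute != main_url and absolute not in candidates:
            candidates.append(absolute)
    return candidates
```

Each candidate could then be passed to `parse_instance` in turn until one parses without raising `InstanceParseException`.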

jonkatz6 commented 2 years ago

The primary XML file is listed in the Data Files table with type XML and description EXTRACTED XBRL INSTANCE DOCUMENT.

The URL for the primary XML file is: https://www.sec.gov/Archives/edgar/data/944745/000156459021013168/civb-20201231_htm.xml

I've found the XML files to be the most reliable to parse, so I typically use them rather than the inline XBRL HTML files.
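
In this filing the extracted XML URL is the main document's URL with `.htm` replaced by `_htm.xml`. A minimal sketch of that rewrite follows. The helper name `extracted_instance_url` is hypothetical, and the naming pattern is inferred from this one example, so it may not hold for every filing.

```python
def extracted_instance_url(ixbrl_url):
    """Guess the extracted-instance XML URL for an inline XBRL .htm URL.

    Assumption: the XML sits next to the .htm file and is named
    '<stem>_htm.xml', as in the civb-20201231 example.
    """
    suffix = ".htm"
    if not ixbrl_url.lower().endswith(suffix):
        raise ValueError("expected a URL ending in .htm")
    return ixbrl_url[: -len(suffix)] + "_htm.xml"


url = "https://www.sec.gov/Archives/edgar/data/944745/000156459021013168/civb-20201231.htm"
print(extracted_instance_url(url))
```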

mrx23dot commented 2 years ago

Unfortunately, the SEC doesn't provide the parsed XML in the zip download if the filing contains .htm iXBRL documents. See https://www.sec.gov/Archives/edgar/data/0000944745/000156459021013168/0001564590-21-013168-xbrl.zip
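
For anyone who wants to check this on other filings, a small sketch that lists the zip members whose names end in `_htm.xml` (the suffix of the extracted instance linked above). `extracted_instances_in_zip` is a hypothetical helper. It takes the already-downloaded zip as bytes, and the suffix check is an assumption based on this one filing.

```python
import io
import zipfile


def extracted_instances_in_zip(zip_bytes):
    """Return names of zip members that end in '_htm.xml'.

    An empty list means the archive has no extracted-instance XML
    under that naming assumption.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith("_htm.xml")
        ]
```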