manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Submissions from CompSci #30

Closed Pablompg closed 3 years ago

Pablompg commented 3 years ago

When trying to parse this submission: https://www.sec.gov/Archives/edgar/data/747540/000121390021011934/sprs-20201130.xml the library failed with the error:

Traceback (most recent call last):
  File "/home/pablo/.pyenv/versions/3.7.10/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/pablo/.pyenv/versions/3.7.10/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/pablo/Desktop/repos/data/etls/providers/raw_sec1_extract_fundamentals/src/raw_sec1_extract_fundamentals/main.py", line 23, in <module>
    download_files(config, os.getenv("DOWNLOAD_FILES", "download.files"))
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/data_components/config/logging.py", line 229, in wrapped_method
    return method(*args, **kwargs)
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/data_components/config/logging.py", line 278, in wrapped_method
    ans = method(*args, **kwargs)
  File "/home/pablo/Desktop/repos/data/etls/providers/raw_sec1_extract_fundamentals/src/raw_sec1_extract_fundamentals/main.py", line 15, in download_files
    SecDownload(config, config_key).run()
  File "/home/pablo/Desktop/repos/data/etls/providers/raw_sec1_extract_fundamentals/src/raw_sec1_extract_fundamentals/sec1_task.py", line 64, in run
    xbrlParser.parse_instance(url)
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/xbrl/instance.py", line 604, in parse_instance
    return parse_xbrl_url(url, self.cache)
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/xbrl/instance.py", line 256, in parse_xbrl_url
    return parse_xbrl(instance_path, cache, instance_url)
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/xbrl/instance.py", line 288, in parse_xbrl
    context_dir = _parse_context_elements(root.findall('xbrli:context', NAME_SPACES), root.attrib['ns_map'], taxonomy, cache)
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/xbrl/instance.py", line 540, in _parse_context_elements
    member_concept: Concept = member_tax.concepts[member_tax.name_id_map[member_concept_name]]
AttributeError: 'NoneType' object has no attribute 'concepts'

I will have a deeper look at the error but it seems that this submission does not contain an id for each fact. Will have to analyse it and see if it can be parsed or not.

Pablompg commented 3 years ago

It is also failing for these submissions:

These submissions were created with Toppan Merrill Bridge iXBRL 9.6.7713.40453, not Workiva. It looks like the library fails to parse some submissions from this XBRL creator.

manusimidt commented 3 years ago

Thank you! I will check this evening whether I can find the issue.

manusimidt commented 3 years ago

The issue is that the parser fails while parsing the context because the taxonomy for one of the dimension members could not be found. However, it should throw a "TaxonomyNotFound" exception, not an AttributeError. Furthermore, I will check why the taxonomy could not be imported in this case.
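To illustrate the intended behaviour, here is a minimal sketch of a member lookup that raises a dedicated exception instead of failing on `None`. Only the attribute names `concepts` and `name_id_map` come from the traceback and the exception name from the comment above; the `Taxonomy` stand-in class, the `resolve_member` helper, and the dictionary of taxonomies are hypothetical and are not py-xbrl's actual code.

```python
from typing import Dict, Optional


class TaxonomyNotFound(Exception):
    """Raised when no taxonomy is available for a namespace."""


class Taxonomy:
    """Hypothetical stand-in with only the two attributes seen in the traceback."""

    def __init__(self, concepts: Dict[str, str], name_id_map: Dict[str, str]) -> None:
        self.concepts = concepts        # concept id -> concept (a plain string here)
        self.name_id_map = name_id_map  # concept name -> concept id


def resolve_member(taxonomies: Dict[str, Taxonomy], namespace: str, concept_name: str) -> str:
    """Look up a member concept, failing loudly if the taxonomy is missing."""
    member_tax: Optional[Taxonomy] = taxonomies.get(namespace)
    if member_tax is None:
        # Without this check the next line fails with
        # AttributeError: 'NoneType' object has no attribute 'concepts'
        raise TaxonomyNotFound(f"No taxonomy found for namespace {namespace!r}")
    return member_tax.concepts[member_tax.name_id_map[concept_name]]
```

With a check like this, a caller can catch `TaxonomyNotFound` and report which namespace is missing rather than debugging an opaque `AttributeError`.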

The context:

<context id="c77_From1Dec2022To31Dec2022_HK_SubsequentEventMember">
  <entity>
    <identifier scheme="http://www.sec.gov/CIK">0000747540</identifier>
    <segment>
      <xbrldi:explicitMember dimension="us-gaap:DeferredRevenueArrangementTypeAxis">pf0:HK</xbrldi:explicitMember>
      <xbrldi:explicitMember dimension="us-gaap:SubsequentEventTypeAxis">us-gaap:SubsequentEventMember</xbrldi:explicitMember>
    </segment>
  </entity>
  <period>
    <startDate>2022-12-01</startDate>
    <endDate>2022-12-31</endDate>
  </period>
</context>

The namespace:

xmlns:pf0="http://xbrl.sec.gov/country/2020-01-31"
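As a rough sketch of how the member value `pf0:HK` maps to that namespace, the prefix can be split off and looked up in the instance's prefix-to-namespace map. The helper name `resolve_qname` and the hand-written `ns_map` dictionary are assumptions for illustration, not part of the library.

```python
from typing import Dict, Tuple


def resolve_qname(qname: str, ns_map: Dict[str, str]) -> Tuple[str, str]:
    """Split a prefixed name such as 'pf0:HK' into (namespace URI, local name)."""
    prefix, sep, local_name = qname.strip().partition(":")
    if not sep or not local_name:
        raise ValueError(f"Expected a prefixed name like 'prefix:name', got {qname!r}")
    if prefix not in ns_map:
        raise KeyError(f"Prefix {prefix!r} is not declared in the namespace map")
    return ns_map[prefix], local_name


# Prefix map containing only the declaration quoted above.
ns_map = {"pf0": "http://xbrl.sec.gov/country/2020-01-31"}
namespace, local_name = resolve_qname("pf0:HK", ns_map)
```

The resulting namespace URI, not the prefix, is what identifies the taxonomy, since a prefix like `pf0` is chosen freely by the filing software.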

In this particular case the instance does not declare a schema for the namespace, but it belongs to a standard taxonomy and should therefore be added to https://github.com/manusimidt/xbrl_parser/blob/34952a6a1185a81d767c591d3a42f53423b558d5/xbrl/taxonomy.py#L160-L188
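One way to picture such a fallback is a table that maps well-known namespaces to schema URLs and is consulted only when the instance itself declares none. This is a sketch under assumptions: the names `STANDARD_SCHEMAS` and `schema_url_for` are hypothetical, and the schema URL shown is my assumption about where the SEC publishes the 2020 country taxonomy, so it should be verified before use.

```python
from typing import Dict, Optional

# Hypothetical fallback table: namespace URI -> schema URL.
# The URL below is an assumption and should be checked against the SEC site.
STANDARD_SCHEMAS: Dict[str, str] = {
    "http://xbrl.sec.gov/country/2020-01-31":
        "https://xbrl.sec.gov/country/2020/country-2020-01-31.xsd",
}


def schema_url_for(namespace: str, declared: Dict[str, str]) -> Optional[str]:
    """Prefer a schema declared by the filing, then fall back to the standard table."""
    if namespace in declared:
        return declared[namespace]
    return STANDARD_SCHEMAS.get(namespace)  # None if the namespace is unknown
```

Returning `None` for an unknown namespace leaves the decision to the caller, which could then raise a descriptive error naming the namespace.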

manusimidt commented 3 years ago

@Pablompg Thank you for the issue! The error occurred due to a small copy-and-paste error in the code.

Pablompg commented 3 years ago

Thank you for solving the issue, @manusimidt. Would you consider publishing a new release that includes this fix?

manusimidt commented 3 years ago

Yes, of course. Sorry, I forgot.