manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

parsing uk submission "KeyError: 'bus'" #49

Closed shadow111 closed 2 years ago

shadow111 commented 2 years ago

Hello, I have run into this small problem while parsing UK submissions:

    inst = parse_ixbrl(file_path, cache)
  File "/Users/lafiraed/Documents/finance-pipelines/compagniesHouse/uk_company/xbrl/instance.py", line 407, in parse_ixbrl
    context_dir = _parse_context_elements(xbrl_resources.findall('xbrli:context', NAME_SPACES), ns_map, taxonomy, cache)
  File "/Users/lafiraed/Documents/finance-pipelines/compagniesHouse/uk_company/xbrl/instance.py", line 549, in _parse_context_elements
    dimension_tax = taxonomy.get_taxonomy(ns_map[dimension_prefix])
KeyError: 'bus'

Here is the submission file: https://drive.google.com/file/d/1Mncf4rW9Dl8nghIjbP28nkcZEiqxQzBV/view?usp=sharing

manusimidt commented 2 years ago

I get the following error when parsing the submission you provided.

  File "E:\Programming\python\xbrl_parser\xbrl\taxonomy.py", line 215, in parse_taxonomy_url
    return parse_taxonomy(schema_path, cache, schema_url)
  File "E:\Programming\python\xbrl_parser\xbrl\taxonomy.py", line 343, in parse_taxonomy
    concept: Concept = c_taxonomy.concepts[concept_id]
KeyError: 'curr_S%C3%A3oTom%C3%A9Pr%C3%ADncipeDobra'

It seems the parser is using the wrong file encoding somewhere. curr_S%C3%A3oTom%C3%A9Pr%C3%ADncipeDobra should be curr_SãoToméPríncipeDobra.

shadow111 commented 2 years ago

Do you have any idea how to fix it, please?

manusimidt commented 2 years ago

I looked at the label linkbase again and found that this has nothing to do with the file encoding after all. The xlink:href attribute of the locator is simply URL-encoded (percent-encoded), while the xlink:label attributes contain the plain characters.

<loc 
  xlink:href="currencies-2014-09-01.xsd#curr_S%C3%A3oTom%C3%A9Pr%C3%ADncipeDobra" 
  xlink:label="curr_SãoToméPríncipeDobra" 
  xlink:type="locator"/>

<labelArc 
  order="1.0" 
  xlink:arcrole="http://www.xbrl.org/2003/arcrole/concept-label" 
  xlink:from="curr_SãoToméPríncipeDobra" 
  xlink:to="curr_SãoToméPríncipeDobra_lbl" 
  xlink:type="arc"/>

<label 
  xlink:label="curr_SãoToméPríncipeDobra_lbl" 
  xlink:role="http://www.xbrl.org/2003/role/label" 
  xlink:type="resource" 
  xml:lang="en">São Tomé and Príncipe Dobra</label>

https://xbrl.frc.org.uk/cd/2014-09-01/currencies/currencies-2014-09-01-label.xml
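For illustration, here is a minimal sketch of how the concept id could be recovered from such an href by percent-decoding the fragment. The helper name `concept_id_from_href` is made up for this example and is not part of py-xbrl; only the standard library function `urllib.parse.unquote` is used.

```python
from urllib.parse import unquote


def concept_id_from_href(href: str) -> str:
    """Return the percent-decoded fragment of a locator href.

    Hypothetical helper for illustration, not py-xbrl's actual code.
    Returns an empty string if the href has no '#fragment' part.
    """
    # Everything after the first '#' is the id of the referenced concept.
    _, _, fragment = href.partition("#")
    # unquote() decodes %XX escapes, interpreting the bytes as UTF-8 by default.
    return unquote(fragment)


# The href taken from the label linkbase quoted above.
href = "currencies-2014-09-01.xsd#curr_S%C3%A3oTom%C3%A9Pr%C3%ADncipeDobra"
concept_id = concept_id_from_href(href)
print(concept_id)
```

With the fragment decoded, the id matches the spelling used in the xlink:label attributes, so a lookup keyed by the plain concept id should no longer fail on the escaped form.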

manusimidt commented 2 years ago

Now I am getting the same error you described earlier... 😄 Seems like this was a bug that was introduced in the latest versions of the package.

I will now check why the submission you provided fails with the error message KeyError: 'bus'.

manusimidt commented 2 years ago

Found the issue. There is always this one report that is structured slightly differently... The creator of this report declares the namespace "bus" on every XML element that uses it, instead of declaring it once at the top of the submission. The parser currently only uses the namespace-prefix map built from the xmlns attributes of the root element. This is why it can't find the corresponding namespace for the prefix 'bus'. I will have a look at the error in the next few days and hopefully have enough time to fix it before the weekend.
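To make the problem concrete, here is a rough sketch of one way to collect namespace declarations from every element rather than only the root, using the `start-ns` event of `xml.etree.ElementTree.iterparse`. This is not the fix that went into py-xbrl; the function name `collect_ns_map`, the sample document, and the namespace URI `http://example.com/bus` are all assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def collect_ns_map(source) -> dict:
    """Build a prefix -> namespace URI map from all xmlns declarations.

    Hypothetical helper for illustration. 'source' is a filename or a
    binary file-like object containing the XML document.
    """
    ns_map = {}
    # A 'start-ns' event is emitted for every namespace declaration in the
    # document, wherever it appears, as a (prefix, uri) tuple.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        # Keep the first binding seen for a prefix. A document may rebind
        # a prefix in a nested scope, which this simple map ignores.
        ns_map.setdefault(prefix, uri)
    return ns_map


# Made-up sample: 'bus' is declared on a nested element, not on the root.
SAMPLE = b"""<root xmlns:xbrli="http://www.xbrl.org/2003/instance">
  <xbrli:context id="c1">
    <member xmlns:bus="http://example.com/bus"
            dimension="bus:SomeDimension">bus:SomeMember</member>
  </xbrli:context>
</root>"""

ns_map = collect_ns_map(io.BytesIO(SAMPLE))
print(ns_map["bus"])
```

A map built this way contains the 'bus' prefix even though the root element never declares it, so a lookup like `ns_map[dimension_prefix]` would succeed for this sample.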
