manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

prefix 'ix' not found in prefix map #11

Closed Samar2170 closed 2 years ago

Samar2170 commented 3 years ago

This is the full console log

`Traceback (most recent call last):
  File "/home/samar/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3418, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 5, in
    XbrlInstance = parse_ixbrl(ixbrl_path,cache)
  File "/home/samar/anaconda3/lib/python3.8/site-packages/xbrl_parser/instance.py", line 389, in parse_ixbrl
    xbrl_resources: ET.Element = root.find('.//ix:resources', ns_map)
  File "/home/samar/anaconda3/lib/python3.8/xml/etree/ElementTree.py", line 649, in find
    return self._root.find(path, namespaces)
  File "/home/samar/anaconda3/lib/python3.8/xml/etree/ElementPath.py", line 389, in find
    return next(iterfind(elem, path, namespaces), None)
  File "/home/samar/anaconda3/lib/python3.8/xml/etree/ElementPath.py", line 368, in iterfind
    selector.append(ops[token[0]](next, token))
  File "/home/samar/anaconda3/lib/python3.8/xml/etree/ElementPath.py", line 184, in prepare_descendant
    token = next()
  File "/home/samar/anaconda3/lib/python3.8/xml/etree/ElementPath.py", line 86, in xpath_tokenizer
    raise SyntaxError("prefix %r not found in prefix map" % prefix) from None
  File "", line unknown
SyntaxError: prefix 'ix' not found in prefix map`

The taxonomy was imported successfully.

manusimidt commented 3 years ago

Seems like the instance document you are trying to parse is using the ix prefix without defining the namespace. Normally the submission should declare all prefixes it is using with xmlns.

xmlns:ix="http://www.xbrl.org/2013/inlineXBRL"

But since ix is the conventional prefix for the namespace of the Inline XBRL Specification 1.1, it can be argued that the parser should assume that ix always maps to the namespace http://www.xbrl.org/2013/inlineXBRL, even if the creator of the instance document does not declare it.
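To illustrate the idea, here is a minimal sketch using only the standard library. It is not the code py-xbrl uses; the tiny document and the `ns_map` contents are made up for the example. It reproduces the error from the traceback and then shows how a fallback entry for `ix` would avoid it.

```python
import xml.etree.ElementTree as ET

IX_NS = "http://www.xbrl.org/2013/inlineXBRL"

# Made-up document: the Inline XBRL namespace is bound as a default
# namespace on <resources>, so the "ix" prefix is never declared.
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    f'<body><resources xmlns="{IX_NS}"/></body></html>'
)

root = ET.fromstring(DOC)

# Prefix map as it might be collected from the document: no "ix" entry.
ns_map = {"xhtml": "http://www.w3.org/1999/xhtml"}

error_message = None
try:
    root.find(".//ix:resources", ns_map)
except SyntaxError as err:
    # ElementPath raises SyntaxError for a prefix missing from the map.
    error_message = str(err)

# Fallback: only add the mapping if the document did not declare "ix".
ns_map.setdefault("ix", IX_NS)
resources = root.find(".//ix:resources", ns_map)
```

Using `setdefault` keeps any `ix` declaration the document does provide and only fills the gap when it is missing.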

Which datasource are you using for the instance documents? I currently work primarily with SEC Edgar where I have not yet encountered this problem.

tedjansen commented 3 years ago

Hello,

I'm experiencing the same issue. The XBRL is from the DTA (Dutch Tax Authority). The XSD schema files are retrieved correctly. Attached is the XBRL file, with all data removed. The cause seems to be the namespace that the DTA uses, bd-i. Patching the module file 'instance.py' with the bd-i and iso2417 namespaces unfortunately doesn't help.

btw-gh.txt

manusimidt commented 3 years ago

@tedjansen The file you provided defines both namespaces, bd-i and iso2417, correctly. However, the parser cannot find the URL of the XML schema that defines the concepts (i.e. where the concept bd-i:ValueAddedTaxPrivateUse is defined). I could only find it via Google myself; the correct schema URL is: http://www.nltaxonomie.nl/nt13/bd/20181212/dictionary/bd-data.xsd

XBRL submissions from the SEC always contain a mapping between namespace and schema URL in the taxonomy schema.

Honestly I am not quite sure how to get the schema url for the namespace xmlns:bd-i="http://www.nltaxonomie.nl/nt13/bd/20181212/dictionary/bd-data" in your case because I can't find the mapping from namespace to schema url anywhere in the taxonomy that your file uses 🤔.

manusimidt commented 3 years ago

In this particular case the parser would just have to add a .xsd to the namespace to get to the schema url. However, this is not the normal way to get from a namespace to an xml schema. Or am I missing something in the XML specification?

tedjansen commented 3 years ago

Unfortunately, that seems to be the way. I've parsed 4 different types of XBRL files locally (DCIT (Dutch Corporate Income Tax/VPB), DIT (Dutch Income Tax/IB), VAT (OB) and filed accounts (KVK deponering, Dutch GAAP)), but they all give an error when downloading any dictionary from nltaxonomie.nl.

Error from NL Taxonomy download

I would love to send in a PR, but I can't trace the error to the place where it should be fixed. Besides, it seems a bit odd for the software to contain a special case that appends .xsd for the NL Taxonomy files. Below is an example of a filed account statement with the same reference, but without the bd-i namespace.

image

manusimidt commented 3 years ago

I found a mapping between namespace and schema URL in the release notes of the SBR taxonomy (page 3).

I am not quite sure if it is the best idea to add a fallback where the parser simply tries to add a .xsd to the namespace, make an http request and check if it was successful.
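For reference, such a fallback could look roughly like the sketch below. The function names `url_exists` and `guess_schema_url` are hypothetical and not part of py-xbrl; the probe is injectable so the guess can be tried without touching the network.

```python
from typing import Callable, Optional
from urllib.request import Request, urlopen


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical probe: True if a HEAD request to `url` succeeds."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def guess_schema_url(
    namespace: str,
    probe: Callable[[str], bool] = url_exists,
) -> Optional[str]:
    """Guess the schema URL by appending '.xsd' to the namespace.

    Returns the candidate URL if the probe accepts it, otherwise None.
    This is a heuristic, not something the XML specifications guarantee.
    """
    candidate = namespace if namespace.endswith(".xsd") else namespace + ".xsd"
    return candidate if probe(candidate) else None
```

The drawback remains the one described above: every unknown namespace costs an extra HTTP request, and a successful response does not prove that the file is the right schema.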

In my opinion it would be nicer to add the ability to manually pass a dictionary to the parser, where you can define a mapping between namespace and schema url by yourself. This information can normally be found in the taxonomy architecture description.
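A rough sketch of what that lookup could look like, assuming a hypothetical helper `resolve_schema_url` and two plain dictionaries (neither name is taken from py-xbrl):

```python
from typing import Mapping, Optional


def resolve_schema_url(
    namespace: str,
    discovered: Mapping[str, str],
    overrides: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the schema URL for `namespace`.

    `discovered` stands for mappings found in the taxonomy itself;
    `overrides` stands for mappings passed in manually by the user.
    """
    if overrides and namespace in overrides:
        # Design choice in this sketch: a user-supplied entry wins.
        return overrides[namespace]
    if namespace in discovered:
        return discovered[namespace]
    raise KeyError(f"no schema URL known for namespace {namespace!r}")


# Example with the DTA namespace discussed in this thread.
user_map = {
    "http://www.nltaxonomie.nl/nt13/bd/20181212/dictionary/bd-data":
        "http://www.nltaxonomie.nl/nt13/bd/20181212/dictionary/bd-data.xsd",
}
schema_url = resolve_schema_url(
    "http://www.nltaxonomie.nl/nt13/bd/20181212/dictionary/bd-data",
    discovered={},
    overrides=user_map,
)
```

Raising an error for an unknown namespace, instead of guessing, keeps the failure visible and tells the user exactly which mapping to add.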

I will think about how this could be implemented in the next few days.

manusimidt commented 3 years ago

@tedjansen I was able to parse the file you provided above by manually adding the missing mapping to the ns_schema_map attribute of the taxonomy module. Please install the latest version of py-xbrl (py-xbrl 2.0.3) and check if it works for you.

import logging
from xbrl.cache import HttpCache
from xbrl.instance import XbrlInstance, XbrlParser
from xbrl.taxonomy import ns_schema_map

cache: HttpCache = HttpCache('./cache')
xbrlParser = XbrlParser(cache)

# add the missing namespace/schema_url mapping
ns_schema_map["http://www.nltaxonomie.nl/nt13/bd/20181212/dictionary/bd-data"] = \
    "http://www.nltaxonomie.nl/nt13/bd/20181212/dictionary/bd-data.xsd"

# Todo: replace the file path
xbrl_url = './data/DTA/btw-gh.xml'
inst: XbrlInstance = xbrlParser.parse_instance_locally(xbrl_url)

I know that this is not the prettiest solution but it should work for now.

tedjansen commented 3 years ago

Thx, that works if the file name ends in .xml. In the Netherlands, XBRL files end in .xbrl by default, so the class automatically switches to the iXBRL processing variant and I get the same error. Renaming the file to .xml makes it work.
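For anyone who wants to keep the original .xbrl file untouched, the renaming workaround can be scripted. This is only a sketch; `copy_with_xml_suffix` is a hypothetical helper and not part of py-xbrl. The copy is placed next to the original so that any relative references inside the instance document still resolve.

```python
import shutil
from pathlib import Path


def copy_with_xml_suffix(source: str) -> Path:
    """Copy `source` to a sibling file with an '.xml' extension.

    Returns the path of the copy, or the original path if it already
    ends in '.xml'. An existing file with the target name is overwritten.
    """
    src = Path(source)
    if src.suffix.lower() == ".xml":
        return src
    dest = src.with_suffix(".xml")
    shutil.copyfile(src, dest)
    return dest
```

The returned path can then be passed, as a string, to `xbrlParser.parse_instance_locally` from the snippet above.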