manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

The taxonomy with namespace http://fasb.org/us-gaap/2020-01-31 could not be found #28

Pablompg closed this issue 3 years ago

Pablompg commented 3 years ago

Bug description

The taxonomy with namespace http://fasb.org/us-gaap/2020-01-31 could not be found. Please check if it is imported in the schema file.

Steps to reproduce the behavior

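from xbrl.cache import HttpCache
from xbrl.instance import XbrlParser

# Downloaded files are cached locally in ./cache
cache = HttpCache('./cache')
# The SEC asks automated clients to identify themselves in the request headers
cache.set_headers({'From': 'your.name@company.com', 'User-Agent': 'Tool/Version (Website)'})
xbrlParser = XbrlParser(cache)

# Instance document that triggers the TaxonomyNotFound error
url = 'https://www.sec.gov/Archives/edgar/data/1822027/000121390021030040/tekk-20201231.xml'
xbrlParser.parse_instance(url)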

Error Trace

Traceback (most recent call last):
  File "/home/pablo/.pyenv/versions/3.7.10/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/pablo/.pyenv/versions/3.7.10/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/pablo/Desktop/repos/data/etls/providers/raw_sec1_extract_fundamentals/src/raw_sec1_extract_fundamentals/__main__.py", line 23, in <module>
    download_files(config, os.getenv("DOWNLOAD_FILES", "download.files"))
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/data_components/config/logging.py", line 229, in wrapped_method
    return method(*args, **kwargs)
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/data_components/config/logging.py", line 278, in wrapped_method
    ans = method(*args, **kwargs)
  File "/home/pablo/Desktop/repos/data/etls/providers/raw_sec1_extract_fundamentals/src/raw_sec1_extract_fundamentals/__main__.py", line 15, in download_files
    SecDownload(config, config_key).run()
  File "/home/pablo/Desktop/repos/data/etls/providers/raw_sec1_extract_fundamentals/src/raw_sec1_extract_fundamentals/sec1_task.py", line 50, in run
    xbrlParser.parse_instance(url)
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/xbrl/instance.py", line 589, in parse_instance
    return parse_xbrl_url(url, self.cache)
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/xbrl/instance.py", line 256, in parse_xbrl_url
    return parse_xbrl(instance_path, cache, instance_url)
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/xbrl/instance.py", line 288, in parse_xbrl
    context_dir = _parse_context_elements(root.findall('xbrli:context', NAME_SPACES), root.attrib['ns_map'], taxonomy)
  File "/home/pablo/.local/share/virtualenvs/raw_sec1_extract_fundamentals-70AQCo8K/lib/python3.7/site-packages/xbrl/instance.py", line 532, in _parse_context_elements
    if dimension_tax is None: raise TaxonomyNotFound(ns_map[dimension_prefix])
xbrl.TaxonomyNotFound: The taxonomy with namespace http://fasb.org/us-gaap/2020-01-31 could not be found. Please check if it is imported in the schema file

Pablompg commented 3 years ago

I got the same error for the instance: https://www.sec.gov/Archives/edgar/data/1040130/000143774921013213/pets20210331_10k.htm

Error Message

The taxonomy with namespace http://xbrl.sec.gov/dei/2019-01-31 could not be found. Please check if it is imported in the schema file

Pablompg commented 3 years ago

Any idea on how to solve this issue in a general way? The problem is that the .xsd document does not import all taxonomies used. I will have a deeper look into the code and see if I can think of an elegant solution to this problem.

manusimidt commented 3 years ago

To deal with this issue, I created a function that maps common namespaces (such as those of the dei or us-gaap taxonomies) to the corresponding schema URL. https://github.com/manusimidt/xbrl_parser/blob/75e1bf0c973d4c875b2598b4e2e462c8a99f1162/xbrl/taxonomy.py#L149-L191

However, this apparently does not work in your case. I will check why it's not working and implement a fix. I'm pretty busy until tonight, so I probably won't be able to give more detailed feedback until tomorrow.

I don't really know how to solve this in a general way. The taxonomy schema file can be stored on any server on the internet, so the schema cannot be located if you only have the namespace.

For all popular taxonomies, a mapping from namespace to schema URL is probably the only solution 🤔.
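
A minimal sketch of such a namespace-to-schema-URL lookup, for readers who want to see the idea in isolation. This is not the code from `taxonomy.py`: the names `NS_SCHEMA_MAP` and `schema_url_for` are made up for illustration, and the two schema URLs are my assumption of where these taxonomies are published, so check them before relying on them.

```python
from typing import Optional

# Hypothetical, illustrative subset: namespace -> schema URL.
# The URLs are assumptions and should be verified against the publishers' sites.
NS_SCHEMA_MAP = {
    "http://fasb.org/us-gaap/2020-01-31":
        "https://xbrl.fasb.org/us-gaap/2020/elts/us-gaap-2020-01-31.xsd",
    "http://xbrl.sec.gov/dei/2019-01-31":
        "https://xbrl.sec.gov/dei/2019/dei-2019-01-31.xsd",
}


def schema_url_for(namespace: str) -> Optional[str]:
    """Return the schema URL for a well-known namespace, or None if it is unknown."""
    return NS_SCHEMA_MAP.get(namespace)
```

A namespace that is not in the table yields `None`, which is exactly the case that cannot be solved in general.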

manusimidt commented 3 years ago

Seems to work now with both submissions you provided. The issue was that these submissions do not even mention the us-gaap and dei taxonomies in their extension taxonomy. Because both us-gaap and dei are standard taxonomies, we can easily parse and load them even after the extension taxonomy has already been parsed.
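
Roughly, the fallback described above could look like the sketch below. It is an assumption-laden illustration, not the actual change in the package: the `Taxonomy` dataclass, `get_or_load`, and the `resolve_url` and `load` callables are all hypothetical stand-ins, and this `TaxonomyNotFound` is a local stub rather than the library's exception.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class TaxonomyNotFound(Exception):
    """Local stand-in for the library's exception (hypothetical)."""


@dataclass
class Taxonomy:
    # Simplified, hypothetical model of a parsed taxonomy schema
    namespace: str
    schema_url: str
    imports: Dict[str, "Taxonomy"] = field(default_factory=dict)

    def find(self, namespace: str) -> Optional["Taxonomy"]:
        """Search this taxonomy and its imports (recursively) for a namespace."""
        if self.namespace == namespace:
            return self
        for imported in self.imports.values():
            found = imported.find(namespace)
            if found is not None:
                return found
        return None


def get_or_load(
    root: Taxonomy,
    namespace: str,
    resolve_url: Callable[[str], Optional[str]],
    load: Callable[[str], Taxonomy],
) -> Taxonomy:
    """Return the taxonomy for `namespace`, loading it late if the schema never imported it."""
    found = root.find(namespace)
    if found is not None:
        return found
    url = resolve_url(namespace)  # e.g. a lookup in a namespace -> schema URL table
    if url is None:
        # Not a well-known taxonomy, so there is no way to locate its schema
        raise TaxonomyNotFound(namespace)
    loaded = load(url)
    # Attach the late-loaded taxonomy so later lookups find it without reloading
    root.imports[loaded.namespace] = loaded
    return loaded
```

Passing the resolver and loader in as callables keeps the sketch independent of how schemas are actually downloaded and parsed.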

I will create a new minor version of the package that contains this change.

Pablompg commented 3 years ago

Awesome work! Thank you very much. I really appreciate the effort :+1: