manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Fix for Schema with recursive imports #133

Closed: Sam-el0 closed this 4 months ago

Sam-el0 commented 4 months ago

Hi @manusimidt,

This should be a low-overhead fix for schemas with recursive imports. It uses a set to track the already-imported URIs in the recursive parse_taxonomy functions.

Many thanks, Sam
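The general idea can be illustrated with a minimal sketch: a recursive walk over schema imports that records each URI in one shared set and returns early when it meets a URI it has already seen, so a cycle (a.xsd imports b.xsd, which imports a.xsd again) terminates. `parse_schema` and the `IMPORTS` dictionary are hypothetical stand-ins, not py-xbrl's actual `parse_taxonomy` functions.

```python
from typing import Dict, List, Optional, Set

# Hypothetical import graph: each schema URI maps to the URIs it imports.
# a.xsd and b.xsd import each other, which would recurse forever without a guard.
IMPORTS: Dict[str, List[str]] = {
    "a.xsd": ["b.xsd"],
    "b.xsd": ["a.xsd", "c.xsd"],
    "c.xsd": [],
}


def parse_schema(uri: str, imported_uris: Optional[Set[str]] = None) -> List[str]:
    """Return schema URIs in the order they are first visited."""
    if imported_uris is None:
        imported_uris = set()  # fresh set for each top-level call
    if uri in imported_uris:
        return []  # already handled: stop instead of recursing again
    imported_uris.add(uri)
    visited = [uri]
    for child in IMPORTS[uri]:
        # the same set is shared by every level of the recursion
        visited.extend(parse_schema(child, imported_uris))
    return visited


print(parse_schema("a.xsd"))  # ['a.xsd', 'b.xsd', 'c.xsd']
```

Checking membership before recursing costs one set lookup per import, which is where the "low overhead" comes from.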

manusimidt commented 4 months ago

Encountered some issues with the changes.

from xbrl.cache import HttpCache
from xbrl.instance import XbrlParser, XbrlInstance

cache: HttpCache = HttpCache('./cache')
xbrlParser = XbrlParser(cache)

url = "https://www.sec.gov/Archives/edgar/data/2488/000000248822000170/amd-20220924.htm"
inst: XbrlInstance = xbrlParser.parse_instance(url, cache)

results in the following error for me:

Traceback (most recent call last):
  File "C:\Users\manus\Code\python\py-xbrl\workdir\test_pull_109.py", line 8, in <module>
    inst: XbrlInstance = xbrlParser.parse_instance(url, cache)
  File "C:\Users\manus\Code\python\py-xbrl\xbrl\instance.py", line 740, in parse_instance
    return parse_ixbrl_url(uri, self.cache) if is_url(uri) else parse_ixbrl(uri, self.cache, instance_url, encoding)
  File "C:\Users\manus\Code\python\py-xbrl\xbrl\instance.py", line 426, in parse_ixbrl_url
    return parse_ixbrl(instance_path, cache, instance_url, encoding)
  File "C:\Users\manus\Code\python\py-xbrl\xbrl\instance.py", line 474, in parse_ixbrl
    taxonomy: TaxonomySchema = parse_taxonomy_url(schema_url, cache, imported_schema_uris)
TypeError: unhashable type: 'set'

Process finished with exit code 1

I will try to find a fix.
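One plausible reading of this traceback (an assumption; the thread does not confirm it) is that parse_taxonomy_url is memoized with functools.lru_cache. Such a wrapper hashes every argument to build its cache key before the function body runs, which would explain why the traceback ends at the call site in parse_ixbrl rather than inside parse_taxonomy_url. The hypothetical `cached_parse` below raises the same kind of TypeError when given a set.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_parse(schema_url: str, imported_uris) -> str:
    # Hypothetical memoized parser. lru_cache builds its cache key by hashing
    # every argument, so a set argument fails before this body ever runs.
    return schema_url


try:
    cached_parse("https://example.com/schema.xsd", {"https://example.com/base.xsd"})
except TypeError as exc:
    print(f"TypeError: {exc}")
```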

Sam-el0 commented 4 months ago

Ah, good catch; I just pushed a fix. I left in some differences between the iXBRL and XBRL functions. The key one is that you cannot pass the set to parse_taxonomy_url. It works with your example now.
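One way to respect the constraint that the set cannot be passed to parse_taxonomy_url is sketched below, under the assumption that the function is memoized with functools.lru_cache (not confirmed in this thread). The cached entry point keeps only hashable parameters, creates the visited set itself, and hands the set to an uncached recursive helper. All names are hypothetical, and this is not necessarily how the PR implements it.

```python
from functools import lru_cache
from typing import Dict, List, Set

# Hypothetical import graph standing in for downloaded schema files.
IMPORTS: Dict[str, List[str]] = {
    "root.xsd": ["dei.xsd", "us-gaap.xsd"],
    "us-gaap.xsd": ["dei.xsd", "root.xsd"],
    "dei.xsd": [],
}


def _parse_schema(uri: str, imported_uris: Set[str]) -> List[str]:
    """Uncached recursive worker: the mutable set is only ever passed here."""
    if uri in imported_uris:
        return []
    imported_uris.add(uri)
    parsed = [uri]
    for child in IMPORTS[uri]:
        parsed.extend(_parse_schema(child, imported_uris))
    return parsed


@lru_cache(maxsize=None)
def parse_schema_url(uri: str) -> tuple:
    """Cached entry point: its only argument is a hashable string.
    A fresh set is created per call and never crosses the cache boundary."""
    return tuple(_parse_schema(uri, set()))


print(parse_schema_url("root.xsd"))  # ('root.xsd', 'dei.xsd', 'us-gaap.xsd')
```

Returning a tuple rather than a list also keeps the cached value immutable, so a caller cannot modify what lru_cache hands back on later hits.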

manusimidt commented 4 months ago

Looks good now. All my tests now run properly. Thanks a lot for the PR!

manusimidt commented 4 months ago

I will release a new version of py-xbrl (containing #133) next week.

pnatusch commented 4 months ago

Manuel, I'd like to thank you for your excellent library. I'm still a newbie. I'm attempting to produce 10 years of data for IBM from its XBRL filings. The issue I'm running into is that the concept ID for the label "Net change in cash and cash equivalents" changed between the 2014-2-24 filing and the 2015-2-24 filing. The label comes from ibm-20131231_lab.xml and ibm-20141231_lab.xml.

QUESTION: Are changes like this documented in XBRL? Where would I find that information?

    concept_id for filing_date 2014-2-24: us-gaap_NetCashProvidedByUsedInContinuingOperations
    concept_id for filing_date 2015-2-24: us-gaap_CashAndCashEquivalentsPeriodIncreaseDecrease

The values for this table were extracted from the instance files ibm-20131231.xml (filed 2/24/2014) and ibm-20141231.xml (filed 2/24/2015). Details below.

                     value for       value for
                     filing date     filing date
    report_date      2014-2-24       2015-2-24       comment
    2014-12-31       N/A             -2240000000     as expected, 2014 is not in the 2014-2-24 filing
    2013-12-31       304000000       304000000       SAME
    2012-12-31       -1511000000     -1511000000     SAME
    2011-12-31       1262000000      N/A             as expected, 2011 is not in the 2015-2-24 filing
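Since both filings report the same line item under different concept names, one workaround for building a multi-year series is to keep your own list of equivalent concept names and merge the facts by period end date, letting the newer filing win where they overlap. The sketch hard-codes the values from the table above; the `ALIASES` list, the nested-dict layout, and `merge_series` are hypothetical, and treating the two concepts as interchangeable is an assumption based only on their shared label and matching values.

```python
from typing import Dict, List

# Hypothetical, user-maintained list of concept names treated as one line item.
ALIASES: List[str] = [
    "CashAndCashEquivalentsPeriodIncreaseDecrease",
    "NetCashProvidedByUsedInContinuingOperations",
]

# {filing_date: {concept_name: {period_end_date: value}}}, values copied from the table.
FILINGS: Dict[str, Dict[str, Dict[str, float]]] = {
    "2014-02-24": {
        "NetCashProvidedByUsedInContinuingOperations": {
            "2013-12-31": 304000000.0,
            "2012-12-31": -1511000000.0,
            "2011-12-31": 1262000000.0,
        },
    },
    "2015-02-24": {
        "CashAndCashEquivalentsPeriodIncreaseDecrease": {
            "2014-12-31": -2240000000.0,
            "2013-12-31": 304000000.0,
            "2012-12-31": -1511000000.0,
        },
    },
}


def merge_series(filings: Dict[str, Dict[str, Dict[str, float]]],
                 aliases: List[str]) -> Dict[str, float]:
    """Combine one line item across filings; later filings overwrite earlier ones."""
    merged: Dict[str, float] = {}
    for filing_date in sorted(filings):  # ISO dates sort chronologically
        concepts = filings[filing_date]
        for name in aliases:
            if name in concepts:
                merged.update(concepts[name])
                break  # use the first alias this filing actually reports
    return dict(sorted(merged.items()))


for end_date, value in merge_series(FILINGS, ALIASES).items():
    print(end_date, value)
```

Here the overlapping 2012 and 2013 values agree, so the precedence rule does not change the result; it only matters when a later filing restates a figure.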

Details

########################

ibm-20131231_lab.xml

########################

    label_linkbase = parse_linkbase_url('https://www.sec.gov/Archives/edgar/data/51143/000104746914001302/ibm-20131231_lab.xml', LinkbaseType.LABEL, cache)

    pprint.pp(label_linkbase.to_dict()['standardExtendedLinkElements'][0]['root_locators'][1225])
    {'name': 'element1226',
     'href': 'http://xbrl.fasb.org/us-gaap/2013/elts/us-gaap-2013-01-31.xsd#us-gaap_NetCashProvidedByUsedInContinuingOperations',
     'concept_id': 'us-gaap_NetCashProvidedByUsedInContinuingOperations',
     'children': [{'http://www.xbrl.org/2003/role/totalLabel': 'Net change in cash ' 'and cash ' 'equivalents'}]}
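Rather than hard-coding positions such as root_locators[1225] (element1226 here, element214 in the 2014 file), you could search for every locator carrying a given label text. The sketch assumes the to_dict() layout printed above (extended links, each with root_locators whose children map a label role URI to the label text); `find_concepts_by_label` is a hypothetical helper, and `sample` simply mirrors the printed output.

```python
from typing import Any, Dict, List


def find_concepts_by_label(linkbase_dict: Dict[str, Any], label: str) -> List[str]:
    """Return the concept_id of every locator with a label (any role) equal to `label`."""
    matches: List[str] = []
    for extended_link in linkbase_dict.get("standardExtendedLinkElements", []):
        for locator in extended_link.get("root_locators", []):
            for child in locator.get("children", []):
                # each child maps a label role URI to the label text
                if any(text.strip() == label for text in child.values()):
                    matches.append(locator["concept_id"])
                    break  # one match per locator is enough
    return matches


# Mirrors the shape of the pprint output above (values copied from it).
sample = {
    "standardExtendedLinkElements": [
        {
            "root_locators": [
                {
                    "name": "element1226",
                    "concept_id": "us-gaap_NetCashProvidedByUsedInContinuingOperations",
                    "children": [
                        {"http://www.xbrl.org/2003/role/totalLabel":
                         "Net change in cash and cash equivalents"}
                    ],
                },
            ]
        }
    ]
}

print(find_concepts_by_label(sample, "Net change in cash and cash equivalents"))
```

With a parsed linkbase, the call would presumably be `find_concepts_by_label(label_linkbase.to_dict(), "Net change in cash and cash equivalents")`, run once per filing to see which concept carries the label that year.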

########################

ibm-20131231.xml

########################

    print(f"inst: XbrlInstance =" f"parse_xbrl_url(\"{xbrl_data._xbrl_urls['inst_url']}\", cache)")
    for fact in inst.facts:
        if fact.concept.name in ["DocumentPeriodEndDate", "NetCashProvidedByUsedInContinuingOperations",
                                 "CashAndCashEquivalentsPeriodIncreaseDecrease"]:
            print(f"\tfact.concept.name:{fact.concept.name}")
            print(f"\tfact.context.start_date:{fact.context.start_date}")
            print(f"\tfact.context.end_date:{fact.context.end_date}")
            print(f"\tfact.value:{fact.value}")

    inst: XbrlInstance =parse_xbrl_url("https://www.sec.gov/Archives/edgar/data/51143/000104746914001302/ibm-20131231.xml", cache)
        fact.concept.name:DocumentPeriodEndDate
        fact.context.start_date:2013-01-01
        fact.context.end_date:2013-12-31
        fact.value:2013-12-31
        fact.concept.name:NetCashProvidedByUsedInContinuingOperations
        fact.context.start_date:2013-01-01
        fact.context.end_date:2013-12-31
        fact.value:304000000.0
        fact.concept.name:NetCashProvidedByUsedInContinuingOperations
        fact.context.start_date:2012-01-01
        fact.context.end_date:2012-12-31
        fact.value:-1511000000.0
        fact.concept.name:NetCashProvidedByUsedInContinuingOperations
        fact.context.start_date:2011-01-01
        fact.context.end_date:2011-12-31
        fact.value:1262000000.0
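Collecting the facts into a dictionary instead of printing them makes the two filings easier to compare. The sketch relies only on the attributes used in the loop above (fact.concept.name, fact.context.end_date, fact.value); the `SimpleNamespace` objects are hypothetical stand-ins rebuilt from the printed output, and with a real instance you would pass inst.facts instead. It ignores dimensional contexts, so if a filing reports the same concept and end date for several segments, the last one read wins.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, FrozenSet, Iterable

WANTED: FrozenSet[str] = frozenset({
    "NetCashProvidedByUsedInContinuingOperations",
    "CashAndCashEquivalentsPeriodIncreaseDecrease",
})


def collect_facts(facts: Iterable, wanted: FrozenSet[str] = WANTED) -> Dict[str, Dict[str, float]]:
    """Map concept name -> {context end date (as a string) -> fact value}."""
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for fact in facts:
        if fact.concept.name in wanted:
            table[fact.concept.name][str(fact.context.end_date)] = fact.value
    return dict(table)


def fake_fact(name: str, start: str, end: str, value) -> SimpleNamespace:
    """Stand-in exposing only the attributes used in the loop above."""
    return SimpleNamespace(
        concept=SimpleNamespace(name=name),
        context=SimpleNamespace(start_date=start, end_date=end),
        value=value,
    )


# Rebuilt from the printed output for ibm-20131231.xml.
FACTS = [
    fake_fact("DocumentPeriodEndDate", "2013-01-01", "2013-12-31", "2013-12-31"),
    fake_fact("NetCashProvidedByUsedInContinuingOperations", "2013-01-01", "2013-12-31", 304000000.0),
    fake_fact("NetCashProvidedByUsedInContinuingOperations", "2012-01-01", "2012-12-31", -1511000000.0),
    fake_fact("NetCashProvidedByUsedInContinuingOperations", "2011-01-01", "2011-12-31", 1262000000.0),
]

print(collect_facts(FACTS))
```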

########################

ibm-20141231_lab.xml

########################

    label_linkbase = parse_linkbase_url('https://www.sec.gov/Archives/edgar/data/51143/000104746915001106/ibm-20141231_lab.xml', LinkbaseType.LABEL, cache)

    (Pdb) pprint.pp(label_linkbase.to_dict()['standardExtendedLinkElements'][0]['root_locators'][213])
    {'name': 'element214',
     'href': 'http://xbrl.fasb.org/us-gaap/2014/elts/us-gaap-2014-01-31.xsd#us-gaap_CashAndCashEquivalentsPeriodIncreaseDecrease',
     'concept_id': 'us-gaap_CashAndCashEquivalentsPeriodIncreaseDecrease',
     'children': [{'http://www.xbrl.org/2003/role/totalLabel': 'Net change in cash ' 'and cash ' 'equivalents'}]}

########################

ibm-20141231.xml

########################

    print(f"inst: XbrlInstance =" f"parse_xbrl_url(\"{xbrl_data._xbrl_urls['inst_url']}\", cache)")
    for fact in inst.facts:
        if fact.concept.name in ["DocumentPeriodEndDate", "NetCashProvidedByUsedInContinuingOperations",
                                 "CashAndCashEquivalentsPeriodIncreaseDecrease"]:
            print(f"\tfact.concept.name:{fact.concept.name}")
            print(f"\tfact.context.start_date:{fact.context.start_date}")
            print(f"\tfact.context.end_date:{fact.context.end_date}")
            print(f"\tfact.value:{fact.value}")

    inst: XbrlInstance =parse_xbrl_url("https://www.sec.gov/Archives/edgar/data/51143/000104746915001106/ibm-20141231.xml", cache)
        fact.concept.name:DocumentPeriodEndDate
        fact.context.start_date:2014-01-01
        fact.context.end_date:2014-12-31
        fact.value:2014-12-31
        fact.concept.name:CashAndCashEquivalentsPeriodIncreaseDecrease
        fact.context.start_date:2014-01-01
        fact.context.end_date:2014-12-31
        fact.value:-2240000000.0
        fact.concept.name:CashAndCashEquivalentsPeriodIncreaseDecrease
        fact.context.start_date:2013-01-01
        fact.context.end_date:2013-12-31
        fact.value:304000000.0
        fact.concept.name:CashAndCashEquivalentsPeriodIncreaseDecrease
        fact.context.start_date:2012-01-01
        fact.context.end_date:2012-12-31
        fact.value:-1511000000.0
