manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Parsing of Presentation Linkbase for SEC submissions #21

Closed manusimidt closed 3 years ago

manusimidt commented 3 years ago

I am also having problems getting information from the presentation linkbase. In my case I am getting the information from the Microsoft 10-K 2020 instance document, and the object instance.taxonomy.pre_linkbases does not contain the same information as the linkbase document. It is missing all the locators and definitionArcs. I have spent a few hours looking into the code but I can't find where the error is.

_Originally posted by @Pablompg in https://github.com/manusimidt/xbrl_parser/issues/20#issuecomment-848820384_

manusimidt commented 3 years ago

Do you mean the 2020 10-K from AAPL or the 10-K from Microsoft? (You said Microsoft, but the URL pointed to one from Apple.)

I could not find differences between the actual presentation linkbase and the structure of the parsed linkbase. However, I only compared small parts of the linkbase. Trying to figure out the structure in the actual linkbase file can be really tedious (as you probably also noticed 😄).

Here is a simple example of how you could print the structure of the presentation linkbase:

from xbrl_parser.linkbase import PresentationArc
from xbrl_parser.instance import parse_xbrl_url
from xbrl_parser.cache import HttpCache
import logging

logging.basicConfig(level=logging.INFO)
cache: HttpCache = HttpCache('./../cache/')
cache.set_headers({'From': 'hello@schmidt-manuel.de', 'User-Agent': 'py-xbrl/1.1.4'})

instance_path = 'https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/aapl-20200926_htm.xml'
inst = parse_xbrl_url(instance_path, cache)

def print_presentation_arc(level: int, arc: PresentationArc):
    print(f"{'  ' * level} {arc.to_locator.concept_id}")
    for child_arc in arc.to_locator.children:
        print_presentation_arc(level + 1, child_arc)

for pre_linkbase in inst.taxonomy.pre_linkbases:
    for elr in pre_linkbase.extended_links:
        print(f"======== {elr.elr_id} ========")
        # if the elr is empty, skip it
        if not elr.root_locators:
            continue
        # presentation linkbase has only one top level locator (in most cases)
        for pre_arc in elr.root_locators[0].children:
            print_presentation_arc(0, pre_arc)

This will print out all presentation arcs together with the locators they reference. For example, the representation of the balance sheet from the presentation linkbase would look like the following:

======== aapl-20200926.xsd#CONSOLIDATEDBALANCESHEETS ========
 us-gaap_AssetsAbstract
   us-gaap_AssetsCurrentAbstract
     us-gaap_CashAndCashEquivalentsAtCarryingValue
     us-gaap_MarketableSecuritiesCurrent
     us-gaap_AccountsReceivableNetCurrent
     us-gaap_InventoryNet
     us-gaap_NontradeReceivablesCurrent
     us-gaap_OtherAssetsCurrent
     us-gaap_AssetsCurrent
   us-gaap_AssetsNoncurrentAbstract
     us-gaap_MarketableSecuritiesNoncurrent
     us-gaap_PropertyPlantAndEquipmentNet
     us-gaap_OtherAssetsNoncurrent
     us-gaap_AssetsNoncurrent
   us-gaap_Assets
 us-gaap_LiabilitiesAndStockholdersEquityAbstract
   us-gaap_LiabilitiesCurrentAbstract
     us-gaap_AccountsPayableCurrent
     us-gaap_OtherLiabilitiesCurrent
     us-gaap_ContractWithCustomerLiabilityCurrent
     us-gaap_CommercialPaper
     us-gaap_LongTermDebtCurrent
     us-gaap_LiabilitiesCurrent
   us-gaap_LiabilitiesNoncurrentAbstract
     us-gaap_LongTermDebtNoncurrent
     us-gaap_OtherLiabilitiesNoncurrent
     us-gaap_LiabilitiesNoncurrent
   us-gaap_Liabilities
   us-gaap_CommitmentsAndContingencies
   us-gaap_StockholdersEquityAbstract
     us-gaap_CommonStocksIncludingAdditionalPaidInCapital
     us-gaap_RetainedEarningsAccumulatedDeficit
     us-gaap_AccumulatedOtherComprehensiveIncomeLossNetOfTax
     us-gaap_StockholdersEquity
   us-gaap_LiabilitiesAndStockholdersEquity
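
The loop above only descends from elr.root_locators[0]. For an extended link with more than one top-level locator, a variation like the following might help. This is only a minimal sketch that does not import py-xbrl: `Locator` and `Arc` are hypothetical stand-ins that mirror just the attributes used in the snippet above (`concept_id`, `children`, `to_locator`), and `walk_presentation` is a name introduced here for illustration. Unlike the snippet above, it also yields the root locator itself.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Locator:
    """Hypothetical stand-in for a parsed locator (not the py-xbrl class)."""
    concept_id: str
    children: List["Arc"] = field(default_factory=list)


@dataclass
class Arc:
    """Hypothetical stand-in for a presentation arc pointing at a child locator."""
    to_locator: Locator


def walk_presentation(root_locators: List[Locator]) -> Iterator[Tuple[int, str]]:
    """Yield (depth, concept_id) pairs depth-first for every root locator."""
    def _walk(locator: Locator, depth: int) -> Iterator[Tuple[int, str]]:
        yield depth, locator.concept_id
        for arc in locator.children:
            yield from _walk(arc.to_locator, depth + 1)

    for root in root_locators:
        yield from _walk(root, 0)


# Tiny hand-built tree using concept ids from the balance sheet above.
cash = Locator("us-gaap_CashAndCashEquivalentsAtCarryingValue")
current = Locator("us-gaap_AssetsCurrentAbstract", [Arc(cash)])
assets = Locator("us-gaap_AssetsAbstract", [Arc(current)])

rows = list(walk_presentation([assets]))
for depth, concept_id in rows:
    print(f"{'  ' * depth}{concept_id}")
```

Returning (depth, concept_id) pairs instead of printing directly makes it easier to compare the parsed structure against the linkbase file, for example by diffing two lists.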

Note that the concept ids ending with "Abstract" do not appear as facts in the instance document. They are only used to structure the hierarchy.
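
To separate these structural nodes from the concepts that can carry facts, one option is to filter on the name. A minimal sketch, assuming the concept ids are plain strings as printed above; `is_structural` and `reportable_concepts` are hypothetical helper names, not part of the parser. The "Abstract" suffix is only a naming convention, so this heuristic can miss abstract concepts that are named differently; the authoritative marker is the `abstract` attribute on the concept's declaration in the taxonomy schema.

```python
from typing import Iterable, List


def is_structural(concept_id: str) -> bool:
    """Heuristic: treat ids ending in 'Abstract' as grouping-only nodes."""
    return concept_id.endswith("Abstract")


def reportable_concepts(concept_ids: Iterable[str]) -> List[str]:
    """Keep only ids that are not grouping nodes, preserving their order."""
    return [cid for cid in concept_ids if not is_structural(cid)]


# A few ids copied from the balance sheet output above.
printed = [
    "us-gaap_AssetsAbstract",
    "us-gaap_AssetsCurrentAbstract",
    "us-gaap_CashAndCashEquivalentsAtCarryingValue",
    "us-gaap_AssetsCurrent",
    "us-gaap_Assets",
]

candidates = reportable_concepts(printed)
print(candidates)
```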

I know that this part of the XBRL parser is not well developed, as I have always focused primarily on getting the facts.
In the next few weeks I will think about whether this can be better represented in the object instances, and then document this part of the parser better.

Pablompg commented 3 years ago

Thank you, Manu. I was not obtaining the data because I was trying to get it the wrong way. This solved the issue and provided a good coding example. I think we can mark the issue as solved.