Closed manusimidt closed 3 years ago
Do you mean the 2020 10-K from AAPL or the 10-K from Microsoft? (You said Microsoft, but the URL pointed to one from Apple.)
I could not find any differences between the actual presentation linkbase and the structure of the parsed linkbase. However, I only compared small parts of the linkbase; figuring out the structure from the actual linkbase file can be really tedious (as you probably also noticed 😄).
Here is a simple example of how you could print the structure of the presentation linkbase:
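For a spot check against the raw file, the parent-child arcs can also be read straight from the linkbase XML with the standard library. This is only a rough sketch: `SAMPLE` is a made-up, heavily shortened linkbase (not the Apple or Microsoft file), `read_presentation_arcs` is a hypothetical helper and not part of the parser, and only the standard XBRL 2.1 linkbase and XLink namespaces are assumed.

```python
import xml.etree.ElementTree as ET

LINK = "http://www.xbrl.org/2003/linkbase"  # XBRL 2.1 linkbase namespace
XLINK = "http://www.w3.org/1999/xlink"      # XLink namespace

# Hypothetical, shortened sample; a real filing has hundreds of locators.
SAMPLE = """<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:presentationLink xlink:type="extended"
      xlink:role="http://example.com/role/BalanceSheet">
    <link:loc xlink:type="locator" xlink:label="loc_AssetsAbstract"
        xlink:href="sample.xsd#us-gaap_AssetsAbstract"/>
    <link:loc xlink:type="locator" xlink:label="loc_AssetsCurrent"
        xlink:href="sample.xsd#us-gaap_AssetsCurrent"/>
    <link:loc xlink:type="locator" xlink:label="loc_Assets"
        xlink:href="sample.xsd#us-gaap_Assets"/>
    <link:presentationArc xlink:type="arc" order="2"
        xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child"
        xlink:from="loc_AssetsAbstract" xlink:to="loc_Assets"/>
    <link:presentationArc xlink:type="arc" order="1"
        xlink:arcrole="http://www.xbrl.org/2003/arcrole/parent-child"
        xlink:from="loc_AssetsAbstract" xlink:to="loc_AssetsCurrent"/>
  </link:presentationLink>
</link:linkbase>"""


def read_presentation_arcs(xml_text):
    """Return {role: [(parent_concept_id, child_concept_id, order), ...]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for ext_link in root.iter(f"{{{LINK}}}presentationLink"):
        role = ext_link.get(f"{{{XLINK}}}role")
        # Map each locator label to the concept id in the href fragment.
        labels = {
            loc.get(f"{{{XLINK}}}label"): loc.get(f"{{{XLINK}}}href").split("#", 1)[1]
            for loc in ext_link.iter(f"{{{LINK}}}loc")
        }
        arcs = result.setdefault(role, [])
        for arc in ext_link.iter(f"{{{LINK}}}presentationArc"):
            parent = labels[arc.get(f"{{{XLINK}}}from")]
            child = labels[arc.get(f"{{{XLINK}}}to")]
            arcs.append((parent, child, float(arc.get("order", "1"))))
        # Group siblings under their parent and sort them by the order attribute.
        arcs.sort(key=lambda a: (a[0], a[2]))
    return result


for role, arcs in read_presentation_arcs(SAMPLE).items():
    print(role)
    for parent, child, order in arcs:
        print(f"  {parent} -> {child} (order {order})")
```

Comparing a handful of these (parent, child) pairs with what the parser returns for the same role should be less painful than reading the XML by eye.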
from xbrl_parser.linkbase import PresentationArc
from xbrl_parser.instance import parse_xbrl_url
from xbrl_parser.cache import HttpCache
import logging
logging.basicConfig(level=logging.INFO)
cache: HttpCache = HttpCache('./../cache/')
cache.set_headers({'From': 'hello@schmidt-manuel.de', 'User-Agent': 'py-xbrl/1.1.4'})
instance_path = 'https://www.sec.gov/Archives/edgar/data/320193/000032019320000096/aapl-20200926_htm.xml'
inst = parse_xbrl_url(instance_path, cache)
def print_presentation_arc(level: int, arc: PresentationArc):
    print(f"{' ' * level} {arc.to_locator.concept_id}")
    for child_arc in arc.to_locator.children:
        print_presentation_arc(level + 1, child_arc)
for pre_linkbase in inst.taxonomy.pre_linkbases:
    for elr in pre_linkbase.extended_links:
        print(f"======== {elr.elr_id} ========")
        # if the elr is empty, skip it
        if len(elr.root_locators) == 0: continue
        # presentation linkbase has only one top level locator (in most cases)
        for pre_arc in elr.root_locators[0].children:
            print_presentation_arc(0, pre_arc)
This prints all presentation arcs and the locators they reference. For example, the balance sheet from the presentation linkbase looks like this:
======== aapl-20200926.xsd#CONSOLIDATEDBALANCESHEETS ========
us-gaap_AssetsAbstract
  us-gaap_AssetsCurrentAbstract
    us-gaap_CashAndCashEquivalentsAtCarryingValue
    us-gaap_MarketableSecuritiesCurrent
    us-gaap_AccountsReceivableNetCurrent
    us-gaap_InventoryNet
    us-gaap_NontradeReceivablesCurrent
    us-gaap_OtherAssetsCurrent
    us-gaap_AssetsCurrent
  us-gaap_AssetsNoncurrentAbstract
    us-gaap_MarketableSecuritiesNoncurrent
    us-gaap_PropertyPlantAndEquipmentNet
    us-gaap_OtherAssetsNoncurrent
    us-gaap_AssetsNoncurrent
  us-gaap_Assets
us-gaap_LiabilitiesAndStockholdersEquityAbstract
  us-gaap_LiabilitiesCurrentAbstract
    us-gaap_AccountsPayableCurrent
    us-gaap_OtherLiabilitiesCurrent
    us-gaap_ContractWithCustomerLiabilityCurrent
    us-gaap_CommercialPaper
    us-gaap_LongTermDebtCurrent
    us-gaap_LiabilitiesCurrent
  us-gaap_LiabilitiesNoncurrentAbstract
    us-gaap_LongTermDebtNoncurrent
    us-gaap_OtherLiabilitiesNoncurrent
    us-gaap_LiabilitiesNoncurrent
  us-gaap_Liabilities
  us-gaap_CommitmentsAndContingencies
  us-gaap_StockholdersEquityAbstract
    us-gaap_CommonStocksIncludingAdditionalPaidInCapital
    us-gaap_RetainedEarningsAccumulatedDeficit
    us-gaap_AccumulatedOtherComprehensiveIncomeLossNetOfTax
    us-gaap_StockholdersEquity
  us-gaap_LiabilitiesAndStockholdersEquity
Note that none of the concept IDs ending with "Abstract" appear in the instance document. They are only used for structuring.
I know that this part of the XBRL parser is not well developed, as I have always focused primarily on getting the facts.
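If only the concepts that can actually carry facts are of interest, the structural nodes can be skipped while walking the tree. The sketch below is self-contained and does not use the parser: `Locator` and `Arc` are hypothetical stand-ins that only expose the attributes used in this thread (`to_locator`, `concept_id`, `children`), and `reportable_concepts` is a made-up helper. Filtering on the "Abstract" name suffix is just a heuristic based on the observation above; the authoritative flag is the `abstract` attribute on the concept's element declaration in the taxonomy schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Locator:
    """Hypothetical stand-in for a parsed locator."""
    concept_id: str
    children: List["Arc"] = field(default_factory=list)


@dataclass
class Arc:
    """Hypothetical stand-in for a parsed presentation arc."""
    to_locator: Locator


def reportable_concepts(arcs):
    """Depth-first list of concept ids, skipping '...Abstract' grouping nodes."""
    found = []
    for arc in arcs:
        loc = arc.to_locator
        # Heuristic from this thread: abstract concepts end with "Abstract".
        if not loc.concept_id.endswith("Abstract"):
            found.append(loc.concept_id)
        found.extend(reportable_concepts(loc.children))
    return found


# Tiny hand-built excerpt of the balance sheet shown above.
tree = [
    Arc(Locator("us-gaap_AssetsAbstract", [
        Arc(Locator("us-gaap_AssetsCurrentAbstract", [
            Arc(Locator("us-gaap_CashAndCashEquivalentsAtCarryingValue")),
            Arc(Locator("us-gaap_AssetsCurrent")),
        ])),
        Arc(Locator("us-gaap_Assets")),
    ])),
]

print(reportable_concepts(tree))
```

With the real objects, the same function should work on `elr.root_locators[0].children`, provided the attribute names match those used in the example above.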
In the next few weeks I will think about whether this can be better represented in the object instances and then document this part of the parser better.
Thank you, Manu. I was not getting the data because I was going about it the wrong way. This solved the issue and provided a good coding example. I think we can mark the issue as solved.
I am also having problems getting information from the presentation linkbase. In my case I am getting the information from the Microsoft 10-K 2020 instance document, and the object
instance.taxonomy.pre_linkbases
does not contain the same information as the linkbase document. It is missing all the locators and definitionArcs. I have spent a few hours looking into the code, but I can't find where the error is.

_Originally posted by @Pablompg in https://github.com/manusimidt/xbrl_parser/issues/20#issuecomment-848820384_