manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Is there a way to retrieve a fact's ID? #27

Status: Open. jamiehannaford opened this issue 3 years ago.

jamiehannaford commented 3 years ago

If you look at this tag:

<us-gaap:Assets contextRef="FI2015Q4" decimals="-3" id="Fact-7214827CB0865D3EDB8BC10FF27FAF5E" unitRef="usd">377284000</us-gaap:Assets>

I would like to access the id attribute so that I can link elements together via footnoteArc. I don't see it exposed in AbstractFact or NumericFact, but maybe I'm missing something.
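Until the parser exposes fact IDs, they can be read directly from the instance document. The sketch below uses only Python's standard xml.etree.ElementTree, not py-xbrl's API. fact_ids is a hypothetical helper that assumes a plain XBRL (not iXBRL) instance, and it treats any element carrying a contextRef attribute as a fact, which is a simplification. The us-gaap namespace URI in the sample is illustrative.

```python
import xml.etree.ElementTree as ET


def fact_ids(instance_xml: str) -> dict[str, tuple[str, str]]:
    """Map each fact's id attribute to (element tag, text value).

    Simplification (assumption): any element with a contextRef attribute counts as a fact.
    """
    root = ET.fromstring(instance_xml)
    result = {}
    for elem in root.iter():
        fact_id = elem.get("id")
        if fact_id is not None and elem.get("contextRef") is not None:
            # ElementTree reports tags in Clark notation: "{namespace-uri}LocalName"
            result[fact_id] = (elem.tag, (elem.text or "").strip())
    return result


sample = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:us-gaap="http://fasb.org/us-gaap/2015-01-31">
  <us-gaap:Assets contextRef="FI2015Q4" decimals="-3"
      id="Fact-7214827CB0865D3EDB8BC10FF27FAF5E" unitRef="usd">377284000</us-gaap:Assets>
</xbrli:xbrl>"""

print(fact_ids(sample))
```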

manusimidt commented 3 years ago

No, you are not missing anything.

I intentionally left the fact IDs out back then, since they are created by the filer and only serve to link elements within the XBRL files. My goal at the time was to develop a parser that converts the complex structure of XBRL into simple object structures and discards all unnecessary information to reduce complexity.

In theory, if you store all the elements of the XBRL filing in an object structure, you would not need the id attribute. But as you already noticed, the parser currently does not parse the footnotes; it simply ignores them.

Could you share the link/adsh of some sample submissions you are working with? I will think about how to read the footnotes over the course of the week. It would probably be most practical to simply add another attribute to the AbstractFact class that can store a footnote.

jamiehannaford commented 3 years ago

The 2020 10-K for PFE uses IDs to map a fact to a footnote:

In pfe-20201231_htm.xml:

<pfe:OtherNonoperatingIncomeExpenseNet
      contextRef="i40d8184605444a15897ab5244367c12b_D20200101-20201231"
      decimals="-6"
      id="id3VybDovL2RvY3MudjEvZG9jOmIyNDI2ZGVmNDYxMzQxNjRhOTZlY2UwMzI2ZmUxNzliL3NlYzpiMjQyNmRlZjQ2MTM0MTY0YTk2ZWNlMDMyNmZlMTc5Yl8xMTgvZnJhZzo4MWY1ODQzYjUyYzY0YjVmYmRkMzZlZGJmYWU2NjE1Yy90YWJsZTo2NDI1MzZlNzhhOWQ0NGQ4YTFjMzBhOGU4YWZiNGRhZS90YWJsZXJhbmdlOjY0MjUzNmU3OGE5ZDQ0ZDhhMWMzMGE4ZThhZmI0ZGFlXzE3LTItMS0xLTA_fea5b324-5464-4272-84ff-e4dea5a90241"
      unitRef="usd">493000000</pfe:OtherNonoperatingIncomeExpenseNet>
<link:footnoteArc
          xlink:arcrole="http://www.xbrl.org/2003/arcrole/fact-footnote"
          xlink:from="id3VybDovL2RvY3MudjEvZG9jOmIyNDI2ZGVmNDYxMzQxNjRhOTZlY2UwMzI2ZmUxNzliL3NlYzpiMjQyNmRlZjQ2MTM0MTY0YTk2ZWNlMDMyNmZlMTc5Yl8xMTgvZnJhZzo4MWY1ODQzYjUyYzY0YjVmYmRkMzZlZGJmYWU2NjE1Yy90YWJsZTo2NDI1MzZlNzhhOWQ0NGQ4YTFjMzBhOGU4YWZiNGRhZS90YWJsZXJhbmdlOjY0MjUzNmU3OGE5ZDQ0ZDhhMWMzMGE4ZThhZmI0ZGFlXzE3LTItMS0xLTA_fea5b324-5464-4272-84ff-e4dea5a90241"
          xlink:to="id3VybDovL2RvY3MudjEvZG9jOmIyNDI2ZGVmNDYxMzQxNjRhOTZlY2UwMzI2ZmUxNzliL3NlYzpiMjQyNmRlZjQ2MTM0MTY0YTk2ZWNlMDMyNmZlMTc5Yl8xMTgvZnJhZzo4MWY1ODQzYjUyYzY0YjVmYmRkMzZlZGJmYWU2NjE1Yy90ZXh0cmVnaW9uOjgxZjU4NDNiNTJjNjRiNWZiZGQzNmVkYmZhZTY2MTVjXzIyNTM5OTg4NDE2NDM0_e333fc41-2314-4474-b29f-ff3181d00b66"
          xlink:type="arc"/>
<link:footnote id="id3VybDovL2RvY3MudjEvZG9jOmIyNDI2ZGVmNDYxMzQxNjRhOTZlY2UwMzI2ZmUxNzliL3NlYzpiMjQyNmRlZjQ2MTM0MTY0YTk2ZWNlMDMyNmZlMTc5Yl8xMTgvZnJhZzo4MWY1ODQzYjUyYzY0YjVmYmRkMzZlZGJmYWU2NjE1Yy90ZXh0cmVnaW9uOjgxZjU4NDNiNTJjNjRiNWZiZGQzNmVkYmZhZTY2MTVjXzIyNTM5OTg4NDE2NDM0_e333fc41-2314-4474-b29f-ff3181d00b66" xlink:label="id3VybDovL2RvY3MudjEvZG9jOmIyNDI2ZGVmNDYxMzQxNjRhOTZlY2UwMzI2ZmUxNzliL3NlYzpiMjQyNmRlZjQ2MTM0MTY0YTk2ZWNlMDMyNmZlMTc5Yl8xMTgvZnJhZzo4MWY1ODQzYjUyYzY0YjVmYmRkMzZlZGJmYWU2NjE1Yy90ZXh0cmVnaW9uOjgxZjU4NDNiNTJjNjRiNWZiZGQzNmVkYmZhZTY2MTVjXzIyNTM5OTg4NDE2NDM0_e333fc41-2314-4474-b29f-ff3181d00b66" xlink:role="http://www.xbrl.org/2003/role/footnote" xlink:type="resource" xml:lang="en-US"><xhtml:span style="color:#000000;font-family:'Arial',sans-serif;font-size:7pt;font-weight:400;line-height:120%;padding-left:4.39pt">2020 includes, among other things, (i) dividend income of $278 million from our investment in ViiV and (ii) charges of $105 million, reflecting the change in the fair value of contingent consideration. 2019 included, among other things, (i) dividend income of $220 million from our investment in ViiV; (ii) charges of $152 million for external incremental costs, such as transaction costs and costs to separate our Consumer Healthcare business into a separate legal entity, associated with the formation of the Consumer Healthcare JV; and (iii) net losses on early retirement of debt of $138&#160;million. 
2018 included, among other things, (i) a non-cash $343 million pre-tax gain associated with our transaction with Bain Capital to create a new biopharmaceutical company, Cerevel, to continue development of a portfolio of clinical and preclinical stage neuroscience assets primarily targeting disorders of the central nervous system; (ii) dividend income of $253 million from our investment in ViiV; (iii) a non-cash $50 million pre-tax gain related to our contribution agreement entered into with Allogene (see </xhtml:span><xhtml:span style="color:#000000;font-family:'Arial',sans-serif;font-size:7pt;font-style:italic;font-weight:400;line-height:120%">Note 2B</xhtml:span><xhtml:span style="color:#000000;font-family:'Arial',sans-serif;font-size:7pt;font-weight:400;line-height:120%">); (iv) charges of $207&#160;million, reflecting the change in the fair value of contingent consideration, and (vi) charges of $112 million for external incremental costs, such as transaction costs and costs to separate our Consumer Healthcare business into a separate legal entity, associated with the formation of the Consumer Healthcare JV.</xhtml:span></link:footnote>

I want to inspect the footnote for any nested facts. In the example above, "Other, net" ($493 million) contains two more facts, so it's important to discount them to avoid double counting.
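The excerpt above does not include a link:loc element. Under XBRL 2.1, a footnoteArc's xlink:from and xlink:to name xlink:label values inside the same link:footnoteLink: the "from" side is normally a locator whose xlink:href is "#<fact id>", and the "to" side is the link:footnote resource. Below is a minimal sketch of resolving that chain with the standard library, assuming a plain XBRL instance. footnotes_by_fact is a hypothetical helper (not part of py-xbrl), and the sample uses shortened IDs and a placeholder namespace for the pfe prefix.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XLINK = "{http://www.w3.org/1999/xlink}"
LINK = "{http://www.xbrl.org/2003/linkbase}"
FACT_FOOTNOTE = "http://www.xbrl.org/2003/arcrole/fact-footnote"


def footnotes_by_fact(instance_xml: str) -> dict[str, list[str]]:
    """Return {fact id: [footnote text, ...]} for every link:footnoteLink in an instance."""
    root = ET.fromstring(instance_xml)
    result = defaultdict(list)
    for footnote_link in root.iter(f"{LINK}footnoteLink"):
        facts = defaultdict(list)  # xlink:label -> fact ids (from locators)
        notes = defaultdict(list)  # xlink:label -> footnote texts (from resources)
        for child in footnote_link:
            key = child.get(f"{XLINK}label")
            if child.tag == f"{LINK}loc":
                # A locator pointing into this document has an href of "#<fact id>"
                facts[key].append(child.get(f"{XLINK}href", "").split("#", 1)[-1])
            elif child.tag == f"{LINK}footnote":
                # Flatten nested xhtml:span markup into plain text
                notes[key].append("".join(child.itertext()).strip())
        for arc in footnote_link.iter(f"{LINK}footnoteArc"):
            if arc.get(f"{XLINK}arcrole") != FACT_FOOTNOTE:
                continue
            for fact_id in facts[arc.get(f"{XLINK}from")]:
                result[fact_id].extend(notes[arc.get(f"{XLINK}to")])
    return dict(result)


sample = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:link="http://www.xbrl.org/2003/linkbase" xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:pfe="http://example.com/pfe">
  <pfe:OtherNonoperatingIncomeExpenseNet contextRef="c1" decimals="-6" id="f1"
      unitRef="usd">493000000</pfe:OtherNonoperatingIncomeExpenseNet>
  <link:footnoteLink xlink:type="extended" xlink:role="http://www.xbrl.org/2003/role/link">
    <link:loc xlink:type="locator" xlink:href="#f1" xlink:label="f1"/>
    <link:footnoteArc xlink:type="arc" xlink:arcrole="http://www.xbrl.org/2003/arcrole/fact-footnote"
        xlink:from="f1" xlink:to="fn1"/>
    <link:footnote id="fn1" xlink:type="resource" xlink:label="fn1" xml:lang="en-US"
        xlink:role="http://www.xbrl.org/2003/role/footnote"><xhtml:span>2020 includes dividend income of $278 million</xhtml:span><xhtml:span> (see Note 2B).</xhtml:span></link:footnote>
  </link:footnoteLink>
</xbrli:xbrl>"""

print(footnotes_by_fact(sample))
```

Extracting the nested amounts from the footnote text (for the double-counting check) would still be a separate, text-parsing step.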

jamiehannaford commented 3 years ago

If you could somehow link a fact to its footnotes, that would be amazing. Another great feature, btw, would be linking a concept to all of its presentation/calculation locators. Very often I have to traverse the full linkbase just to map a concept ID to a label, so having them available on the concept object itself would be 💯
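Mapping a concept ID to its labels follows the same extended-link pattern in the label linkbase: a link:loc points at the concept's schema element (an href fragment such as us-gaap_Assets), link:label resources hold the text, and a link:labelArc connects them. A sketch with the standard library follows; labels_by_concept is a hypothetical helper that reads a single label linkbase document and ignores arcs that other taxonomies prohibit or override.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XLINK = "{http://www.w3.org/1999/xlink}"
LINK = "{http://www.xbrl.org/2003/linkbase}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def labels_by_concept(linkbase_xml: str, lang: str = "en-US") -> dict[str, dict[str, str]]:
    """Return {concept id: {label role: label text}} from one label linkbase document."""
    root = ET.fromstring(linkbase_xml)
    result = defaultdict(dict)
    for label_link in root.iter(f"{LINK}labelLink"):
        concepts = defaultdict(list)  # xlink:label -> concept ids (from locators)
        texts = defaultdict(list)     # xlink:label -> (role, text) (from label resources)
        for child in label_link:
            key = child.get(f"{XLINK}label")
            if child.tag == f"{LINK}loc":
                # href such as "us-gaap-2020-01-31.xsd#us-gaap_Assets": keep the fragment
                href = child.get(f"{XLINK}href", "")
                concepts[key].append(href.split("#", 1)[-1])
            elif child.tag == f"{LINK}label" and (child.get(XML_LANG) or "").lower() == lang.lower():
                texts[key].append((child.get(f"{XLINK}role"), (child.text or "").strip()))
        for arc in label_link.iter(f"{LINK}labelArc"):
            for concept_id in concepts[arc.get(f"{XLINK}from")]:
                for role, text in texts[arc.get(f"{XLINK}to")]:
                    result[concept_id][role] = text
    return dict(result)


sample = """<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:labelLink xlink:type="extended" xlink:role="http://www.xbrl.org/2003/role/link">
    <link:loc xlink:type="locator" xlink:href="example.xsd#us-gaap_GainLossOnInvestments" xlink:label="loc_1"/>
    <link:label xlink:type="resource" xlink:label="lab_1" xml:lang="en-US"
        xlink:role="http://www.xbrl.org/2003/role/label">Gain (Loss) on Investments</link:label>
    <link:label xlink:type="resource" xlink:label="lab_1" xml:lang="en-US"
        xlink:role="http://www.xbrl.org/2003/role/terseLabel">Gain (loss) on investments</link:label>
    <link:labelArc xlink:type="arc" xlink:arcrole="http://www.xbrl.org/2003/arcrole/concept-label"
        xlink:from="loc_1" xlink:to="lab_1"/>
  </link:labelLink>
</link:linkbase>"""

print(labels_by_concept(sample))
```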

This is quite important because companies very often get the balance type wrong (e.g. credit vs. debit), or use a negative value for something that should really be positive. Being able to inspect the label for words like (gain) or (loss) is therefore needed to correct these values.
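A rough sketch of the label check described above. It is a heuristic, not an XBRL rule: it assumes the common financial-statement convention that the parenthesized word in a label such as "Gain (loss)" names the case reported as a negative number. sign_hint and its word list are hypothetical choices.

```python
import re
from typing import Optional

# Sign-bearing words we look for inside parentheses (hypothetical, extend as needed);
# "(?:es|s)?" also accepts plurals such as "(losses)" or "(expenses)".
_PAREN_WORD = re.compile(
    r"\(\s*(gain|loss|income|expense|increase|decrease)(?:es|s)?\s*\)",
    re.IGNORECASE,
)


def sign_hint(label: str) -> Optional[str]:
    """Guess what a negative value means for this label, or None if the label gives no hint.

    Assumption: the parenthesized word names the case reported as a negative number,
    so "Gain (Loss) on Investments" -> "loss" and "(Gain) loss on sale" -> "gain".
    """
    match = _PAREN_WORD.search(label)
    return match.group(1).lower() if match else None


for label in ["Gain (Loss) on Investments", "(Gain) loss on sale of assets", "Total assets"]:
    print(label, "->", sign_hint(label))
```

Comparing this hint with the sign of the reported value (and with the concept's balance attribute) is one way to flag facts that may have been tagged with the wrong sign.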

manusimidt commented 3 years ago

I'll have a look at parsing the footnotes and give you feedback by Friday evening.

Attaching the correct resources or hierarchies to each concept can be really challenging because relationships can be overridden. For example, one taxonomy (tax1) could define the relationship concept1 = concept_a + concept_b, and another taxonomy (tax2) that extends tax1 could override it with concept1 = concept_a + concept_c + concept_d.
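In XBRL 2.1 this kind of override is expressed with arc attributes: each arc has a priority and a use of "optional" or "prohibited", and among equivalent relationships the highest priority wins, with a prohibiting arc beating an optional one of equal priority. Below is a sketch of that resolution rule for the example above, assuming tax2 achieves the override by prohibiting concept1 -> concept_b at a higher priority and adding arcs to concept_c and concept_d. The Arc class is hypothetical, and equivalence is simplified to (parent, child); the specification also compares arcrole, link role and other non-exempt attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """A simplified calculation relationship from a parent concept to a child concept."""
    parent: str
    child: str
    priority: int = 0
    use: str = "optional"  # "optional" or "prohibited"


def effective_children(arcs: list[Arc], parent: str) -> set[str]:
    """Children of `parent` after applying priority and prohibition across all loaded taxonomies.

    Simplification: arcs are treated as equivalent when parent and child match.
    """
    best: dict[tuple[str, str], list[Arc]] = {}
    for arc in arcs:
        key = (arc.parent, arc.child)
        current = best.get(key)
        if current is None or arc.priority > current[0].priority:
            best[key] = [arc]  # a higher priority overrides everything seen so far
        elif arc.priority == current[0].priority:
            current.append(arc)  # keep ties; a prohibiting arc among them wins below
    return {
        child
        for (p, child), winners in best.items()
        if p == parent and all(a.use != "prohibited" for a in winners)
    }


tax1 = [Arc("concept1", "concept_a"), Arc("concept1", "concept_b")]
tax2 = [
    Arc("concept1", "concept_b", priority=1, use="prohibited"),
    Arc("concept1", "concept_c"),
    Arc("concept1", "concept_d"),
]
print(sorted(effective_children(tax1 + tax2, "concept1")))  # → ['concept_a', 'concept_c', 'concept_d']
```

This is also why attaching relationships to a concept object is not a simple lookup: the result depends on which taxonomies are loaded together.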

But honestly, I'm not sure how often this is actually done in practice, and I really agree that it would be helpful to have this information directly on the concept object instead of having to go through the whole object structure of the linkbase. I'll think about how to implement this more easily within the current object structure.

manusimidt commented 3 years ago

I am currently working on a solution to parse the footnote links. It would be great if these elements could be parsed with the same code I use in the linkbase module for parsing extended XLinks, since they are basically just extended XLinks. However, I still need to refactor and split off some code in the linkbase module so that I can reuse it for the footnote links in the instance document.
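Since footnote links and linkbase links share the XLink extended-link structure, a single resolver keyed only on xlink:type can serve both. Below is a sketch of that idea with the standard library. relationships and Relationship are hypothetical names, not py-xbrl's linkbase module, and the sketch does not apply arc priority or prohibition.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import NamedTuple

XLINK = "{http://www.w3.org/1999/xlink}"


class Relationship(NamedTuple):
    arcrole: str
    source: ET.Element  # a locator or resource element
    target: ET.Element


def relationships(extended_link: ET.Element) -> list[Relationship]:
    """Resolve the arcs of one extended link (labelLink, footnoteLink, ...) to element pairs.

    Only xlink attributes are consulted, so the element names do not matter.
    """
    by_label = defaultdict(list)  # xlink:label -> locators/resources sharing that label
    arcs = []
    for child in extended_link:
        xtype = child.get(f"{XLINK}type")
        if xtype in ("locator", "resource"):
            by_label[child.get(f"{XLINK}label")].append(child)
        elif xtype == "arc":
            arcs.append(child)
    return [
        Relationship(arc.get(f"{XLINK}arcrole"), src, dst)
        for arc in arcs
        for src in by_label[arc.get(f"{XLINK}from")]
        for dst in by_label[arc.get(f"{XLINK}to")]
    ]


footnote_link = ET.fromstring("""<link:footnoteLink xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended"
    xlink:role="http://www.xbrl.org/2003/role/link">
  <link:loc xlink:type="locator" xlink:href="#f1" xlink:label="f1"/>
  <link:footnote xlink:type="resource" xlink:label="fn1" xml:lang="en-US"
      xlink:role="http://www.xbrl.org/2003/role/footnote">Includes dividend income.</link:footnote>
  <link:footnoteArc xlink:type="arc" xlink:arcrole="http://www.xbrl.org/2003/arcrole/fact-footnote"
      xlink:from="f1" xlink:to="fn1"/>
</link:footnoteLink>""")

for rel in relationships(footnote_link):
    print(rel.source.get(f"{XLINK}href"), "->", rel.target.text)
```

The same call works on a link:labelLink or link:calculationLink element, since only xlink:type, xlink:label, xlink:from, xlink:to and xlink:arcrole are read.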