manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Any way to get textual labels? #66

Open jonkatz6 opened 2 years ago

jonkatz6 commented 2 years ago

Is there any way to retrieve the textual label corresponding to a fact when it is in a table as follows?

[image: screenshot of the table]

In the table above, under Net sales, the rows Products, Services, and Total net sales each correspond to the tag us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax.

I was able to parse the member labels and retrieve tags like us-gaap:ServiceMember. [image]

What I would really like to do is find the relevant table labels: Products, Services, and Total net sales. As far as I can tell, these appear only in the HTML portion of the XBRL documents.

Is there already a way to access these labels? If not, how might I go about it?

manusimidt commented 2 years ago

Labels are stored in the label linkbase that the taxonomy schema imports. py-xbrl automatically connects the labels to the concepts, so you can access them directly from the concept object.

An example:

first_label = fact.concept.labels[0]
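A concept can carry several labels under different XBRL label roles (standard, terse, total, and so on), so taking index 0 may not always give the wording you want. The following is only a rough sketch of choosing a label by role with a fallback. `LabelStub` and `pick_label` are hypothetical stand-ins written for illustration, not py-xbrl classes, and the attribute names are assumptions; the role URIs are the standard XBRL 2.1 ones.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Standard XBRL 2.1 label role URIs.
STANDARD_LABEL = "http://www.xbrl.org/2003/role/label"
TERSE_LABEL = "http://www.xbrl.org/2003/role/terseLabel"
TOTAL_LABEL = "http://www.xbrl.org/2003/role/totalLabel"


@dataclass
class LabelStub:
    """Hypothetical stand-in for a parsed label (not py-xbrl's own class)."""
    role: str
    text: str


def pick_label(labels: Sequence[LabelStub],
               preferred: Sequence[str] = (TERSE_LABEL, STANDARD_LABEL)) -> Optional[str]:
    """Return the text of the first label whose role matches, in order of preference."""
    for role in preferred:
        for label in labels:
            if label.role == role:
                return label.text
    # No preferred role found: fall back to the first label, like labels[0].
    return labels[0].text if labels else None
```

With real py-xbrl objects you would check which attributes the label object actually exposes before adapting this.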

Another complete example that also extracts the labels of the members:


import logging

from xbrl.cache import HttpCache
from xbrl.instance import XbrlInstance, XbrlParser

logging.basicConfig(level=logging.INFO)
cache: HttpCache = HttpCache('./cache')
cache.set_headers({'From': 'your.email@example.com', 'User-Agent': 'py-xbrl/2.0.7'})
xbrlParser = XbrlParser(cache)

inst: XbrlInstance = xbrlParser.parse_instance(
    'https://www.sec.gov/Archives/edgar/data/320193/000032019321000065/aapl-20210626.htm')

for fact in inst.facts:
    if fact.concept.name == 'RevenueFromContractWithCustomerExcludingAssessedTax':
        label: str = fact.concept.labels[0]
        segments: list = fact.context.segments

        pretty_str: str = f"{label}: {fact.value}"
        for segment in segments:
            pretty_str += f" ({segment.dimension}, {segment.member.labels[0]})"

        print(pretty_str)

This will return the following output:

Net sales: 63948000000.0 (ProductOrServiceAxis, Products)
Net sales: 46529000000.0 (ProductOrServiceAxis, Products)
Net sales: 232309000000.0 (ProductOrServiceAxis, Products)
Net sales: 170598000000.0 (ProductOrServiceAxis, Products)
Net sales: 17486000000.0 (ProductOrServiceAxis, Services)
Net sales: 13156000000.0 (ProductOrServiceAxis, Services)
Net sales: 50148000000.0 (ProductOrServiceAxis, Services)
Net sales: 39219000000.0 (ProductOrServiceAxis, Services)
Net sales: 81434000000.0
Net sales: 59685000000.0
Net sales: 282457000000.0
Net sales: 209817000000.0
Net sales: 39570000000.0 (ProductOrServiceAxis, iPhone)
Net sales: 26418000000.0 (ProductOrServiceAxis, iPhone)
Net sales: 153105000000.0 (ProductOrServiceAxis, iPhone)
Net sales: 111337000000.0 (ProductOrServiceAxis, iPhone)
Net sales: 8235000000.0 (ProductOrServiceAxis, Mac)
Net sales: 7079000000.0 (ProductOrServiceAxis, Mac)
Net sales: 26012000000.0 (ProductOrServiceAxis, Mac)
Net sales: 19590000000.0 (ProductOrServiceAxis, Mac)
Net sales: 7368000000.0 (ProductOrServiceAxis, iPad)
Net sales: 6582000000.0 (ProductOrServiceAxis, iPad)
Net sales: 23610000000.0 (ProductOrServiceAxis, iPad)
Net sales: 16927000000.0 (ProductOrServiceAxis, iPad)
Net sales: 8775000000.0 (ProductOrServiceAxis, Wearables, Home and Accessories)
Net sales: 6450000000.0 (ProductOrServiceAxis, Wearables, Home and Accessories)
Net sales: 29582000000.0 (ProductOrServiceAxis, Wearables, Home and Accessories)
Net sales: 22744000000.0 (ProductOrServiceAxis, Wearables, Home and Accessories)
Net sales: 17486000000.0 (ProductOrServiceAxis, Services)
Net sales: 13156000000.0 (ProductOrServiceAxis, Services)
Net sales: 50148000000.0 (ProductOrServiceAxis, Services)
Net sales: 39219000000.0 (ProductOrServiceAxis, Services)
Net sales: 81434000000.0
Net sales: 59685000000.0
Net sales: 282457000000.0
Net sales: 209817000000.0
Net sales: 35870000000.0 (StatementBusinessSegmentsAxis, Americas)
Net sales: 27018000000.0 (StatementBusinessSegmentsAxis, Americas)
Net sales: 116486000000.0 (StatementBusinessSegmentsAxis, Americas)
Net sales: 93858000000.0 (StatementBusinessSegmentsAxis, Americas)
Net sales: 18943000000.0 (StatementBusinessSegmentsAxis, Europe)
Net sales: 14173000000.0 (StatementBusinessSegmentsAxis, Europe)
Net sales: 68513000000.0 (StatementBusinessSegmentsAxis, Europe)
Net sales: 51740000000.0 (StatementBusinessSegmentsAxis, Europe)
Net sales: 14762000000.0 (StatementBusinessSegmentsAxis, Greater China)
Net sales: 9329000000.0 (StatementBusinessSegmentsAxis, Greater China)
Net sales: 53803000000.0 (StatementBusinessSegmentsAxis, Greater China)
Net sales: 32362000000.0 (StatementBusinessSegmentsAxis, Greater China)
Net sales: 6464000000.0 (StatementBusinessSegmentsAxis, Japan)
Net sales: 4966000000.0 (StatementBusinessSegmentsAxis, Japan)
Net sales: 22491000000.0 (StatementBusinessSegmentsAxis, Japan)
Net sales: 16395000000.0 (StatementBusinessSegmentsAxis, Japan)
Net sales: 5395000000.0 (StatementBusinessSegmentsAxis, Rest of Asia Pacific)
Net sales: 4199000000.0 (StatementBusinessSegmentsAxis, Rest of Asia Pacific)
Net sales: 21164000000.0 (StatementBusinessSegmentsAxis, Rest of Asia Pacific)
Net sales: 15462000000.0 (StatementBusinessSegmentsAxis, Rest of Asia Pacific)

Process finished with exit code 0

Does this answer your question? :)

jonkatz6 commented 2 years ago

This actually does answer my question. I had found a very roundabout way of doing it, which did not work with the old file structure. As a follow-up, though: is there any way to select for a table? For example, the above shows a consolidated statement of operations. I've managed to find where that is specified in the taxonomy, but I am lost as to how I might find the net sales corresponding with that table.

edit: To rephrase, is there any way to determine which facts belong to the consolidated statement of operations, perhaps through the id or context ref?

mrx23dot commented 2 years ago

This might help, see extracted labels https://github.com/manusimidt/py-xbrl/issues/21

jonkatz6 commented 2 years ago

Issue #21 was useful in getting the relevant concept names, but is there any way to connect those to individual or specific inst.facts? I was not able to find any arc.to_locator objects that corresponded with anything in the facts object besides the concept.xml_id, but this is not specific enough to select certain facts.

manusimidt commented 2 years ago

Currently the library does not combine the information from the presentation linkbase and the instance document. It just parses the different Locators and Arcs and stores them in the object tree. This means that if you want to rearrange the facts in the instance document according to the presentation linkbase, you have to do it on your own.
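As a rough illustration of doing it on your own, the sketch below groups facts by extended link role, given the set of concept ids that each role's presentation tree references. `FactStub`, `group_facts_by_role`, and the `role_concepts` mapping are hypothetical names for this example only; with py-xbrl you would have to build that mapping yourself from the parsed locators and arcs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class FactStub:
    """Hypothetical stand-in for a parsed fact: a concept id plus a value."""
    concept_id: str
    value: float


def group_facts_by_role(facts: Iterable[FactStub],
                        role_concepts: Dict[str, Set[str]]) -> Dict[str, List[FactStub]]:
    """Assign each fact to every role whose presentation tree names its concept."""
    grouped: Dict[str, List[FactStub]] = defaultdict(list)
    for fact in facts:
        for role, concept_ids in role_concepts.items():
            if fact.concept_id in concept_ids:
                grouped[role].append(fact)
    return dict(grouped)
```

Note the limitation raised earlier in this thread: the same concept can appear in several roles, so matching on the concept id alone puts a fact into every table that mentions it. Narrowing that down would also require comparing the fact's dimensions with the axes and members used in each role.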

However I would like to introduce this in the future. I am planning to introduce a method compile_xbrl() which merges all information of the taxonomy with the facts of the instance document. It would then also be possible to access the facts in the structure given by the presentation linkbase. Unfortunately, I am currently very busy. That's why I won't be able to work on this feature before October.

But a full implementation of this is also quite challenging. Taxonomies can override each other and can prohibit the use of certain relationships. If you really want to implement it properly, you would have to consider overriding and inheritance of hierarchies within multiple imported taxonomies.
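To make the overriding and prohibition point concrete, here is a simplified sketch of how XBRL 2.1 arcs with the `use` and `priority` attributes could be resolved: among equivalent arcs the highest priority wins, and if the winner is prohibited (prohibition also wins a tie), the relationship is dropped. `ArcStub` and `effective_arcs` are hypothetical names, and the equivalence key used here (role, arcrole, source, target) is an assumption that is narrower than the full equivalence rules in the specification.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ArcStub:
    """Hypothetical stand-in for a linkbase arc (not py-xbrl's own class)."""
    role: str
    arcrole: str
    source: str
    target: str
    priority: int = 0      # 'priority' attribute, defaults to 0
    use: str = "optional"  # 'optional' or 'prohibited'


def _rank(arc: ArcStub) -> Tuple[int, bool]:
    # Higher priority wins; on equal priority a prohibiting arc wins.
    return (arc.priority, arc.use == "prohibited")


def effective_arcs(arcs: Iterable[ArcStub]) -> List[ArcStub]:
    """Keep one winning arc per equivalence key and drop prohibited winners."""
    winners: Dict[Tuple[str, str, str, str], ArcStub] = {}
    for arc in arcs:
        key = (arc.role, arc.arcrole, arc.source, arc.target)
        current = winners.get(key)
        if current is None or _rank(arc) > _rank(current):
            winners[key] = arc
    return [arc for arc in winners.values() if arc.use != "prohibited"]
```

In practice the input would be the arcs collected from the base taxonomy and every extension taxonomy that imports it, which is why this has to run across the whole discoverable taxonomy set and not per file.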

Also keep in mind that there are multiple types of linkbases (calculation, presentation, definition, reference, label, formula ...). All are based on XLink, so I would prefer a solution that works for all types of linkbases (not only the presentation linkbase).

If you want to get an introduction about XBRL Taxonomies you can read my blog entry "What is XBRL?" section "2.2 Taxonomy Linkbases". But this blog entry does not cover inheritance of taxonomies. For a deep dive into XBRL I can only recommend the book "XBRL for Interactive Data" (978-3-642-01437-6).

mrx23dot commented 2 years ago

That would be great if we could do for fact in sections['balance-sheet'].facts: Just please consider keeping recursion to a minimum.
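On the recursion point, a presentation hierarchy can be walked with an explicit stack, so deep trees never hit Python's recursion limit. This is only a sketch: `walk_presentation` and the `children` mapping (parent concept id to ordered child ids) are hypothetical, `sections` above does not exist in py-xbrl today, and the tree is assumed to be acyclic.

```python
from typing import Dict, Iterator, List, Tuple


def walk_presentation(root: str,
                      children: Dict[str, List[str]]) -> Iterator[Tuple[int, str]]:
    """Yield (depth, concept_id) in presentation order without recursion."""
    stack: List[Tuple[int, str]] = [(0, root)]
    while stack:
        depth, node = stack.pop()
        yield depth, node
        # Push children in reverse so the first child is popped first.
        for child in reversed(children.get(node, [])):
            stack.append((depth + 1, child))


if __name__ == "__main__":
    tree = {"Statement": ["Revenue", "Costs"], "Revenue": ["Products", "Services"]}
    for depth, concept in walk_presentation("Statement", tree):
        print("  " * depth + concept)
```

The depth value makes it easy to indent the output or to rebuild the nested structure later.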

gety9 commented 2 years ago

@manusimidt Manuel, I know you are busy (I think you mentioned that you are continuing your studies), but do you have any near-future plans for compile_xbrl() (construction of the income statement, balance sheet, and cash flow statement)?

manusimidt commented 2 years ago

To be honest, I haven't started on this feature yet, because I thought other features had higher priority. Since compile_xbrl() is not a basic function but rather a "nice to have", I have prioritized other work such as the iXBRL format transformations. Realistically, I'm not going to be able to implement it in the next few weeks either, unfortunately, because it is quite a big feature. So at the latest in August, when I have semester break again. I will look at this again next week and see if I can find a quick way to implement it. It should be possible to develop this functionality outside of py-xbrl and just use the XbrlInstance object you get back from py-xbrl.

gety9 commented 2 years ago

@manusimidt got it, that makes perfect sense.