manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Label Linkbase instance should have get_label() function #10

Open manusimidt opened 3 years ago

manusimidt commented 3 years ago

Would be handy if the label linkbase would return an array of possible labels given the ID of a concept.

Pablompg commented 3 years ago

I am working on building a dictionary that maps concept IDs to labels. This is my code so far:

def get_label_dict(inst):
    """Map each concept ID (prefix before the last underscore stripped) to its terse label."""
    label_dict = {}
    for lab_linkbase in inst.taxonomy.lab_linkbases:
        for extended_link in lab_linkbase.extended_links:
            for root_locator in extended_link.root_locators:
                concept_id = root_locator.concept_id.split('_')[-1]
                for child in root_locator.children:
                    for label in child.labels:
                        # Keep only labels whose role ends in 'terseLabel'
                        if label.label_type.split('/')[-1] == 'terseLabel':
                            label_dict[concept_id] = label.text
    return label_dict

I want to be able to get the label for a given fact. I know this is not a great solution. Do you have any ideas on how to improve it?
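One possible way to generalize the loop above so that it returns every label for a concept ID, as the issue title asks, is to build an index once and query it afterwards. The following is only a sketch: the dataclasses are hypothetical stand-ins that mirror the attribute names used in the loop above (they are not py-xbrl's real classes), `build_label_index` and `get_labels` are made-up names, and the full concept ID is kept as the key instead of stripping the prefix.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins: they only mirror the attribute names used in the
# thread's loop and are NOT the real py-xbrl classes.
@dataclass
class Label:
    label_type: str
    text: str


@dataclass
class LabelArc:
    labels: List[Label] = field(default_factory=list)


@dataclass
class RootLocator:
    concept_id: str
    children: List[LabelArc] = field(default_factory=list)


@dataclass
class ExtendedLink:
    root_locators: List[RootLocator] = field(default_factory=list)


@dataclass
class Linkbase:
    extended_links: List[ExtendedLink] = field(default_factory=list)


def build_label_index(lab_linkbases):
    """Walk the nested structure once: concept_id -> role suffix -> [texts]."""
    index = defaultdict(lambda: defaultdict(list))
    for linkbase in lab_linkbases:
        for extended_link in linkbase.extended_links:
            for root_locator in extended_link.root_locators:
                for child in root_locator.children:
                    for label in child.labels:
                        role = label.label_type.split('/')[-1]
                        index[root_locator.concept_id][role].append(label.text)
    return index


def get_labels(index, concept_id, role=None):
    """Return all labels for a concept ID, optionally limited to one role."""
    roles = index.get(concept_id, {})
    if role is not None:
        return list(roles.get(role, []))
    return [text for texts in roles.values() for text in texts]


# Made-up sample data using the standard XBRL label role URIs.
linkbases = [Linkbase([ExtendedLink([RootLocator("us-gaap_Assets", [LabelArc([
    Label("http://www.xbrl.org/2003/role/label", "Assets"),
    Label("http://www.xbrl.org/2003/role/terseLabel", "Total assets"),
])])])])]
index = build_label_index(linkbases)
print(get_labels(index, "us-gaap_Assets"))
print(get_labels(index, "us-gaap_Assets", role="terseLabel"))
```

Building the index once means the five nested loops run a single time per taxonomy, and each later lookup is a plain dictionary access.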

manusimidt commented 3 years ago

Yes, the current very deep structure of object instances is quite annoying. In the past I tried to fix this problem by creating a compile method that takes all information from the linkbases and directly associates it with the different concepts.

While this is quite easy for the labels, I had issues recreating the hierarchical structures of the presentation, definition and calculation linkbases, which is why I haven't published it.

But in this case it would be sufficient to restrict ourselves to the label linkbase for the time being. I would suggest creating a label attribute directly in the concept object instance, so the user can access it directly. The downside of this solution is that the label would then be stored twice: once in the linkbase object and again in the concept object instance.

https://github.com/manusimidt/xbrl_parser/blob/bdc0abd3cbb8885eceabf9288b673d17e97bad5d/xbrl/taxonomy.py#L69-L92 I will have a look at the code I wrote back then and get back to you by tomorrow evening :)
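As a rough illustration of the "compile" idea for labels only, the sketch below copies label texts onto concept objects after parsing. Everything in it is an assumption made for illustration: the `Concept` stand-in, its `labels` attribute, the `get_label` method, and the flat `(concept_id, role, text)` triples are hypothetical and do not describe the code that was eventually committed.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple

# Standard XBRL role URI for the default label.
STANDARD_ROLE = "http://www.xbrl.org/2003/role/label"


@dataclass
class Concept:
    """Hypothetical stand-in for a taxonomy concept (not the py-xbrl class)."""
    xml_id: str
    labels: Dict[str, str] = field(default_factory=dict)  # role URI -> text

    def get_label(self, role: str = STANDARD_ROLE) -> Optional[str]:
        """Return the label for the given role, or None if there is none."""
        return self.labels.get(role)


def compile_labels(concepts: Dict[str, Concept],
                   label_triples: Iterable[Tuple[str, str, str]]) -> None:
    """Attach (concept_id, role, text) triples to the matching concepts.

    The triples are assumed to have been collected from the label linkbase;
    triples whose concept ID is unknown are skipped.
    """
    for concept_id, role, text in label_triples:
        concept = concepts.get(concept_id)
        if concept is not None:
            concept.labels[role] = text


# Made-up sample data.
concepts = {"us-gaap_Assets": Concept("us-gaap_Assets")}
compile_labels(concepts, [
    ("us-gaap_Assets", STANDARD_ROLE, "Assets"),
    ("us-gaap_Assets", "http://www.xbrl.org/2003/role/terseLabel", "Total assets"),
    ("us-gaap_Unknown", STANDARD_ROLE, "Ignored"),
])
print(concepts["us-gaap_Assets"].get_label())
```

The label texts end up stored twice, in the linkbase and on the concept, which is the duplication trade-off discussed in this comment.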

manusimidt commented 3 years ago

But yes, the code you provided is currently the best and only method to get the labels.

Pablompg commented 3 years ago

I think storing the label inside the concept would be great.

I don't foresee any issues with having the information duplicated. The two possible concerns would be speed and memory:

Using the nested for loop I shared in my first comment, I didn't notice any decrease in performance. I don't think speed is an issue, as the bottleneck would not be here.

And I don't think adding a new attribute to the concept is a memory issue either. My estimate is:

4 bytes per character x 150 characters per label x 10,000 concepts = 6 MB

That is a worst-case scenario estimation of the memory cost. I think this is acceptable, but that is just my opinion.
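For reference, the arithmetic behind that estimate can be checked in a few lines. The three constants are the figures quoted in the comment above; the variable names are only illustrative.

```python
# Worst-case figures quoted in the comment above.
BYTES_PER_CHAR = 4
CHARS_PER_LABEL = 150
NUM_CONCEPTS = 10_000

total_bytes = BYTES_PER_CHAR * CHARS_PER_LABEL * NUM_CONCEPTS
total_mb = total_bytes / 1_000_000  # decimal megabytes

print(f"{total_bytes:,} bytes = {total_mb:.0f} MB")  # 6,000,000 bytes = 6 MB
```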

manusimidt commented 3 years ago

Yes, I also think that from a performance perspective this would not be a problem. If we could find a way to also "compile" the information from the relational linkbases and assign it directly to the various concepts, we might not even need the current linkbase structure.

manusimidt commented 3 years ago

This is working pretty well for now. But tomorrow I would like to test the change a little bit and add a method getLabel() to the class Concept. After that I will publish a new version of the package.


manusimidt commented 3 years ago

After cfd59c358d1415ef43f0f7cd1fe11fee57820487 this is now working quite well for remote submissions. But I still run into some errors when parsing locally saved submissions.

pvmagacho-nde commented 1 week ago

@manusimidt this is an old issue that is still open. Any chance this would get fixed?