manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

Be nicer to submissions that do not follow the XBRL standard 100% #84

Open manusimidt opened 2 years ago

manusimidt commented 2 years ago

Implement functionality that also allows parsing XBRL reports that violate the XBRL standard. Maybe just issue a warning and continue parsing instead of crashing completely.
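A minimal sketch of the "warn and continue" idea, not py-xbrl's actual implementation. `KNOWN_CONCEPTS`, `parse_facts`, and the `strict` flag are hypothetical names made up for illustration; the dictionary stands in for a taxonomy's name-to-concept mapping.

```python
import warnings

# Hypothetical stand-in for a taxonomy's name -> concept mapping.
KNOWN_CONCEPTS = {"Assets": "concept:Assets", "Liabilities": "concept:Liabilities"}


def parse_facts(raw_facts, strict=False):
    """Resolve (concept_name, value) pairs against KNOWN_CONCEPTS.

    Unknown concepts raise KeyError when strict is True; otherwise a
    warning is issued and the fact is skipped.
    """
    parsed = []
    for concept_name, value in raw_facts:
        try:
            concept = KNOWN_CONCEPTS[concept_name]
        except KeyError:
            if strict:
                raise
            warnings.warn(f"Skipping fact with undefined concept {concept_name!r}")
            continue
        parsed.append((concept, value))
    return parsed


# The undefined concept is reported as a warning; the other facts survive.
facts = parse_facts([("Assets", "100"), ("NotInTaxonomy", "7"), ("Liabilities", "40")])
```

Keeping a strict mode available lets users who care about standard conformance still get a hard failure.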

(from discussion:) Hey,

the concepts are defined in the different taxonomy schemas imported by the instance document.

For example: the first submission you provided failed at the concept "in-ca:WhetherApprovalTakenFromBoardForMaterialContractsorArrangementsorTransactionsWithRelatedParty", which uses the namespace prefix "in-ca". This prefix refers to the taxonomy with the namespace "http://www.icai.org/xbrl/taxonomy/2016-03-31/in-ca", which is linked to the schema file located at https://www.mca.gov.in/XBRL/2016/07/26/Taxonomy/CnI/IN-CA/in-ca-2016-03-31.xsd. There you can check that the concept mentioned above really is not defined.

=> Thus the creator of this filing incorrectly used a non-existent concept, which is why py-xbrl crashes.
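The check described above can also be done programmatically. The following is a rough sketch using only the standard library; the inline schema and the `declared_concepts` helper are made up for illustration, and a real check would load the downloaded .xsd file instead (and would also need to follow any schemas it imports).

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Tiny made-up schema standing in for a downloaded taxonomy .xsd file.
SCHEMA_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/taxonomy">
  <xs:element name="Assets" id="ex_Assets"/>
  <xs:element name="Liabilities" id="ex_Liabilities"/>
</xs:schema>
"""


def declared_concepts(schema_text):
    """Return the names of all top-level xs:element declarations."""
    root = ET.fromstring(schema_text)
    return {el.get("name") for el in root.findall(f"{{{XSD_NS}}}element")}


names = declared_concepts(SCHEMA_TEXT)
# A concept used in a filing but missing from this set is not defined here.
is_defined = "SomeConceptUsedInTheFiling" in names
```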

The problematic line is the following: https://github.com/manusimidt/py-xbrl/blob/7be61f7dfe19491ef29ca917be0876c4da98284e/xbrl/instance.py#L336

Here I simply expect tax.name_id_map to contain the given concept (which it should, according to the XBRL standard).

There were several discussions before about "how to treat incorrect XBRL", because many users of py-xbrl just want to get data out of the reports and do not care whether the report could be parsed 100%.

I plan to implement functionality that would allow you to parse submissions that are incorrect (and maybe just issue a warning). But I am not able to work on py-xbrl until mid-July (due to university stuff).

So in the meantime I would suggest just putting a "try-except" block around the line where it's failing. Like the following (untested):

# get the concept object from the taxonomy
tax = taxonomy.get_taxonomy(taxonomy_ns)
if tax is None:
    tax = _load_common_taxonomy(cache, taxonomy_ns, taxonomy)

try:
    concept: Concept = tax.concepts[tax.name_id_map[concept_name]]
    context: AbstractContext = context_dir[fact_elem.attrib['contextRef'].strip()]
except KeyError:
    # a failed dictionary lookup raises KeyError, not ValueError
    print(f"All facts with concept {concept_name} will be ignored, due to invalid concept definition")
    continue

Originally posted by @manusimidt in https://github.com/manusimidt/py-xbrl/discussions/83#discussioncomment-3020257

manusimidt commented 2 years ago

This would mainly affect the instance module. Both the XBRL and the iXBRL parsing would need the change, since they are implemented as separate functions.

PotatoProgrammer20 commented 2 years ago

Hi,

Thanks for the idea. I made some code changes: I added a try-except block as you suggested and used Beautiful Soup to fetch the wrongly filed data from the XML file directly.

Here is my code:

    try:
        concept: Concept = tax.concepts[tax.name_id_map[concept_name]]
        context: AbstractContext = context_dir[fact_elem.attrib['contextRef'].strip()]
    except KeyError:
        print(f"\nAll facts with concept \t{concept_name}\t will be ignored, due to invalid concept definition\n")

        # fall back to reading the wrongly filed value straight from the XML file
        from bs4 import BeautifulSoup

        # the with-statement closes the file handle after reading
        with open(instance_path, "r", encoding="utf-8") as file:
            soup = BeautifulSoup(file.read(), 'xml')
        for tag in soup.find_all():
            if tag.name == concept_name:
                print(f"This is the wrongly filed concept:\n{concept_name}\nThis is its data:\n{tag.text}")
        continue

Now I am getting the values in the terminal, sure. But the final result is in a dataframe. How do I append this result to the final dataframe? Can you please help?
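One possible way to approach this, sketched with the standard library only: collect the recovered values as a list of dictionaries instead of printing them. The inline instance document, the `split_tag` and `recover_rows` helpers, and the column names are all assumptions made for illustration; how the final dataframe is built from py-xbrl's facts is not shown in this thread, so the columns would have to be adapted to match it.

```python
import xml.etree.ElementTree as ET

# Made-up instance document with one concept that is not in the taxonomy.
INSTANCE_TEXT = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:ex="http://example.com/taxonomy">
  <ex:Assets contextRef="c1">100</ex:Assets>
  <ex:UndefinedConcept contextRef="c1">Yes</ex:UndefinedConcept>
</xbrl>
"""


def split_tag(tag):
    """Split an ElementTree tag '{namespace}local' into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def recover_rows(instance_text, known_names):
    """Collect one row per fact whose concept is not in known_names."""
    rows = []
    for elem in ET.fromstring(instance_text):
        if "contextRef" not in elem.attrib:
            continue  # not a fact element
        namespace, local_name = split_tag(elem.tag)
        if local_name in known_names:
            continue  # concept resolves normally, nothing to recover
        rows.append({
            "concept": local_name,
            "namespace": namespace,
            "context": elem.attrib["contextRef"].strip(),
            "value": (elem.text or "").strip(),
        })
    return rows


rows = recover_rows(INSTANCE_TEXT, known_names={"Assets"})
```

A list of dictionaries like `rows` can be passed to `pandas.DataFrame(rows)` and then combined with an existing dataframe via `pandas.concat`, provided the column names are aligned first.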

image

(this is my terminal result hope you are able to see this)

Thanks and regards.

PotatoProgrammer20 commented 2 years ago

Is there a way I can integrate this result for wrongly filed concept names and add them to the "facts"?

There seem to be many changes in parameters of the functions so I would rather wait for you to give an update regarding this.

Thanks

manusimidt commented 2 years ago

This change is now live in version 2.2.0

manusimidt commented 2 years ago

This could also apply to context IDs (see #86).

manusimidt commented 11 months ago

This could also apply to taxonomies that are missing or cannot be located (see #112 and #76).