manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

parse_ixbrl does not close the file it opens #121

Open cdm-analytics opened 10 months ago

cdm-analytics commented 10 months ago

The function "parse_ixbrl" in xbrl/instance.py opens a file as "instance_file" and never closes it.

I am looping through a large number of XBRL documents and hitting the system limit on open files.

As a work-around, I increased the system limit on open files. That helped but now I am running into memory issues.

I wonder if using "with" or explicitly closing the file would eliminate the issue I'm having.

Great module! Thanks!

cdm-analytics commented 10 months ago

OK, I think I fixed both problems I was having with the following code change to the "parse_ixbrl" function in xbrl/instance.py:

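    # Read the whole document, then release the file handle right away.
    instance_file = open(instance_path, "r", encoding=encoding)
    contents = instance_file.read()
    instance_file.close()
    # Strip <script> ... </script> blocks before parsing.
    pattern = r'<[ ]*script.*?\/[ ]*script[ ]*>'
    contents = re.sub(pattern, '', contents, flags=(re.IGNORECASE | re.MULTILINE | re.DOTALL))
    # The with block closes the StringIO buffer once parsing is done.
    with StringIO(contents) as contents_object:
        root: ET.ElementTree = parse_file(contents_object)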

The "close" line fixed the issue I had with opening too many files and the "with" code fixed the issue I had with maxing out my system's memory.

manusimidt commented 10 months ago

Ah, so you added the with block, right? I will investigate this for the next major release. This is probably also relevant for the parse_xbrl function, not just parse_ixbrl. Thanks for the suggestion.

cdm-analytics commented 10 months ago

I also added the instance_file.close() line, which could alternatively be replaced by a with block.

Thanks for the follow up! It's a great module that has been really helpful to me.

s-kust commented 9 months ago

@cdm-analytics How about also using with instead of instance_file.close()?
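
For reference, here is a minimal sketch of what that could look like. It is not the library's actual code: load_instance_tree is a hypothetical helper name, the "utf-8" default is an assumption, and the standard library's ET.parse stands in for py-xbrl's parse_file so the snippet runs on its own. The regex is the one from the snippet earlier in this thread.

```python
import re
import xml.etree.ElementTree as ET
from io import StringIO

# Same pattern as in the snippet earlier in this thread:
# matches <script ...> ... </script> blocks.
SCRIPT_PATTERN = re.compile(
    r'<[ ]*script.*?\/[ ]*script[ ]*>',
    flags=re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def load_instance_tree(instance_path: str, encoding: str = "utf-8") -> ET.ElementTree:
    """Hypothetical helper (not part of py-xbrl): read, strip scripts, parse."""
    # The with block closes the file even if read() raises,
    # so no explicit instance_file.close() is needed.
    with open(instance_path, "r", encoding=encoding) as instance_file:
        contents = instance_file.read()

    contents = SCRIPT_PATTERN.sub('', contents)

    # ET.parse is an assumption standing in for the library's parse_file.
    with StringIO(contents) as contents_object:
        return ET.parse(contents_object)
```

With this shape, both the file handle and the in-memory buffer are released as soon as each block exits, including when an exception is raised while reading or parsing.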