manusimidt / py-xbrl

Python-based parser for parsing XBRL and iXBRL files
https://py-xbrl.readthedocs.io/en/latest/
GNU General Public License v3.0

parse_ixbrl does not close the file it opens #121

Open cdm-analytics opened 1 year ago

cdm-analytics commented 1 year ago

The function "parse_ixbrl" in xbrl/instance.py opens a file as "instance_file" and never closes it.

I am looping through a large number of XBRL documents and hitting the system limit on open files.

As a work-around, I increased the system limit on open files. That helped but now I am running into memory issues.

I wonder if using "with" or explicitly closing the file would eliminate the issue I'm having.

Great module! Thanks!

cdm-analytics commented 1 year ago

OK, I think I fixed both problems I was having with the following code change to the "parse_ixbrl" function in xbrl/instance.py:

    instance_file = open(instance_path, "r", encoding=encoding)
    contents = instance_file.read()
    instance_file.close()
    pattern = r'<[ ]*script.*?\/[ ]*script[ ]*>'
    contents = re.sub(pattern, '', contents, flags=(re.IGNORECASE | re.MULTILINE | re.DOTALL))
    with StringIO(contents) as contents_object:
        root: ET.ElementTree = parse_file(contents_object)

The "close" line fixed the issue I had with opening too many files and the "with" code fixed the issue I had with maxing out my system's memory.

manusimidt commented 1 year ago

Ah, so you added the with block, right? I will investigate this for the next major release. It is probably also relevant for the parse_xbrl function, not just parse_ixbrl. Thanks for the suggestion.

cdm-analytics commented 1 year ago

I also added the instance_file.close() line, which could alternatively be done using a with block.

Thanks for the follow up! It's a great module that has been really helpful to me.

s-kust commented 11 months ago

@cdm-analytics How about also using with instead of instance_file.close()?
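
For reference, a minimal sketch of what that could look like, with both the file handle and the StringIO buffer managed by with blocks. The helper name load_ixbrl_tree is hypothetical and not part of py-xbrl, and the standard library's ET.parse is used here as a stand-in for the library's own parse_file, so treat this as an illustration of the pattern rather than the actual patch.

```python
import re
import xml.etree.ElementTree as ET
from io import StringIO

# Same pattern as in the snippet above: matches <script ...> ... </script> blocks.
SCRIPT_PATTERN = r'<[ ]*script.*?\/[ ]*script[ ]*>'


def load_ixbrl_tree(instance_path: str, encoding: str = "utf-8") -> ET.ElementTree:
    """Hypothetical helper (not part of py-xbrl): read, strip scripts, parse."""
    # The with block closes the file handle even if read() raises,
    # so no explicit instance_file.close() is needed.
    with open(instance_path, "r", encoding=encoding) as instance_file:
        contents = instance_file.read()

    # Remove script blocks before handing the markup to the XML parser.
    contents = re.sub(SCRIPT_PATTERN, '', contents,
                      flags=(re.IGNORECASE | re.MULTILINE | re.DOTALL))

    # Assumption: ET.parse stands in for py-xbrl's parse_file here.
    # The buffer is released as soon as parsing finishes.
    with StringIO(contents) as contents_object:
        return ET.parse(contents_object)
```

Because the file is closed as soon as its contents are read, looping over many documents should no longer accumulate open handles, regardless of whether parsing later fails.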