jsfenfen / 990-xml-database

Django app to consume and store 990 data and metadata
BSD 2-Clause "Simplified" License

File may be damaged or incomplete #27

rabsef-bicrym closed this issue 4 years ago

rabsef-bicrym commented 4 years ago

One more question for you:

I am loading the various years I need successfully (I think), but in 2015, after some time (115,300+ filings processed), I get the following error message. Should I do as it describes (delete the file and download it again)? Would that just mean running the irsx command for that specific return?

    Traceback (most recent call last):
      File "/home/ian/.local/lib/python3.6/site-packages/irsx/filing.py", line 117, in _set_dict_from_xml
        self.raw_irs_dict['Return']
    KeyError: 'Return'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "manage.py", line 15, in <module>
        execute_from_command_line(sys.argv)
      File "/home/ian/.local/lib/python3.6/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
        utility.execute()
      File "/home/ian/.local/lib/python3.6/site-packages/django/core/management/__init__.py", line 395, in execute
        self.fetch_command(subcommand).run_from_argv(self.argv)
      File "/home/ian/.local/lib/python3.6/site-packages/django/core/management/base.py", line 328, in run_from_argv
        self.execute(*args, **cmd_options)
      File "/home/ian/.local/lib/python3.6/site-packages/django/core/management/base.py", line 369, in execute
        output = self.handle(*args, **options)
      File "/home/ian/990-xml-database/irsdb/return/management/commands/load_filings.py", line 116, in handle
        self.run_filing(filing)
      File "/home/ian/990-xml-database/irsdb/return/management/commands/load_filings.py", line 58, in run_filing
        parsed_filing = self.xml_runner.run_filing(object_id)
      File "/home/ian/.local/lib/python3.6/site-packages/irsx/xmlrunner.py", line 111, in run_filing
        this_filing.process(verbose=verbose)
      File "/home/ian/.local/lib/python3.6/site-packages/irsx/filing.py", line 233, in process
        self._set_dict_from_xml()
      File "/home/ian/.local/lib/python3.6/site-packages/irsx/filing.py", line 122, in _set_dict_from_xml

jsfenfen commented 4 years ago

Yeah, I would erase the offending file, then try running `irsx 201541349349102174`, which will force it to download again. I'm able to parse that file from here, so it should probably work for you.
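If it helps, here is a rough sketch of the "erase the offending file" step. It is not part of irsx: the helper names, the cache directory argument, and the `_public.xml` filename suffix are all assumptions for illustration, so check where your install actually stores downloaded filings. It treats a file as damaged when it is not well-formed XML or its root element is not `Return`, which is a guess at what the `KeyError: 'Return'` above points to.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def is_damaged(xml_path):
    """Return True if the file is not well-formed XML or lacks a Return root."""
    try:
        root = ET.parse(xml_path).getroot()
    except ET.ParseError:
        # Truncated or empty downloads fail to parse at all.
        return True
    # Strip any namespace prefix such as "{http://www.irs.gov/efile}".
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name != "Return"


def remove_if_damaged(cache_dir, object_id, suffix="_public.xml"):
    """Delete the cached file when it looks damaged; report whether it was removed.

    cache_dir and suffix are hypothetical; adjust them to your own setup.
    """
    path = Path(cache_dir) / f"{object_id}{suffix}"
    if path.exists() and is_damaged(path):
        path.unlink()
        return True
    return False
```

After a file has been removed this way, rerunning irsx for that object id should fetch a fresh copy, as described above.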

One thing you can do to check that there are no "half-loaded filings" is run this query: `select ein, object_id from filing_filing where parse_started and not parse_complete`. This usually doesn't happen; it's just a sanity check. If you do find any half-loaded filings, you'll want to remove them so they don't get re-entered, using the "remove_half_loaded" management command here: https://github.com/jsfenfen/990-xml-database/blob/master/irsdb/return/management/commands/remove_half_loaded.py
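For anyone who wants to see what that sanity-check query returns, here is a small self-contained sketch. It runs the same query against a throwaway in-memory SQLite table; the real `filing_filing` table has more columns and lives in whatever database your Django settings point to, so the schema and sample rows below are assumptions made up for the demo.

```python
import sqlite3

# The query from the comment above, unchanged apart from layout.
HALF_LOADED_SQL = """
    SELECT ein, object_id
    FROM filing_filing
    WHERE parse_started AND NOT parse_complete
"""


def find_half_loaded(conn):
    """Return (ein, object_id) pairs for filings that started parsing but never finished."""
    return conn.execute(HALF_LOADED_SQL).fetchall()


# Hypothetical, simplified stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE filing_filing ("
    "ein TEXT, object_id TEXT, parse_started BOOLEAN, parse_complete BOOLEAN)"
)
conn.executemany(
    "INSERT INTO filing_filing VALUES (?, ?, ?, ?)",
    [
        ("000000001", "100000000000000001", 1, 1),  # fully loaded
        ("000000002", "100000000000000002", 1, 0),  # half loaded
        ("000000003", "100000000000000003", 0, 0),  # not started
    ],
)

print(find_half_loaded(conn))
```

Only the middle row comes back, since it is the only one where parsing started but did not complete. An empty result on the real database means there is nothing to clean up.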

rabsef-bicrym commented 4 years ago

Thank you - I sincerely appreciate your help on all of this - your program is beyond my technical comprehension but w/ a lil help from friends and from you, I think I'm getting it there.

That seems to have worked