Hi @wmpay, could you say what OS you're working on, what version of Python you're using, and what the entire verbatim script output is? Are the files present in your file system? How did you install everything? What do you get if you enter something like
$ irsx --format=csv 201533089349301428
from the command line? Does it work, or does it freak out about missing .csv files?
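A minimal sketch of running that same smoke test from Python, which is handy for checking the environment a script actually sees (for example inside a pipenv shell rather than your login shell). The helper name `run_smoke_test` is hypothetical and not part of irsx; it only shells out to whatever command you pass it.

```python
import shutil
import subprocess
from typing import List, Tuple


def run_smoke_test(cmd: List[str], timeout: float = 120.0) -> Tuple[bool, str]:
    """Run cmd and return (succeeded, combined stdout/stderr text).

    Hypothetical helper: it just reports whether the command exits with 0.
    """
    # Fail fast with a clear message if the executable isn't on PATH at all.
    if shutil.which(cmd[0]) is None:
        return False, f"{cmd[0]!r} not found on PATH"
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so error text is captured too
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode == 0, proc.stdout
```

Usage: `run_smoke_test(["irsx", "--format=csv", "201533089349301428"])`. A `False` result with "not found on PATH" points at installation; a non-zero exit with output points at configuration (such as missing metadata files).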
This db is really a wrapper/data structure around the irsx program, so if irsx isn't configured, the database loader won't work either.
The variables.csv file comes from the metadata repo here: https://github.com/jsfenfen/990-xml-metadata/ The metadata should have been installed as a dependency of irsx; it's packaged as a git submodule here: https://github.com/jsfenfen/990-xml-reader/tree/master/irs_reader. Not quite sure what's going on; I'll take a look at this ahead of the next release.
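To see whether the metadata actually landed on disk, a sketch like the following locates the installed package without importing it and looks for the CSV. It assumes the metadata CSVs ship inside the installed `irs_reader` package under a `metadata/` subdirectory, matching the submodule layout linked above; confirm that against your installed version. The function name is hypothetical.

```python
import importlib.util
import os
from typing import Optional


def find_package_metadata(package: str = "irs_reader",
                          subdir: str = "metadata",
                          filename: str = "variables.csv") -> Optional[str]:
    """Return <package dir>/<subdir>/<filename> if that file exists, else None.

    Assumption: the metadata lives in a subdirectory of the installed package.
    """
    # For a top-level name, find_spec locates the package without importing it,
    # and returns None if the package is not installed.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    for base in spec.submodule_search_locations:
        candidate = os.path.join(base, subdir, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Run it with the same interpreter the Django project uses (e.g. via `pipenv run python`); a `None` result means the installed package has no such file, which would explain a missing variables.csv.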
I'm working on macOS High Sierra 10.13.6. I installed most of the software with Homebrew (AWS CLI, Postgres, Python 3). I'm using Python 3.7. I installed the Python packages using pipenv, which is a tool that combines pip and virtualenv. pip freeze shows irsx version 0.2.2 installed. How can I configure it if it is installed as a Python package? The full error message is below:
Running metadata load on variables.
Deleting variables.
Traceback (most recent call last):
  File "manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 316, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 353, in execute
    output = self.handle(*args, **options)
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 96, in handle
    self.reload_variables()
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 21, in reload_variables
    infile = open(infile, 'r')
FileNotFoundError: [Errno 2] No such file or directory: '/Users/mpaymar/foo/metadata/variables.csv'
Thanks for your help.
Thanks for posting @wmpay.
It's looking in the '/Users/mpaymar/foo/' directory because that's where irsx says the metadata is; see this line: https://github.com/jsfenfen/990-xml-database/blob/master/irsdb/irsdb/settings.py#L132
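Because the loader simply opens `<metadata directory>/variables.csv`, a bad directory setting only surfaces as a bare `FileNotFoundError`. A minimal sketch of a preflight check that distinguishes "directory wrong" from "file missing"; the helper `require_metadata_file` is hypothetical and not part of load_metadata.py.

```python
import os


def require_metadata_file(metadata_dir: str, filename: str = "variables.csv") -> str:
    """Return the full path to filename inside metadata_dir.

    Raises FileNotFoundError with a message that says whether the directory
    itself is wrong or only the file is missing.
    """
    if not os.path.isdir(metadata_dir):
        raise FileNotFoundError(
            f"Metadata directory {metadata_dir!r} does not exist; "
            "check any METADATA_DIRECTORY override in settings."
        )
    path = os.path.join(metadata_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{filename!r} not found in {metadata_dir!r}")
    return path
```

With the path from the traceback above, this would report that '/Users/mpaymar/foo/metadata' does not exist, pointing straight at the directory setting rather than at irsx.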
Does the command line program irsx work, and is it installed? You should be able to tell by typing
irsx --format=csv 201533089349301428
Does that produce the output of a 990, or does it error out? If it errors out, can you post the complete error message?
If you run something like
pipenv run pip freeze
does it say that irsx is installed, and if so, which version? The files it is looking for are part of the irsx installation process.
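The same version check can be done from inside Python, which guarantees you're asking the interpreter that will run manage.py. A small sketch using the standard library's `importlib.metadata` (Python 3.8+); on Python 3.7 it falls back to the `importlib_metadata` backport, which must be installed separately. The helper name is hypothetical.

```python
from typing import Optional

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: requires `pip install importlib_metadata`
    from importlib_metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

Usage: `print(installed_version("irsx") or "irsx not installed")`, run via `pipenv run python` so it sees the virtualenv.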
Hi @jsfenfen. The irsx program works when I run the command you sent. The version is 0.2.2. I'm going to try to troubleshoot some more and I'll let you know if I solve the issue.
I think I had added my own METADATA_DIRECTORY setting for troubleshooting purposes. When I remove it I get this error:
(990-xml-database-YbfOuN9i) bash-3.2$ python manage.py load_metadata
/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
Running metadata load on variables.
Deleting variables.
Created 0 rows
Traceback (most recent call last):
  File "manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 316, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 353, in execute
    output = self.handle(*args, **options)
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 96, in handle
    self.reload_variables()
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 27, in reload_variables
    if CANONICAL_VERSION in row['versions']:
KeyError: 'versions'
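With `csv.DictReader`, every row's keys come from the header line, so `KeyError: 'versions'` means the CSV being read has no `versions` column in its header (an empty or truncated file, or a different file than expected). A minimal sketch of checking the header before looping; `missing_columns` is a hypothetical helper, not part of load_metadata.py.

```python
import csv
from typing import Iterable, List


def missing_columns(csv_path: str, required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file, so treat that as "no columns".
        header = set(reader.fieldnames or [])
    return [name for name in required if name not in header]
```

For example, `missing_columns(path_to_variables_csv, ["versions"])` returning `["versions"]` confirms the file is not the metadata CSV the loader expects.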
Hmm, I ran it again from the beginning and it seems to be working. No idea what I did differently this time. Thanks anyway.
One last thing: how do I download the actual filings themselves? Do I need to do that manually with the irsx tool, or is there a script to do it? Where should I download them to? Thanks again.
You can use Amazon's S3 CLI tool to move all the .xml files in bulk. IRSX will automatically download any filing it doesn't find in the file cache directory. As of irsx 0.2.3 you can set that directory with an environment variable: https://github.com/jsfenfen/990-xml-reader#environment-variables.
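Since irsx only downloads filings it can't find locally, it can help to check which filings are already in the cache directory after a bulk sync. A sketch under a stated assumption: the synced files keep the `<object_id>_public.xml` naming used in the IRS S3 bucket. The function name and the `pattern` parameter are hypothetical.

```python
import os
from typing import Iterable, List


def uncached_object_ids(cache_dir: str, object_ids: Iterable[str],
                        pattern: str = "{}_public.xml") -> List[str]:
    """Return the object IDs whose XML file is not present in cache_dir.

    Assumption: files are named by formatting the object ID into `pattern`.
    """
    return [
        oid for oid in object_ids
        if not os.path.isfile(os.path.join(cache_dir, pattern.format(oid)))
    ]
```

An empty result for a sample of IDs means the sync landed where you pointed it; a full result usually means the cache directory setting and the sync target differ.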
More about dealing with the files in the readme: https://github.com/jsfenfen/990-xml-database/#file-size-concerns
Hi, I was able to download the files with the AWS CLI, but I'm still not sure how to actually load them into the DB. The load_monthly_filings and enter_yearly_submissions commands don't seem to load anything, so I think the tool doesn't know where the files were downloaded to? I tried setting the environment variables for irsx, but that didn't seem to work. Sorry to keep spamming the issue board; would it be easier to direct message you somehow?
@wmpay don't worry about spamming the issue board, it's useful for me to see what's confusing / tripping folks up. At some point I hope to rewrite the docs to include SQL queries that'll confirm that each step worked.
In general, each loading step is sequential, so if the annual index files aren't loaded it decides that there aren't any filings that need to be processed because there aren't any filings in the index files. Probably each script should have more verbose output. I don't have a clear timeline for when I'm gonna get to that though.
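The dependency described here can be sketched as: the filings to process are derived from the rows of the annual index, so an index that was never loaded yields nothing to do. A minimal illustration, assuming the index CSV has an `OBJECT_ID` column as in the IRS annual index files; the function is hypothetical and does not reproduce the project's actual loader.

```python
import csv
from typing import List


def object_ids_from_index(index_csv_path: str, column: str = "OBJECT_ID") -> List[str]:
    """Return the object IDs listed in one annual index CSV.

    An empty index yields an empty list, which is exactly the case where a
    downstream loader has nothing to process and just reports that it's done.
    """
    with open(index_csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None:  # empty file: no header, no rows
            return []
        if column not in reader.fieldnames:
            raise ValueError(f"{column!r} not in index header {reader.fieldnames}")
        # Skip rows where the object ID is blank.
        return [row[column] for row in reader if row[column]]
```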
As far as I know I've downloaded the index files for 2014 to 2018. I've downloaded about a gigabyte of the XML files from S3 into the directory I set as FILE_SYSTEM_BASE. I'm assuming at least some of those files would be listed in the index files. However, load_filings still just says Done for any year I enter. I think I just have things configured incorrectly... is there a specific place I need to do the S3 sync to?
I get an error message similar to your initial one when loading the metadata:
  File "/Users/Georg/PycharmProjects/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 31, in reload_variables
    infile = open(infilepath, 'r')
FileNotFoundError: [Errno 2] No such file or directory: '/Users/Georg/PycharmProjects/990-xml-database/irsdb/generated_schemas/variables.csv'
How did you fix it? I also reran everything, but the error remains.
I was never able to fix it.
On Wed, Jan 29, 2020 at 11:49 PM Georg-coder notifications@github.com wrote:
-- Best regards,
Max Paymar
I think these folks solved it: https://github.com/jsfenfen/990-xml-database/issues/28
but I couldn't implement it yet... :/
I'm trying to run
python manage.py load_metadata
from the "Adding the metadata" part of the setup. Is there a workaround for this? Where do I get the variables.csv file? @jsfenfen