jsfenfen / 990-xml-database

Django app to consume and store 990 data and metadata
BSD 2-Clause "Simplified" License

No such file or directory: '<my/folder>/metadata/variables.csv' #14

Closed wmpay closed 6 years ago

wmpay commented 6 years ago

Trying to run python manage.py load_metadata from the "Adding the metadata" part of the setup. Is there a workaround for this? Where do I get the variables file? @jsfenfen

jsfenfen commented 6 years ago

Hi @wmpay could you say what OS you're working on, what version of python you're using, and what the entire verbatim script output is? Are the files present in your file system? How did you install stuff? What do you get if you enter something like $ irsx --format=csv 201533089349301428 from the command line? Does it work or freak out about missing .csv files?

This db is really a wrapper/data structure around the irsx program. If irsx isn't configured, the database loader won't work either.

The variables.csv file is from the metadata repo here: https://github.com/jsfenfen/990-xml-metadata/ The metadata should have been installed as a dependency of irsx... it's packaged as a git submodule here: https://github.com/jsfenfen/990-xml-reader/tree/master/irs_reader. Not quite sure what's going on, will take a look at this ahead of the next release.
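
One way to check whether the metadata files actually made it into the installed package is to look inside the package directory directly. The sketch below is not part of irsx: find_package_file is a hypothetical helper, and both the package name irs_reader (taken from the repo path above) and the relative location metadata/variables.csv are assumptions about how the install is laid out.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_file(package: str, relative_path: str) -> Optional[Path]:
    """Return the path to relative_path inside an installed package, or None."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a plain module rather than a package directory.
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / relative_path
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Assumed package name and file location; adjust to match your install.
    found = find_package_file("irs_reader", "metadata/variables.csv")
    print(found if found else "variables.csv not found in the irs_reader package")
```

Run it inside the same virtualenv as manage.py (for example with pipenv run python), since that is the environment the loader sees.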

wmpay commented 6 years ago

I'm working on macOS High Sierra 10.13.6. I installed most of the software with homebrew (aws cli, postgres, python3). I'm using python version 3.7. I installed the python packages using pipenv, which is just a tool that combines pip and virtualenv. When I do pip freeze, it shows that irsx version 0.2.2 is installed. How can I configure it if it is installed as a python package? The full error message is below:

Running metadata load on variables.
Deleting variables.
Traceback (most recent call last):
  File "manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 316, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 353, in execute
    output = self.handle(*args, **options)
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 96, in handle
    self.reload_variables()
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 21, in reload_variables
    infile = open(infile, 'r')
FileNotFoundError: [Errno 2] No such file or directory: '/Users/mpaymar/foo/metadata/variables.csv'

Thanks for your help.

jsfenfen commented 6 years ago

Thanks for posting @wmpay.

It's looking in the '/Users/mpaymar/foo/' directory because that's where irsx says the metadata is--see this line: https://github.com/jsfenfen/990-xml-database/blob/master/irsdb/irsdb/settings.py#L132

Does the command line program irsx work, and is it installed? You should be able to tell by typing irsx --format=csv 201533089349301428 -- does that produce an output of a 990 or does it error out? If so, can you post the complete error message?

If you run something like pipenv run pip freeze does it say that irsx is installed, and if so, what version is it? The files it is looking for are part of the irsx installation process.

wmpay commented 6 years ago

Hi @jsfenfen. The irsx program is working when I run the command you sent. The version is 0.2.2. I'm going to try to troubleshoot some more and I'll let you know if I solve the issue.

wmpay commented 6 years ago

I think I had added my own METADATA_DIRECTORY for troubleshooting purposes. When I remove it I get this error:

(990-xml-database-YbfOuN9i) bash-3.2$ python manage.py load_metadata
/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
  """)
Running metadata load on variables.
Deleting variables.
Created 0 rows
Traceback (most recent call last):
  File "manage.py", line 15, in <module>
    execute_from_command_line(sys.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 316, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/mpaymar/.local/share/virtualenvs/990-xml-database-YbfOuN9i/lib/python3.7/site-packages/django/core/management/base.py", line 353, in execute
    output = self.handle(*args, **options)
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 96, in handle
    self.reload_variables()
  File "/Users/mpaymar/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 27, in reload_variables
    if CANONICAL_VERSION in row['versions']:
KeyError: 'versions'

wmpay commented 6 years ago

hmm I ran it again from the beginning and it seems like it's working. No idea what I did differently this time. Thanks anyway.
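
For anyone who hits the same KeyError: 'versions': it means the row being read has no versions key, which would happen if the variables.csv that was opened has a different header than the loader expects. A small check like the one below can make that visible before the loader runs. This is a sketch, not the project's code: missing_columns is a hypothetical helper, it assumes the loader reads rows with csv.DictReader, and the sample header and row are made up for illustration.

```python
import csv
import io
from typing import Iterable, List, TextIO


def missing_columns(csv_file: TextIO, required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in required if name not in header]


if __name__ == "__main__":
    # Made-up sample data; a real check would open the actual variables.csv.
    sample = io.StringIO("xpath,versions\n/Return/ReturnHeader,2016v3.0\n")
    print(missing_columns(sample, ["versions"]))
```

An empty list means every required column is present; anything else names the columns the file lacks.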

wmpay commented 6 years ago

One last thing - how do I download the actually filings themselves? Do I need to do that manually with the irsx tool or is there a script to do it? Where should I download them to? Thanks again

jsfenfen commented 6 years ago

You can use Amazon's S3 CLI tool to move all the .xml files in bulk. IRSX will automatically download a filing it doesn't find in the file cache directory. As of irsx 0.2.3 you can set that directory with an environment variable: https://github.com/jsfenfen/990-xml-reader#environment-variables.

More about dealing with the files in the readme: https://github.com/jsfenfen/990-xml-database/#file-size-concerns

wmpay commented 6 years ago

Hi - I was able to download the files with the aws cli but I'm still not sure how to actually load them into the DB. The load_monthly_filings and enter_yearly_submissions commands don't seem to load anything, so I think the tool doesn't know where the files are downloaded to? I tried setting the environment variables for irsx but that didn't seem to work. Sorry to keep spamming the issue board, would it be easier to direct message you somehow?

jsfenfen commented 6 years ago

@wmpay don't worry about spamming the issue board, it's useful for me to see what's confusing / tripping folks up. At some point I hope to rewrite the docs to include SQL queries that'll confirm that each step worked.

In general, each loading step is sequential, so if the annual index files aren't loaded it decides that there aren't any filings that need to be processed because there aren't any filings in the index files. Probably each script should have more verbose output. I don't have a clear timeline for when I'm gonna get to that though.

wmpay commented 6 years ago

As far as I know I've downloaded the index files for 2014 to 2018. I've downloaded about a gigabyte of the xml files from s3 into the directory I set as FILE_SYSTEM_BASE. I'm assuming at least some of those files would be present in the index files. However, load_filings still just says Done for any year I enter. I think I just have things configured incorrectly... is there a specific place I need to do the s3 sync to?
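
A quick way to test that assumption outside the loader is to compare the object IDs listed in an index file against the XML files on disk. The sketch below is only an illustration: compare_index_to_files is a hypothetical helper, and the OBJECT_ID column name and the <object id>_public.xml file naming are assumptions about the index files and the S3 layout, so both are parameters. It does not touch the database, so it says nothing about whether the index rows themselves were loaded.

```python
import csv
from pathlib import Path
from typing import Tuple


def compare_index_to_files(index_csv: Path, xml_dir: Path,
                           id_column: str = "OBJECT_ID",
                           suffix: str = "_public.xml") -> Tuple[int, int]:
    """Return (ids listed in the index, ids that also have a file in xml_dir)."""
    with open(index_csv, newline="", encoding="utf-8") as handle:
        indexed = {
            row[id_column].strip()
            for row in csv.DictReader(handle)
            if row.get(id_column)  # skip rows with a missing or empty id
        }
    # Strip the assumed suffix from each file name to recover the object id.
    on_disk = {path.name[:-len(suffix)] for path in xml_dir.glob("*" + suffix)}
    return len(indexed), len(indexed & on_disk)


# Example call with placeholder paths:
# listed, present = compare_index_to_files(Path("index_2016.csv"), Path("/path/to/xml"))
```

If the second number is zero, the files that were downloaded are not the ones the index refers to, or they are in a different directory or use a different naming scheme than assumed here.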

Georg-coder commented 4 years ago

I get a similar error message as your initial one, when loading the metadata:

  File "/Users/Georg/PycharmProjects/990-xml-database/irsdb/metadata/management/commands/load_metadata.py", line 31, in reload_variables
    infile = open(infilepath, 'r')
FileNotFoundError: [Errno 2] No such file or directory: '/Users/Georg/PycharmProjects/990-xml-database/irsdb/generated_schemas/variables.csv'

How did you fix it? I also reran everything, but the error remains.

wmpay commented 4 years ago

I was never able to fix it

Georg-coder commented 4 years ago

I think these folks solved it https://github.com/jsfenfen/990-xml-database/issues/28

but I couldn't implement it yet... :/