Add script to generate manifest

pacificclimate / modelmeta

An ORM representation of the model metadata database

GNU General Public License v3.0


Add script to generate manifest #96

Closed: jameshiebert closed this issue 4 years ago

jameshiebert commented 4 years ago

modelmeta tracks NetCDF files that are used in a variety of our applications: the PCIC Data Portal (PDP), the PCIC Climate Explorer (PCEx), Plan2Adapt (P2A) and our mapping application, ncWMS.

Some of these applications run on different systems, and occasionally we need to sync data from one system to another. It would be useful to have a script that generates a manifest of all the files that are needed; the list could then be fed to an rsync command or a similar transfer tool.

Possible command line options:

-c --connection-string: Database Data Source Name (DSN)
-e --ensemble: One or more ensembles from which files should be listed (defaults to all ensembles)
-s --since: Date after which files should be listed (i.e., exclude files that were indexed before this date)

An example usage would be something like this:

generate_manifest -c "postgresql://db3.pcic.uvic.ca/ce_meta" -e "plan2adapt" -s "2020-01-01" | xargs rsync ...
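For illustration, the options listed above could be parsed roughly as follows. This is only a sketch using the standard library's argparse: the function names `build_parser` and `parse_date`, the help strings, the `YYYY-MM-DD` date format, and the choice to repeat `-e` for multiple ensembles are assumptions, not anything settled in this issue.

```python
import argparse
from datetime import date, datetime


def parse_date(text):
    """Convert a YYYY-MM-DD string to a date.

    argparse turns the ValueError raised on bad input into a usage error.
    """
    return datetime.strptime(text, "%Y-%m-%d").date()


def build_parser():
    """Build a parser for the options proposed in this issue (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="generate_manifest",
        description="List the files tracked in a modelmeta database.",
    )
    parser.add_argument(
        "-c", "--connection-string", required=True,
        help="Database Data Source Name (DSN)",
    )
    parser.add_argument(
        "-e", "--ensemble", action="append", default=None,
        help="Ensemble to list files from; repeat for several (default: all)",
    )
    parser.add_argument(
        "-s", "--since", type=parse_date, default=None,
        help="Only list files indexed after this date (YYYY-MM-DD)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # args.ensemble is None (meaning "all") or a list of names;
    # args.since is None or a datetime.date.
    print(args.ensemble, args.since)
```

Leaving `--ensemble` and `--since` as `None` when they are omitted lets the query code skip the corresponding filter entirely.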

eyvorchuk commented 4 years ago

I'm trying to get used to SQLAlchemy, so I ran the following commands in an interactive Python session to view the list of ensemble names:

>>> from sqlalchemy import create_engine
>>> from sqlalchemy.orm import sessionmaker
>>> engine = create_engine("postgresql://db3.pcic.uvic.ca/ce_meta")
>>> session = sessionmaker(bind=engine)()
>>> from modelmeta import DataFile, DataFileVariable, Ensemble, TimeSet
### Creating modelmeta ORM
>>> session.query(Ensemble.name).all()

and got the following error:

psycopg2.OperationalError: fe_sendauth: no password supplied

I was wondering if this is the correct way to get the list of ensemble names, and if so, what the password is.

corviday commented 4 years ago

Seems like a good approach!

The database usernames and passwords are available in Team Password Manager. If you don't have access to Team Password Manager, Matthew can get you set up. Usually it is supplied as part of the connection string, like postgresql://username:password@server/database

Don't put the password in any scripts that you commit to GitHub: GitHub is public, and we don't want the whole world seeing our passwords. We typically write scripts to take the connection string as an argument so that 1) passwords aren't part of the script, and 2) if someone is using a test database to develop new features, they can run your script on that database by changing the connection string.
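A related precaution, sketched here as an assumption rather than an existing project convention: if a script ever echoes or logs the connection string it was given, the password can be masked first. The helper name `redact_dsn` is hypothetical, and it uses only the standard library's `urllib.parse`.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_dsn(dsn):
    """Return the DSN with any password replaced by '***' (hypothetical helper)."""
    parts = urlsplit(dsn)
    if parts.password is None:
        # Nothing to hide, e.g. when the password comes from elsewhere.
        return dsn
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(redact_dsn("postgresql://username:password@server/database"))
```

The real connection string is still what gets passed to `create_engine`; only the copy that is printed or logged is redacted.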

jameshiebert commented 4 years ago

> Usually it is supplied as part of the connection string, like postgresql://username:password@server/database

Or preferably by using a .pgpass file.
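For reference, each line of a .pgpass file has the form hostname:port:database:username:password, and libpq (which psycopg2 uses) ignores the file on Unix unless its permissions are 0600 or stricter. The snippet below is a minimal sketch of writing such an entry; the helper names `pgpass_line` and `append_pgpass_entry` and the credentials in the usage comment are placeholders, not real values.

```python
import os
import stat


def pgpass_line(host, port, database, username, password):
    """Format one .pgpass entry, escaping backslashes and colons inside fields."""
    def escape(field):
        return str(field).replace("\\", "\\\\").replace(":", "\\:")

    return ":".join(escape(f) for f in (host, port, database, username, password))


def append_pgpass_entry(path, line):
    """Append an entry to the file at path and restrict it to owner read/write."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    with os.fdopen(fd, "a") as handle:
        handle.write(line + "\n")
    # Enforce 0600 even if the file already existed with looser permissions.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


# Example usage (placeholder credentials):
# append_pgpass_entry(
#     os.path.expanduser("~/.pgpass"),
#     pgpass_line("db3.pcic.uvic.ca", 5432, "ce_meta", "someuser", "somepassword"),
# )
```

With a matching entry in place, the connection string can name the user without a password (for example postgresql://someuser@db3.pcic.uvic.ca/ce_meta) and libpq looks the password up itself.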