quantopian / pgcontents

A Postgres-backed ContentsManager implementation for Jupyter
Apache License 2.0
150 stars 84 forks source link

Create DB tables in advance? #75

Open nsshah1288 opened 4 years ago

nsshah1288 commented 4 years ago

Hi,

I was trying to use pgcontents to store Jupyter notebook code in PostgreSQL.

I set up a blank PostgresSQL database like it said in instructions and specified it in my jupyter_notebook_config.py:

from pgcontents import PostgresContentsManager c = get_config() # noqa c.NotebookApp.contents_manager_class = PostgresContentsManager c.PostgresContentsManager.db_url = 'postgresql://postgres:postgres@database-2.cprl6nrsccmr.us-east-1.rds.amazonaws.com/pgcontents'

Then I brought up the notebook in EKS and tried to see if it would save my Jupyter notebook code.

I got an error like this:

$ kubectl logs jupyter-deployment-f4846c8db-b7t95 -n jupyter Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context self.dialect.do_execute( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 590, in do_execute cursor.execute(statement, parameters) psycopg2.errors.UndefinedTable: relation "pgcontents.users" does not exist LINE 1: INSERT INTO pgcontents.users (id) VALUES ('root') ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/local/bin/jupyter-notebook", line 8, in sys.exit(main()) File "/usr/local/lib/python3.8/dist-packages/jupyter_core/application.py", line 270, in launch_instance return super(JupyterApp, cls).launch_instance(argv=argv, kwargs) File "/usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 663, in launch_instance app.initialize(argv) File "", line 2, in initialize File "/usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 87, in catch_config_error return method(app, *args, *kwargs) File "/usr/local/lib/python3.8/dist-packages/notebook/notebookapp.py", line 1766, in initialize self.init_configurables() File "/usr/local/lib/python3.8/dist-packages/notebook/notebookapp.py", line 1383, in init_configurables self.contents_manager = self.contents_manager_class( File "/usr/local/lib/python3.8/dist-packages/pgcontents/pgmanager.py", line 99, in init super(PostgresContentsManager, self).init(args, kwargs) File "/usr/local/lib/python3.8/dist-packages/pgcontents/managerbase.py", line 66, in init self.ensure_user() File "/usr/local/lib/python3.8/dist-packages/pgcontents/managerbase.py", line 70, in ensure_user ensure_db_user(db, self.user_id) File "/usr/local/lib/python3.8/dist-packages/pgcontents/query.py", line 89, in ensure_db_user db.execute( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1020, in execute return meth(self, multiparams, params) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement ret = self._execute_context( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1323, in _execute_context self._handle_dbapi_exception( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1517, in _handle_dbapiexception util.raise( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 178, in raise_ raise exception File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1283, in _execute_context self.dialect.do_execute( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 590, in do_execute cursor.execute(statement, parameters) sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "pgcontents.users" does not exist LINE 1: INSERT INTO pgcontents.users (id) VALUES ('root')

Do I need to create the users, directories, files tables beforehand? I am confused because the directions say:

Prerequisites: Write access to an empty PostgreSQL database.

Also I was trying to look at the code and see where you actually create the tables, and I couldn't find where you actually do a 'CREATE TABLE' command. @ssanderson Could you please give guidance on this?

Thanks!

ssanderson commented 4 years ago

Hi @nsshah1288,

The tables used by pgcontents are created by the pgcontents init command noted in the README's installation steps. That command uses alembic to run a migration that should set up the database as necessary.

nsshah1288 commented 4 years ago

Thanks @ssanderson for the quick reply!

Right now I have a Dockerfile like this:

FROM ubuntu:latest
WORKDIR /code
RUN apt-get update && apt-get -y upgrade
RUN apt-get install -y build-essential python-dev libpq-dev
RUN apt-get install -y python3.6 python-distribute python3-pip
RUN pip3 -q install pip --upgrade
ADD jupyter_notebook_2_config.py /code
RUN pip3 install jupyter
RUN pip install pgcontents
RUN pgcontents init --db-url postgresql://postgres:postgres@academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com/pgcontents --no-prompt
RUN mkdir /notebooks
CMD jupyter notebook --allow-root --no-browser --ip 0.0.0.0 --config=jupyter_notebook_2_config.py --port 8080 /notebooks

When I try to run docker build -t postgres .

I can't connect to the PostgreSQL DB that I have setup in AWS.

Step 10/12 : RUN pgcontents init --db-url postgresql://postgres:postgres@academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com/pgcontents --no-prompt ---> Running in abffc35920e4 Initializing pgcontents... Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 2345, in _wrap_pool_connect return fn() File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 304, in unique_connection return _ConnectionFairy._checkout(self) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 778, in _checkout fairy = _ConnectionRecord.checkout(pool) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 495, in checkout rec = pool._do_get() File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/impl.py", line 239, in _do_get return self._create_connection() File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 309, in _create_connection return _ConnectionRecord(self) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 440, in init self.connect(first_connect_check=True) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 661, in connect pool.logger.debug("Error on connect(): %s", e) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/langhelpers.py", line 68, in exit compat.raise( File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 178, in raise raise exception File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/pool/base.py", line 656, in connect connection = pool._invoke_creator(self) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/strategies.py", line 114, in connect return dialect.connect(*cargs, *cparams) File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 490, in connect return self.dbapi.connect(cargs, **cparams) File "/usr/local/lib/python3.8/dist-packages/psycopg2/init__.py", line 127, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) psycopg2.OperationalError: could not connect to server: Connection refused Is the server running on host "academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com" (10.179.158.188) and accepting TCP/IP connections on port 5432? could not connect to server: Connection refused Is the server running on host "academic-datalake.cluster-cprl6nrsccmr.us-east-1.rds.amazonaws.com" (10.179.159.214) and accepting TCP/IP connections on port 5432?

This error makes sense, but I have not yet figured out how to supply temporary credentials in my jupyter_notebook_2_config.py file yet.

When I used s3contents I was able to specify access key, secret key, and session token, and then it connected to s3. Is there a way to do this using pgcontents?

Something like below?

c.PostgresContentsManager.access_key_id="blah"

c.PostgresContentsManager.secret_access_key="blah"

c.PostgresContentsManager.session_token="blah"

I am guessing I should instead put pgcontents init inside the jupyter_notebook_2_config.py, but I don't see how that will work if I can't get the temporary credentials to work. Any ideas?

nsshah1288 commented 4 years ago

@ssanderson do you have any suggestions?

ssanderson commented 4 years ago

hey @nsshah1288 . You only need to run pgcontents init once. It will create all the tables that will be used. It's usually done manually from the command line.

For specifying your database credentials, you can put those in the jupyter notebook config. There's an example of what that looks like here: https://github.com/quantopian/pgcontents/blob/master/examples/example_jupyter_notebook_config.py.

nsshah1288 commented 4 years ago

Hi @ssanderson thanks for the reply.

I ranpgcontents initwith a file database URL of postgresql://{username}:{password}@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres

I provided the real username and password obviously.

I am able to log into the query editor in AWS for this Aurora PostgreSQL serverless DB with the credentials. However, when I run pgcontents init with the same credentials, I get this error:

sh-4.2$ pgcontents init File Database URL: postgresql://master:postgres@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres Repeat for confirmation: postgresql://master:postgres@bd-serverless.cluster-cv3fz240lfke.us-east-1.rds.amazonaws.com:5432/postgres Initializing pgcontents...

About to run schema migrations against supplied database URL. If you have stored data from a previous pgcontents installation, it may not be correctly preserved.

It is HIGHLY recommended that you back up stored data before proceeding.

Proceed? [y/N]: y Traceback (most recent call last): File "/usr/local/bin/alembic", line 11, in load_entry_point('alembic==1.4.2', 'console_scripts', 'alembic')() File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 577, in main CommandLine(prog=prog).main(argv=argv) File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 571, in main self.run_cmd(cfg, options) File "/usr/local/lib/python2.7/site-packages/alembic/config.py", line 551, in run_cmd dict((k, getattr(options, k, None)) for k in kwarg) File "/usr/local/lib/python2.7/site-packages/alembic/command.py", line 298, in upgrade script.run_env() File "/usr/local/lib/python2.7/site-packages/alembic/script/base.py", line 489, in run_env util.load_python_file(self.dir, "env.py") File "/usr/local/lib/python2.7/site-packages/alembic/util/pyfiles.py", line 98, in load_python_file module = load_module_py(module_id, path) File "/usr/local/lib/python2.7/site-packages/alembic/util/compat.py", line 250, in load_module_py mod = imp.load_source(module_id, path, fp) File "/usr/local/lib/python2.7/site-packages/pgcontents/alembic/env.py", line 83, in run_migrations_online() File "/usr/local/lib/python2.7/site-packages/pgcontents/alembic/env.py", line 65, in run_migrations_online connection = engine.connect() File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2272, in connect return self._connection_cls(self, kwargs) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 104, in init else engine.raw_connection() File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2379, in raw_connection self.pool.unique_connection, _connection File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2349, in _wrap_pool_connect e, dialect, self File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1591, in _handle_dbapi_exception_noconnection sqlalchemy_exception, with_traceback=excinfo[2], from=e File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 2345, in _wrap_pool_connect return fn() File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 304, in unique_connection return _ConnectionFairy._checkout(self) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 778, in _checkout fairy = _ConnectionRecord.checkout(pool) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 495, in checkout rec = pool._do_get() File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/impl.py", line 239, in _do_get return self._create_connection() File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 309, in _create_connection return _ConnectionRecord(self) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 440, in init self.connect(first_connect_check=True) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 661, in connect pool.logger.debug("Error on connect(): %s", e) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 69, in exit exc_value, with_traceback=exc_tb, File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/pool/base.py", line 656, in connect connection = pool._invoke_creator(self) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect return dialect.connect(*cargs, *cparams) File "/usr/local/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 490, in connect return self.dbapi.connect(cargs, **cparams) File "/usr/local/lib64/python2.7/site-packages/psycopg2/init__.py", line 127, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: password authentication failed for user "master" FATAL: password authentication failed for user "master"

(Background on this error at: http://sqlalche.me/e/e3q8) Traceback (most recent call last): File "/usr/local/bin/pgcontents", line 201, in main() File "/usr/local/lib/python2.7/site-packages/click/core.py", line 829, in call return self.main(args, kwargs) File "/usr/local/lib/python2.7/site-packages/click/core.py", line 782, in main rv = self.invoke(ctx) File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1066, in invoke return ctx.invoke(self.callback, ctx.params) File "/usr/local/lib/python2.7/site-packages/click/core.py", line 610, in invoke return callback(args, **kwargs) File "/usr/local/bin/pgcontents", line 98, in init upgrade(db_url, revision) File "/usr/local/lib/python2.7/site-packages/pgcontents/utils/migrate.py", line 40, in upgrade ['alembic', '-c', alembic_ini, 'upgrade', revision] File "/usr/lib64/python2.7/subprocess.py", line 190, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['alembic', '-c', '/tmp/tmpf740BC/temp_alembic.ini', 'upgrade', u'head']' returned non-zero exit status 1

Do you know if pgcontents works with Aurora PostgreSQL Serverless? Thanks

nsshah1288 commented 4 years ago

Hi @ssanderson I tried using a normal PostgreSQL DB and pgcontents init worked. Do you know if pgcontents can work with PostgreSQL serverless?

Thanks!

jorisgillis commented 3 years ago

I'm having a similar issue as pl31 created (https://github.com/pl31/heroku-jupyter/issues/38).