yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.83k stars 1.05k forks source link

SQL Alchemy. -Airflow #5126

Open IS-Josh opened 4 years ago

IS-Josh commented 4 years ago

Jira Link: DB-2255 Just sending through some feedback on SQL Alchemy compatibility

I was looking to do a spike testing out Airflow using Yugabyte YSQL as a backend, which is deployed by Airflow via sqlalchemy.

I tried multiple drivers pg8000, pyscopg2, py-postgerql, none of them could successfully deploy the db without throwing an error and the client being disconnected.

ddorian commented 4 years ago

Hi @IS-Josh

Is it possible to get some error logs ? And the DDL that you ran so I can try to replicate ?

IS-Josh commented 4 years ago

Hi ddorian, The DDL being run was the backend for Airflow. If you follow the airflow install instructions https://airflow.apache.org/docs/1.10.3/installation.html#initiating-airflow-database and configure airflow to use postgres as the backend by editing the $AIRFLOW_HOME/airflow.cfg file you should be repeating my steps.

THe error from airflow initdb (using sql alchemy to deploy schema is below)

$ airflow initdb
DB: postgresql+psycopg2://yugabyte:***@x.x.x.x:5433/dev
[2020-07-19 15:02:21,129] {db.py:378} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current schema
Traceback (most recent call last):
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    self.dialect.do_execute(
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/useraccount/.pyenv/versions/3.8.3/bin/airflow", line 37, in <module>
    args.func(args)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/airflow/bin/cli.py", line 1329, in initdb
    db.initdb(settings.RBAC)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/airflow/utils/db.py", line 323, in initdb
    upgradedb()
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/airflow/utils/db.py", line 386, in upgradedb
    command.upgrade(config, 'heads')
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/command.py", line 298, in upgrade
    script.run_env()
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/script/base.py", line 489, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 98, in load_python_file
    module = load_module_py(module_id, path)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/util/compat.py", line 184, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/airflow/migrations/env.py", line 97, in <module>
    run_migrations_online()
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/airflow/migrations/env.py", line 91, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/runtime/environment.py", line 846, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/runtime/migration.py", line 520, in run_migrations
    step.migration_fn(**kw)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/airflow/migrations/versions/e3a246e0dc1_current_schema.py", line 94, in upgrade
    op.create_table(
  File "<string>", line 8, in create_table
  File "<string>", line 3, in create_table
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/operations/ops.py", line 1252, in create_table
    return operations.invoke(op)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/operations/base.py", line 374, in invoke
    return fn(self, operation)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/operations/toimpl.py", line 101, in create_table
    operations.impl.create_table(table)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/ddl/impl.py", line 258, in create_table
    self._exec(schema.CreateTable(table))
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/alembic/ddl/impl.py", line 140, in _exec
    return conn.execute(construct, *multiparams, **params)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
    return meth(self, multiparams, params)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sqlalchemy/sql/ddl.py", line 72, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1071, in _execute_ddl
    ret = self._execute_context(
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1317, in _execute_context
    self._handle_dbapi_exception(
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1511, in _handle_dbapi_exception
    util.raise_(
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
    raise exception
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1277, in _execute_context
    self.dialect.do_execute(
  File "/Users/useraccount/.pyenv/versions/3.8.3/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
[SQL: 
CREATE TABLE job (
    id SERIAL NOT NULL, 
    dag_id VARCHAR(250), 
    state VARCHAR(20), 
    job_type VARCHAR(30), 
    start_date TIMESTAMP WITHOUT TIME ZONE, 
    end_date TIMESTAMP WITHOUT TIME ZONE, 
    latest_heartbeat TIMESTAMP WITHOUT TIME ZONE, 
    executor_class VARCHAR(500), 
    hostname VARCHAR(500), 
    unixname VARCHAR(1000), 
    PRIMARY KEY (id)
)

]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
ddorian commented 4 years ago

@IS-Josh here is how to get logs https://docs.yugabyte.com/latest/troubleshoot/nodes/check-logs/. Get for .err, .WARNING, .ERROR, .INFO for yb-tserver, master, postgresql and attach as zip file.

ddorian commented 4 years ago

The error happens here https://github.com/apache/airflow/blob/master/airflow/migrations/versions/e3a246e0dc1_current_schema.py#L94. I think because sqlalchemy-alembic expects transactional DDL which we don't support yet https://github.com/yugabyte/yugabyte-db/issues/3109

rkarthik007 commented 4 years ago

cc @nmalladi who is starting to look at these.

IS-Josh commented 4 years ago

hi @ddorian

we used the statefull set provided by yugabyte for deployment on azure. By Default this config outputs errors to stderr. We're currently not using analytics service due to the high cost of it - we're going to roll our own monitoring solution. In the mean time it's quite cumbersome to filter through logs using the Kubernets Plugin for MIcrosoft Code.

If you still need logs let me know and i'll set some time aside to switch logging to log files on an external volume and re-try the airflow deployment script.