googleapis / python-spanner-django

Cloud Spanner database backend for Django
BSD 3-Clause "New" or "Revised" License
90 stars 28 forks source link

Extremely slow django migrations to cloud spanner and fails half way. #756

Open jeetpatel9 opened 2 years ago

jeetpatel9 commented 2 years ago

Hello everyone,

I am trying to set up my Django project with Cloud Spanner. I did everything as told in the documentation like adding the app into installed apps list -INSTALLED_APPS = ['django_spanner',], database configuration in settings.py after creating spanner instance & database. Then created a service account with the appropriate permissions & at last set the environment variables with the service account key & project id as shown below.

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/keyfile.json
export GOOGLE_CLOUD_PROJECT=gcloud_project

So far I am able to connect to the spanner instance when I spin up my project but when I run python manage.py migrate it takes like 10-20 mins to create 3-4 tables in the database. I waited long, around 30 mins to see if it could finish but after creating few more table it failed.

I don’t know if this makes any difference but the instance I created is in Mumbai region and I am 300 KM away running my application locally inside a docker container. Also the spanner instance is regional & has only 1 node. I think it is good enough to run the migrations which would create around 30 tables.

Here the complete traceback of what I am seeing -

No changes detected

Operations to perform:

  Apply all migrations: admin, advertisement, announcements, auth, contenttypes, dividends, fee_groups, orders, properties, sessions, system_users, trade, wallet, watchlist

Running migrations:

  Applying contenttypes.0001_initial... OK

  Applying contenttypes.0002_remove_content_type_name... OK

  Applying auth.0001_initial... OK

  Applying auth.0002_alter_permission_name_max_length... OK

  Applying auth.0003_alter_user_email_max_length... OK

  Applying auth.0004_alter_user_username_opts... OK

  Applying auth.0005_alter_user_last_login_null... OK

  Applying auth.0006_require_contenttypes_0002... OK

  Applying auth.0007_alter_validators_add_error_messages... OK

  Applying auth.0008_alter_user_username_max_length... OK

  Applying auth.0009_alter_user_last_name_max_length... OK

  Applying auth.0010_alter_group_name_max_length... OK

  Applying auth.0011_update_proxy_permissions... OK

  Applying auth.0012_alter_user_first_name_max_length... OK

  Applying system_users.0001_initial... OK

  Applying admin.0001_initial... OK

  Applying admin.0002_logentry_remove_auto_add... OK

  Applying admin.0003_logentry_add_action_flag_choices... OK

  Applying advertisement.0001_initial... OK

/usr/local/lib/python3.7/site-packages/django/db/backends/base/base.py:242: UserWarning: This method is non-operational in autocommit mode

  return self.connection.commit()

Traceback (most recent call last):

  File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 67, in error_remapped_callable

    return callable_(*args, **kwargs)

  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 946, in __call__

    return _end_unary_response_blocking(state, call, False, None)

  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking

    raise _InactiveRpcError(state)

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:

status = StatusCode.DEADLINE_EXCEEDED

details = "Deadline Exceeded"

debug_error_string = "{"created":"@1644408066.856624881","description":"Error received from peer ipv4:142.250.199.170:443","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Deadline Exceeded","grpc_status":4}"

>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "manage.py", line 22, in <module>

    main()

  File "manage.py", line 18, in main

    execute_from_command_line(sys.argv)

  File "/usr/local/lib/python3.7/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line

    utility.execute()

  File "/usr/local/lib/python3.7/site-packages/django/core/management/__init__.py", line 413, in execute

    self.fetch_command(subcommand).run_from_argv(self.argv)

  File "/usr/local/lib/python3.7/site-packages/django/core/management/base.py", line 354, in run_from_argv

    self.execute(*args, **cmd_options)

  File "/usr/local/lib/python3.7/site-packages/django/core/management/base.py", line 398, in execute

    output = self.handle(*args, **options)

  File "/usr/local/lib/python3.7/site-packages/django/core/management/base.py", line 89, in wrapped

    res = handle_func(*args, **kwargs)

  File "/usr/local/lib/python3.7/site-packages/django/core/management/commands/migrate.py", line 246, in handle

    fake_initial=fake_initial,

  File "/usr/local/lib/python3.7/site-packages/django/db/migrations/executor.py", line 117, in migrate

    state = self._migrate_all_forwards(state, plan, full_plan, fake=fake, fake_initial=fake_initial)

  File "/usr/local/lib/python3.7/site-packages/django/db/migrations/executor.py", line 147, in _migrate_all_forwards

    state = self.apply_migration(state, migration, fake=fake, fake_initial=fake_initial)

  File "/usr/local/lib/python3.7/site-packages/django/db/migrations/executor.py", line 230, in apply_migration

    migration_recorded = True

  File "/usr/local/lib/python3.7/site-packages/django/db/backends/base/schema.py", line 118, in __exit__

    self.execute(sql)

  File "/usr/local/lib/python3.7/site-packages/django/db/backends/base/schema.py", line 145, in execute

    cursor.execute(sql, params)

  File "/usr/local/lib/python3.7/site-packages/django/db/backends/utils.py", line 66, in execute

    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)

  File "/usr/local/lib/python3.7/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers

    return executor(sql, params, many, context)

  File "/usr/local/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute

    return self.cursor.execute(sql, params)

  File "/usr/local/lib/python3.7/site-packages/google/cloud/spanner_dbapi/cursor.py", line 70, in wrapper

    return function(cursor, *args, **kwargs)

  File "/usr/local/lib/python3.7/site-packages/google/cloud/spanner_dbapi/cursor.py", line 238, in execute

    self.connection.run_prior_DDL_statements()

  File "/usr/local/lib/python3.7/site-packages/google/cloud/spanner_dbapi/connection.py", line 55, in wrapper

    return function(connection, *args, **kwargs)

  File "/usr/local/lib/python3.7/site-packages/google/cloud/spanner_dbapi/connection.py", line 413, in run_prior_DDL_statements

    return self.database.update_ddl(ddl_statements).result()

  File "/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py", line 130, in result

    self._blocking_poll(timeout=timeout, **kwargs)

  File "/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py", line 108, in _blocking_poll

    retry_(self._done_or_raise)(**kwargs)

  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 291, in retry_wrapped_func

    on_error=on_error,

  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 189, in retry_target

    return target()

  File "/usr/local/lib/python3.7/site-packages/google/api_core/future/polling.py", line 86, in _done_or_raise

    if not self.done(**kwargs):

  File "/usr/local/lib/python3.7/site-packages/google/api_core/operation.py", line 170, in done

    self._refresh_and_update(retry)

  File "/usr/local/lib/python3.7/site-packages/google/api_core/operation.py", line 158, in _refresh_and_update

    self._operation = self._refresh(retry=retry)

  File "/usr/local/lib/python3.7/site-packages/google/api_core/operations_v1/operations_client.py", line 143, in get_operation

    request, retry=retry, timeout=timeout, metadata=metadata

  File "/usr/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__

    return wrapped_func(*args, **kwargs)

  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 291, in retry_wrapped_func

    on_error=on_error,

  File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 189, in retry_target

    return target()

  File "/usr/local/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout

    return func(*args, **kwargs)

  File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable

    six.raise_from(exceptions.from_grpc_error(exc), exc)

  File "<string>", line 3, in raise_from

google.api_core.exceptions.DeadlineExceeded: 504 Deadline Exceeded
jeetpatel9 commented 2 years ago

@vi3k6i5 Could please help me understand if am I doing something wrong or its just spanner.

IlyaFaer commented 2 years ago

@jeetpatel9, looking at this line This method is non-operational in autocommit mode return self.connection.commit(), I suspect migration is executed in autocommit mode, so every DDL statement is done separately. It's not a good practise due to performance issues. We usually recommend to use batches.

As I see, using any transactions API will not work here, as Spanner DDLs are working outside transactions.

@vi3k6i5, looks like our backend needs SchemaEditor class override to set that migration should be done in non-autocommit mode. With this, all the DDL statements will be pumping up on the connection and executed in a batch on commit(). It should fix the problem.

vi3k6i5 commented 2 years ago

Thanks @IlyaFaer : Will look into doing that, Thanks @jeetpatel9 from bring it to attention.

For now a solution that I can recommend is running the migration and the spanner instance on GCP. Running Django app on local and running spanner on GCP significantly adds to network delays causing issues.

Also try tuning CONN_MAX_AGE as per your requirement, which will reduce the number of db connection creations.

I will investigate Illya's suggestion more and get back with a solution.

vi3k6i5 commented 2 years ago

@jeetpatel9 : can you try with these settings. AUTOCOMMIT = False CONN_MAX_AGE = 60 I am able to run migrations with the polls Django app location : local (India) to Spanner (us-central1), total migration time 10 mins.

NOTE: For the right value for CONN_MAX_AGE do try with your application what value works best.

@IlyaFaer : I did not see any major gains in performance with AUTOCOMMIT=False or AUTOCOMMIT=True, Django runs a lot of migrations and they take time to complete on spanner.

jeetpatel9 commented 2 years ago

@vi3k6i5 I tried with AUTOCOMMIT = False & CONN_MAX_AGE = 60. It worked and took around 7 hours to finish.

I also ran the migrations as a job in Kubernetes cluster which is in the same region as the spanner instance and it still took 7 hours and 15 mins to finish migrating 42 tables.

NOTE: I deleted all the previous migration files and created new one to reduce the number of queries.

The only difference is if AUTOCOMMIT is set to FALSE the migrations does not fail half way.

vi3k6i5 commented 2 years ago

Would it be possible to share the table structure which is slowing the progress?

Also try a basic empty django app see if that is also slow for you, this generally wraps up in under 10 mins for me.

jeetpatel9 commented 2 years ago

Generally each table is taking significant amount of time but the longest is when the indexes of the tables are being created.

I tried a basic Django app with 4 tables and no indexes at all. It took around 15 mins.

Also, in most cases I get this DeadlineExceeded issue and the migrations are rarely successful. The last time I ran the migrations, this error was thrown while applying below shown migration. Also it doesn't break every time at exactly this point, it throws this error usually after applying 10-14 migrations.

# Generated by Django 3.2.12 on 2022-02-26 14:58

from django.db import migrations, models
import django_spanner

class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='ExpiredOrders',
            fields=[
                ('id', models.BigAutoField(auto_created=True, default=django_spanner.gen_rand_int64, primary_key=True, serialize=False, verbose_name='ID')),
                ('created_at', models.DateTimeField(auto_now_add=True)),
                ('modified_at', models.DateTimeField(auto_now=True)),
                ('identifier', models.BigIntegerField()),
                ('is_older', models.BooleanField(default=False)),
                ('is_out_of_flu_range', models.BooleanField(default=False)),
            ],
            options={
                'abstract': False,
            },
        ),
        migrations.CreateModel(
            name='Orders',
            fields=[
                ('id', models.BigAutoField(auto_created=True, default=django_spanner.gen_rand_int64, primary_key=True, serialize=False, verbose_name='ID')),
                ('created_at', models.DateTimeField(auto_now_add=True)),
                ('modified_at', models.DateTimeField(auto_now=True)),
                ('units', models.IntegerField()),
                ('buy_price_per_unit', models.FloatField(blank=True, null=True)),
                ('sell_price_per_unit', models.FloatField(blank=True, null=True)),
                ('is_market_order', models.BooleanField(default=False)),
                ('is_crowd_funding', models.BooleanField(default=False)),
                ('cf_group', models.IntegerField(blank=True, null=True)),
                ('fluctuation_upper_limit', models.FloatField()),
                ('fluctuation_lower_limit', models.FloatField()),
                ('order_trade_fee', models.FloatField()),
                ('tax_percent', models.FloatField()),
                ('sold_units', models.IntegerField(default=0)),
                ('bought_units', models.IntegerField(default=0)),
            ],
        ),
        migrations.CreateModel(
            name='OrderStatus',
            fields=[
                ('id', models.BigAutoField(auto_created=True, default=django_spanner.gen_rand_int64, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=20)),
            ],
        ),
        migrations.CreateModel(
            name='OrderType',
            fields=[
                ('id', models.BigAutoField(auto_created=True, default=django_spanner.gen_rand_int64, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=20)),
            ],
        ),
    ]
crash-absolutelabs commented 2 years ago

I'm actually seeing the same issue on a brand new deployment. Sorry to bring this old-ish thread back to life