xelixdev / django-pgviews-redux

Fork of django-postgres that focuses on maintaining and improving support for Postgres SQL Views.
The Unlicense

Error when defining the schema name using the "search_path" option in database settings #18

Closed Cloves23 closed 5 months ago

Cloves23 commented 1 year ago

Hello.

When the schema name is defined via the OPTIONS keyword in the database settings, the SQL that checks whether the view exists in a specific schema (sql here) always returns 0, because the function _schema_and_name returns public instead of the correct schema name.

Environment

How to reproduce

Fix suggestion

This is only a suggestion and I can create a pull request if you like.

django_pgviews/view.py

# ...
RE_DB_SEARCH_PATH = re.compile(r'search_path=(?P<s>[\'"]?)(?P<sp>\w+)(?P=s)')
# ...

def _schema_and_name(connection, view_name):
    # Explicit "schema.view" references take precedence.
    if "." in view_name:
        return view_name.split(".", 1)

    try:
        schema_name = connection.schema_name
    except AttributeError:
        # Fall back to the search_path declared in the database OPTIONS,
        # and finally to whatever schema Postgres reports as current.
        search_path = RE_DB_SEARCH_PATH.search(connection.settings_dict.get('OPTIONS', {}).get('options', ''))
        if search_path:
            schema_name = search_path.group('sp')
        else:
            with connection.cursor() as cursor:
                cursor.execute("SELECT current_schema;")
                schema_name = cursor.fetchone()[0]
    return schema_name, view_name
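
As a quick sanity check of the suggested regex, here is how it extracts the schema from typical values of OPTIONS["options"] (the connection strings below are illustrative, not taken from the issue):

```python
import re

# Same pattern as in the suggestion above.
RE_DB_SEARCH_PATH = re.compile(r'search_path=(?P<s>[\'"]?)(?P<sp>\w+)(?P=s)')

# Unquoted schema name.
match = RE_DB_SEARCH_PATH.search("-c search_path=custom_schema_name")
print(match.group("sp"))  # custom_schema_name

# Quoted schema names are handled by the back-reference (?P=s).
match = RE_DB_SEARCH_PATH.search('-c search_path="quoted_schema"')
print(match.group("sp"))  # quoted_schema
```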
mikicz commented 1 year ago

If the schema is hardcoded in the settings, I think I'd much prefer if, instead of using a regex to search the options, we introduced a new setting which would simply say what the schema should be?

Cloves23 commented 1 year ago

This is just a theory and requires testing.

Depending on how the schemas are defined in the settings file, the same issue will happen when using multiple databases with different schemas. If the variable is defined as a string, the same problem will certainly occur. Defining it as a dictionary solves the problem, but adds verbosity.

The advantage of doing this in the settings is the control we gain. However, we become more susceptible to errors, since the same configuration has to be changed in two places instead of one.

What I imagined using the settings:

DATABASES = {
    "default": {
        # ...
        "OPTIONS": {"options": "-c search_path=custom_schema_name"},
    },
    "db2": {
        # ...
        "OPTIONS": {"options": "-c search_path=another_schema_name"},
    },
}

# Will fail when using the database "db2"
PG_VIEWS_SCHEMA = "custom_schema_name"

# May be a solution, but creates verbosity.
PG_VIEWS_SCHEMAS = {
    "default": "custom_schema_name",
    "db2": "another_schema_name",
}

On the other hand, I have no guarantee that my solution will work with multiple databases using custom routers. It seems more complex and can lead to unexpected issues.

In short, I prefer the automation, but your approach is simpler and also a suitable solution for me.

PS: The purpose of this text was to give us an overview of the problem.

Next step

Based on that, which solution do you consider most suitable? We could use the dictionary-based setting and read it in the _schema_and_name function.

A second option is to create two variables: one defining the default schema and a second defining the schema for each database.

DATABASES = {
    "default": {
        # ...
        "OPTIONS": {"options": "-c search_path=custom_schema_name"},
    },
    "db1": {
        # ...
        "OPTIONS": {"options": "-c search_path=custom_schema_name"},
    },
    "db2": {
        # ...
        "OPTIONS": {"options": "-c search_path=db2_schema"},
    },
    # ...
}
PG_VIEWS_DEFAULT_SCHEMA = "custom_schema_name"
PG_VIEWS_SCHEMAS = {
    "db2": "db2_schema",
    # ...
}
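
To make the two-variable proposal concrete, the lookup could resolve the schema for a database alias roughly like this (a sketch only; the helper name `resolve_schema` is my invention, not the library's API, and the defaults follow the thread's suggestion of "public" and an empty dict):

```python
from types import SimpleNamespace


def resolve_schema(settings, db_alias):
    """Resolve the view schema for a database alias from the two
    proposed settings (names and defaults are assumptions)."""
    per_db = getattr(settings, "PG_VIEWS_SCHEMAS", {}) or {}
    default = getattr(settings, "PG_VIEWS_DEFAULT_SCHEMA", "public")
    return per_db.get(db_alias, default)


# Stand-in for a Django settings module, mirroring the example above.
settings = SimpleNamespace(
    PG_VIEWS_DEFAULT_SCHEMA="custom_schema_name",
    PG_VIEWS_SCHEMAS={"db2": "db2_schema"},
)
print(resolve_schema(settings, "db1"))  # custom_schema_name
print(resolve_schema(settings, "db2"))  # db2_schema
```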
mikicz commented 1 year ago

Hm I quite like the variation of having two settings variables (PG_VIEWS_DEFAULT_SCHEMA would be public by default, PG_VIEWS_SCHEMAS would be an empty dictionary by default), somehow seems cleaner than having a regex or an extra query to get a schema in each call of _schema_and_name.

I think especially since these are two things that are not particularly likely to happen (having multiple databases, and defining schemas in OPTIONS), and the combination is even less likely, I would like not to create too much overhead in accounting for rare cases.

Yes, having more settings does create some verbosity and some duplication, but I think it's a case of set and forget, that you need to just get it right once on setup and then never worry about it again.

Cloves23 commented 1 year ago

So, do we both agree to use these two settings variables? Which naming standard do you want to use? Or are these names good for you?

mikicz commented 1 year ago

I'm good with PG_VIEWS_DEFAULT_SCHEMA and PG_VIEWS_SCHEMAS

Cloves23 commented 1 year ago

Hello, @mikicz.

I'm having trouble configuring the project. Basically, the view creation does not respect the app name when trying to apply the migrations of a specific app.

Example:

$ python manage.py migrate auth

Result:

Operations to perform:
  Apply all migrations: auth
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0001_initial... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying auth.0012_alter_user_first_name_max_length... OK
All applications have migrated, time to sync
pgview viewtest.Superusers created
pgview viewtest.LatestSuperusers created
pgview viewtest.SimpleUser created
Traceback (most recent call last):
  ...
  File ".../django-pgviews/django_pgviews/models.py", line 81, in run_backlog
    status = create_view(
  File "/usr/local/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File ".../django-pgviews/django_pgviews/view.py", line 297, in create_view
    cursor.execute("CREATE OR REPLACE VIEW {0} AS {1};".format(view_name, view_query.query), view_query.params)
psycopg2.errors.UndefinedTable: relation "viewtest_testmodel" does not exist
LINE 1: ...est_relatedview AS SELECT id AS model_id, id FROM viewtest_t...
                                                             ^

I was trying to configure the app first in order to run the tests, and I was wondering whether this could be another bug.

mikicz commented 1 year ago

So, looking at those migrations and that error, I think the issue is that the table for the model TestModel does not exist in the database yet: the migrations didn't run for it, so the materialized view can't use it. It's either because the migration for the model doesn't exist, or because you're running a subselection of migrations (in the command you have auth, but contenttypes ran as well), so I'm not sure what is actually happening.

Cloves23 commented 1 year ago

My best guess is that the app just verifies whether the migration process has been executed by all apps that have a models.py file. This means that pgviews will try to create the views every time, even if the target app is not the app that has the view. This approach is not good because it creates a limitation.

A possible solution is to move the view creation into the migration files instead of using the post_migrate signal, keeping the update trigger as it is.

To do this we need to define a few things:

Cloves23 commented 1 year ago

The reason I mentioned using a queryset is that I'm defining the SQL this way:

from functools import lru_cache

from django.db import models
from django_pgviews.view import ReadOnlyMaterializedView


class City(ReadOnlyMaterializedView):
    # ...
    @classmethod
    def view_queryset(cls):
        return (
            Territory.objects.only('id', 'code', 'name', 'is_enabled', 'country', 'administrative_division')
            .annotate(
                microregion_id=models.F('parent_territory_id'),
                mesoregion_id=models.F('parent_territory__parent_territory_id'),
                federative_unit_id=models.F('parent_territory__parent_territory__parent_territory_id'),
                region_id=models.F('parent_territory__parent_territory__parent_territory__parent_territory_id'),
            )
            .filter(
                administrative_division_id=cls.ADMINISTRATIVE_DIVISION.CITY,
                parent_territory__parent_territory__parent_territory__administrative_division_id=(
                    cls.ADMINISTRATIVE_DIVISION.FEDERATIVE_UNIT
                ),
                parent_territory__parent_territory__parent_territory__parent_territory__administrative_division_id=(
                    cls.ADMINISTRATIVE_DIVISION.REGION
                ),
            )
        )

    @classmethod
    @property
    @lru_cache
    def sql(cls):
        return str(cls.view_queryset().query)
mikicz commented 1 year ago

> My best guess is that the app just verifies whether the migration process has been executed by all apps that have a models.py file. This means that pgviews will try to create the views every time, even if the target app is not the app that has the view.

Not really, the app doesn't interact with migrations in any way except for using the post_migrate hook.

> This approach is not good because it creates a limitation.

Yes, but not a massive one, I think: you want to run all migrations in most cases anyway. If you run migrations for specific apps often, you can override the ViewConfig app config and its ready method, for example to not connect the signal if an environment variable is set.
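
That override could look roughly like this (a sketch only, not tested against the library; I'm assuming the app config is exposed as django_pgviews.apps.ViewConfig, and the environment variable name is made up):

```python
# myproject/apps.py — hypothetical app config override.
import os

from django_pgviews.apps import ViewConfig


class QuietViewConfig(ViewConfig):
    """Skips view synchronisation when an (arbitrarily named)
    environment variable is set, e.g. during partial migrations."""

    def ready(self):
        if os.environ.get("PGVIEWS_SKIP_SYNC"):
            return  # do not connect the post_migrate handler
        super().ready()
```

You would then point INSTALLED_APPS at "myproject.apps.QuietViewConfig" instead of "django_pgviews".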

> A possible solution is to move the view creation into the migration files instead of using the post_migrate signal.

I do agree that if I were writing this library from scratch I would probably try to write it like that, but currently you would basically need to rewrite most of this library, which I will not be doing. There's also functionality this library supports that would probably be very hard to implement using migrations.

> The reason I mentioned using a queryset is that I'm defining the SQL this way:

Neat - does it work?

mikicz commented 5 months ago

Fixed in #28, release 0.10.0