Cloves23 closed this issue 5 months ago
If the schema is hardcoded in the settings, I think I'd much prefer that instead of using a regex to search OPTIONS we introduced a new setting which simply says what the schema should be?
This is just a theory and requires testing.
Depending on how the schemas are defined in the settings file, the same issue will arise when using multiple databases with different schemas. If the variable is defined as a string, the same problem will certainly occur. Defining it as a dictionary solves the problem, but creates verbosity.
The advantage of doing this in the settings is the control we gain. However, we become more susceptible to errors, since the same configuration has to be changed in two places instead of one.
What I imagined using the settings:
DATABASES = {
    "default": {
        # ...
        "OPTIONS": {"options": "-c search_path=custom_schema_name"},
    },
    "db2": {
        # ...
        "OPTIONS": {"options": "-c search_path=another_schema_name"},
    },
}

# Will fail when using the database "db2"
PG_VIEWS_SCHEMA = "custom_schema_name"

# May be a solution, but creates verbosity.
PG_VIEWS_SCHEMAS = {
    "default": "custom_schema_name",
    "db2": "another_schema_name",
}
On the other hand, I have no guarantee that my solution will work with multiple databases using custom routers. It seems more complex and can lead to unexpected issues.
In short, I prefer the automation, but your approach is simpler and also a suitable solution for me.
PS: The purpose of this text was to give us an overview of the problem.
Based on that, which solution do you consider most suitable? We could use the settings solution with the dictionary and use it in the _schema_and_name function.
A second option is to create two variables: one defining the default schema and a second defining the schema for each database.
DATABASES = {
    "default": {
        # ...
        "OPTIONS": {"options": "-c search_path=custom_schema_name"},
    },
    "db1": {
        # ...
        "OPTIONS": {"options": "-c search_path=custom_schema_name"},
    },
    "db2": {
        # ...
        "OPTIONS": {"options": "-c search_path=db2_schema"},
    },
    # ...
}
PG_VIEWS_DEFAULT_SCHEMA = "custom_schema_name"
PG_VIEWS_SCHEMAS = {
    "db2": "db2_schema",
    # ...
}
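A minimal sketch of how a lookup based on these two proposed settings could work (the helper name `schema_for` is hypothetical and not part of the library):

```python
# Proposed settings from above: PG_VIEWS_DEFAULT_SCHEMA covers every
# database alias not explicitly listed in PG_VIEWS_SCHEMAS.
PG_VIEWS_DEFAULT_SCHEMA = "custom_schema_name"
PG_VIEWS_SCHEMAS = {
    "db2": "db2_schema",
}


def schema_for(db_alias):
    """Resolve the schema for a database alias (hypothetical helper)."""
    return PG_VIEWS_SCHEMAS.get(db_alias, PG_VIEWS_DEFAULT_SCHEMA)
```

With this shape, only databases that deviate from the default need an entry, which keeps the verbosity concern small.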
Hm I quite like the variation of having two settings variables (PG_VIEWS_DEFAULT_SCHEMA would be "public" by default, PG_VIEWS_SCHEMAS would be an empty dictionary by default), somehow it seems cleaner than having a regex or an extra query to get the schema in each call of _schema_and_name.
I think especially since these are two things that are not particularly likely to happen (having multiple databases, and defining schemas in OPTIONS), and the combination is even less likely, I would like not to create too much overhead in accounting for rare cases.
Yes, having more settings does create some verbosity and some duplication, but I think it's a case of set and forget, that you need to just get it right once on setup and then never worry about it again.
So, do we both agree to use these two settings variables? Which naming standard do you want to use? Or are these names good for you?
I'm good with PG_VIEWS_DEFAULT_SCHEMA and PG_VIEWS_SCHEMAS
Hello, @mikicz.
I'm having trouble configuring the project. Basically, view creation is not respecting the app name when I try to apply the migrations of a specific app.
Example:
$ python manage.py migrate auth
Result:
Operations to perform:
Apply all migrations: auth
Running migrations:
Applying contenttypes.0001_initial... OK
Applying contenttypes.0002_remove_content_type_name... OK
Applying auth.0001_initial... OK
Applying auth.0002_alter_permission_name_max_length... OK
Applying auth.0003_alter_user_email_max_length... OK
Applying auth.0004_alter_user_username_opts... OK
Applying auth.0005_alter_user_last_login_null... OK
Applying auth.0006_require_contenttypes_0002... OK
Applying auth.0007_alter_validators_add_error_messages... OK
Applying auth.0008_alter_user_username_max_length... OK
Applying auth.0009_alter_user_last_name_max_length... OK
Applying auth.0010_alter_group_name_max_length... OK
Applying auth.0011_update_proxy_permissions... OK
Applying auth.0012_alter_user_first_name_max_length... OK
All applications have migrated, time to sync
pgview viewtest.Superusers created
pgview viewtest.LatestSuperusers created
pgview viewtest.SimpleUser created
Traceback (most recent call last):
...
File ".../django-pgviews/django_pgviews/models.py", line 81, in run_backlog
status = create_view(
File "/usr/local/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File ".../django-pgviews/django_pgviews/view.py", line 297, in create_view
cursor.execute("CREATE OR REPLACE VIEW {0} AS {1};".format(view_name, view_query.query), view_query.params)
psycopg2.errors.UndefinedTable: relation "viewtest_testmodel" does not exist
LINE 1: ...est_relatedview AS SELECT id AS model_id, id FROM viewtest_t...
^
I was trying to configure the app first in order to execute the tests, so I was wondering whether this could be another bug.
So looking at those migrations and that error, I think the issue is that the table for the model TestModel does not exist in the database yet - the migrations didn't run for it, so the materialized view can't use it. It's either because the migration for the model doesn't exist, or because you're running a subselection of migrations (in the command you have auth, but contenttypes ran as well), so I'm not sure what is actually happening.
My best guess is that the app is just verifying whether the migration process has been executed by all apps that have a models.py file. This means that pgviews will try to create the views every time, even if the target app is not the app that has the view. This approach is not good because it creates a limitation.
A possible solution is to try to move the view creation into the migration files instead of using the post_migrate signal. The update trigger stays as it is.
To do this we need to define a few things:
- Create the views with RunSQL. Using a queryset or a pure string will be compatible. The advantage of this is that we will keep the change history.
The reason I mentioned using a queryset is because I'm defining the SQL this way:
from functools import lru_cache

from django.db import models


class City(ReadOnlyMaterializedView):
    # ...

    @classmethod
    def view_queryset(cls):
        return (
            Territory.objects.only('id', 'code', 'name', 'is_enabled', 'country', 'administrative_division')
            .annotate(
                microregion_id=models.F('parent_territory_id'),
                mesoregion_id=models.F('parent_territory__parent_territory_id'),
                federative_unit_id=models.F('parent_territory__parent_territory__parent_territory_id'),
                region_id=models.F('parent_territory__parent_territory__parent_territory__parent_territory_id'),
            )
            .filter(
                administrative_division_id=cls.ADMINISTRATIVE_DIVISION.CITY,
                parent_territory__parent_territory__parent_territory__administrative_division_id=(
                    cls.ADMINISTRATIVE_DIVISION.FEDERATIVE_UNIT
                ),
                parent_territory__parent_territory__parent_territory__parent_territory__administrative_division_id=(
                    cls.ADMINISTRATIVE_DIVISION.REGION
                ),
            )
        )

    @classmethod
    @property
    @lru_cache
    def sql(cls):
        return str(cls.view_queryset().query)
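The migration-based approach described above could look something like this. This is only a sketch of the idea, not how the library currently works; the app label, view name, and SQL are placeholders:

```python
# Hypothetical migration creating the view with RunSQL. The SQL string
# could also be produced from a queryset via str(queryset.query).
from django.db import migrations

CREATE_SQL = "CREATE OR REPLACE VIEW city_view AS SELECT 1 AS id;"  # placeholder SQL
DROP_SQL = "DROP VIEW IF EXISTS city_view;"


class Migration(migrations.Migration):

    dependencies = [
        ("myapp", "0001_initial"),  # "myapp" is a placeholder app label
    ]

    operations = [
        # reverse_sql makes the migration reversible
        migrations.RunSQL(sql=CREATE_SQL, reverse_sql=DROP_SQL),
    ]
```

One upside of this shape is that schema changes to the view become part of the normal migration history.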
> My best guess is that the app is just verifying whether the migration process has been executed by all apps that have a models.py file. This means that pgviews will try to create the views every time, even if the target app is not the app that has the view.
Not really, the app doesn't interact with migrations in any way except for using the post_migrate hook.
> This approach is not good because it creates a limitation.
Yes, but not a massive one, I think - you want to be running all migrations in most cases anyway.
If you run migrations for specific apps often, you can override the ViewConfig app config and override the ready method, for example to not run the signal if an environment variable is set.
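A rough sketch of that override pattern. The base class here is a stand-in for django-pgviews' actual app config, and the environment variable name is made up for the example:

```python
import os


class ViewConfig:
    """Stand-in for django-pgviews' app config (the real class is assumed
    to connect the post_migrate signal in its ready() method)."""

    def ready(self):
        self.signal_connected = True  # placeholder for signal wiring


class ConditionalViewConfig(ViewConfig):
    """Skip view synchronisation when PGVIEWS_SKIP_SYNC is set."""

    def ready(self):
        if os.environ.get("PGVIEWS_SKIP_SYNC"):
            # Leave the signal unconnected, so partial migrate runs
            # don't trigger view creation.
            self.signal_connected = False
            return
        super().ready()
```

In a real project, the subclass would inherit from the library's app config and be referenced from INSTALLED_APPS or default_app_config.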
> A possible solution is to try to move the view creation into the migration files instead of using the post_migrate signal.
I do agree that if I were writing this library from scratch I would probably try to write it like that, but currently you would basically need to rewrite most of this library, which I will not be doing. There's also stuff this library supports which probably would be very hard to do using migrations.
> The reason I mentioned using a queryset is because I'm defining the SQL this way:
Neat - does it work?
Fixed in #28, release 0.10.0
Hello.
When the schema name is defined using the OPTIONS keyword in the database settings, the SQL that verifies whether the view exists for a specific schema always returns 0 (sql here), because the function _schema_and_name returns public instead of the right schema name.
Environment
How to reproduce
Define the schema using the OPTIONS keyword as following:
Execute python manage.py sync_pgviews or python manage.py migrate. A traceback like this is expected:
Fix suggestion
This is only a suggestion and I can create a pull request if you like.
django_pgviews/view.py
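For reference, the regex-based alternative discussed earlier would amount to something like the following. The helper name is hypothetical and the pattern is only a sketch of parsing the psycopg2 "options" string:

```python
import re


def schema_from_options(options, default="public"):
    """Extract the schema from a "-c search_path=..." options string
    (hypothetical helper; falls back to the given default)."""
    match = re.search(r"search_path=([\w$]+)", options or "")
    return match.group(1) if match else default
```

This only handles a single schema in search_path; a comma-separated search_path or quoted identifiers would need more care, which is part of why the explicit settings were preferred.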