pytest-dev / pytest-django

A Django plugin for pytest.
https://pytest-django.readthedocs.io/

Multi databases (multi db) support #924

Open bluetech opened 3 years ago

bluetech commented 3 years ago

This issue replaces some historical issues: #76, #342, #423, #461, #828, #838, #839 (probably a partial list).

Background

Django supports multiple databases. This means defining multiple entries in the DATABASES setting, which then allows directing certain queries to certain databases.

One case is when an extra database is entirely independent, with its own migrations, setup, etc.

A second case is when an extra database is read-only: it is only used for read queries and is not managed by Django.

A third case is a read-only replica; for this, Django provides the MIRROR setting.

Django allows configuring the order in which test databases are set up.
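
For illustration, a DATABASES setting covering these cases might look like the following (a hedged sketch; the aliases and connection details are hypothetical):

# settings.py -- hypothetical aliases for the cases above
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",
    },
    # Case 1: a fully independent database with its own migrations;
    # TEST DEPENDENCIES controls the order in which test databases are set up.
    "analytics": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "analytics",
        "TEST": {"DEPENDENCIES": ["default"]},
    },
    # Case 2: a read-only database not managed by Django
    "legacy": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "legacy",
    },
    # Case 3: a read-only replica; its test connection mirrors default's
    "replica": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",
        "TEST": {"MIRROR": "default"},
    },
}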

Django's multi-db testing support

pytest-django mostly relies on Django's underlying TransactionTestCase and TestCase classes for dealing with DB setups and such. Each pytest-django test gets run in a dynamically-generated TestCase/TransactionTestCase.

The main setting for multi-db support is TransactionTestCase.databases. This tells Django which databases to consider for the test case. By default it contains only default. It's possible to specify __all__ to include all databases.
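
For example, in plain Django (a minimal sketch; the extra "books" alias is hypothetical):

from django.test import TestCase

class BookTests(TestCase):
    # Allow queries to these aliases; databases = "__all__" would allow every alias.
    databases = {"default", "books"}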

Historical note: The TransactionTestCase.databases attribute was added in Django 2.2. Before that a multi_db attribute was used. pytest-django only supports Django>=2.2 so we happily don't need to concern ourselves with that.

Previous attempts

#397 - Adds a multi_db=True argument to pytest.mark.django_db() and adds a django_multi_db fixture. Problem: uses the old multi_db attribute instead of the databases attribute.

#416 - Very similar to #397.

#431 - Adds a django_db_testcase fixture which allows the user to completely customize the test case class, including setting databases. Rejected for being too flexible; I'd prefer direct support for multi-db.

#896 - Adds a global per-database setting for whether or not to add the database to the databases value. Rejected because I think it should be possible to customize per-test.

Proposed solution

IMO we want something like #397/#416, but modernized to use databases instead of multi_db. The fixture part would be a bit problematic because it's no longer just a boolean (fixture enabled/not enabled), but a list of database aliases. So some solution would be needed for that, or maybe only the mark would be supported.
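
Concretely, the modernized mark might look like this (a sketch; the "books" alias is hypothetical):

import pytest

@pytest.mark.django_db(databases=["default", "books"])
def test_uses_both():
    ...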

I'll try to work on it myself, but if for some reason I don't, PRs are definitely welcome!

jgb commented 3 years ago

Following this! It's a showstopper for us, preventing updates beyond Django 3.0. We currently have an internal monkey patching workaround that works with Django <= 3.0 but I can't get it to work when they remove the multi_db parameter.

bluetech commented 3 years ago

Initial PR in #930.

pmourlanne commented 3 years ago

@jgb https://github.com/pytest-dev/pytest-django/pull/397 was developed for your exact use case back in the day :) It is outdated now :zoidberg:

jgb commented 3 years ago

@bluetech I'm testing 4.3.0 specifically for the multi db support. It seems to work, kind of: I can access multiple databases, but somehow my data isn't getting flushed after each test. It goes wrong with all versions of Django I tried: 3.0, 3.1 and 3.2. For example, I run a test which creates an object in the database, and when I run the test again, the object from the previous run still exists. Any idea what might be going wrong here?

It seems like this old issue describes what I'm seeing: https://github.com/pytest-dev/pytest-django/issues/76. 4.3.0 does allow me to access multiple DBs, but it doesn't properly clean/flush them between test runs.

jcushman commented 3 years ago

Awesome! Thanks for working on this, it looks great.

I converted a multidb project over to the experimental API and it seems to be working, including flushing data between test runs. (I was previously using the workaround of TransactionTestCase.databases = TestCase.databases = set(settings.DATABASES.keys()) in a session-scoped fixture.)
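
Spelled out, that pre-4.3 workaround looks roughly like this (a sketch reconstructed from the description above; the fixture name is arbitrary):

import pytest
from django.conf import settings
from django.test import TestCase, TransactionTestCase

@pytest.fixture(autouse=True, scope="session")
def _allow_all_databases():
    # Allow every configured alias in every test.
    TransactionTestCase.databases = TestCase.databases = set(settings.DATABASES.keys())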

It's also working correctly with pytest-xdist, which is very cool.

The fixture part would be a bit problematic because it's no longer just a boolean (fixture enabled/not enabled), but a list of database aliases. So some solution would be needed for that, or maybe only the mark would be supported.

FWIW I definitely struggled with the lack of a fixture version. Just to spell out the issue you're talking about, with a single-db project I would do something like this:

@pytest.fixture
def person(db):
    return Person.objects.create(...)

# no need for pytest.mark.django_db:
def test_person(person):
    ...

With multi-db I would imagine wanting to do something like this:

@pytest.fixture
def db_people(add_django_db):
    add_django_db('people')

@pytest.fixture
def db_books(add_django_db):
    add_django_db('books')

@pytest.fixture
def person(db_people):
    return Person.objects.create(...)

@pytest.fixture
def book(db_books):
    return Book.objects.create(...)

# no need for @pytest.mark.django_db(databases=['people', 'books'])
def test_reader(person, book):
    ...

(Not sure if that's possible to implement, but just as an example of how an API could work.)

This would be convenient for fixtures in general, because otherwise it's easy to forget to remove 'books' from databases when you edit which fixtures are used by test_reader later, and it's also nice just for removing a line of noise from each test. But it becomes particularly useful when using doctests:

def reader_stuff():
    """
    # no way to use pytest.mark here?
    >>> person = getfixture("person")
    >>> book = getfixture("book")
    ...
    """

The workaround I found to get my doctests working was to add an autouse fixture so db and transactional_db fixtures would by default load all databases, unless there's an explicit pytest.mark.django_db:

@pytest.fixture(autouse=True)
def database_defaults(request):
    # set default databases for `db` and `transactional_db` fixtures
    if not hasattr(request.node, '_pytest_django_databases'):
        request.node._pytest_django_databases = list(settings.DATABASES.keys())

That's a pretty handy thing to be able to opt into, so it might be nice to have a more official way to do it? But it still leaves the doctests less efficient than they could be otherwise, since they don't actually need access to all the databases, so I think a fixture version would still be useful.

One other idea I had while doing the conversion -- it would be cool if there was some flag I could use to be warned about unused databases, so if I removed the book fixture from test_reader I'd be warned that 'books' was no longer needed in the annotation. Not sure if that's possible to track, just a random brainstorm in case there's some handy way to implement it.

Thanks again for pushing this forward!

jgb commented 3 years ago

@bluetech so why does the flushing between each test work for @jcushman but not for me? :question:

bluetech commented 3 years ago

@jgb @jcushman Thanks for the reports! I'll be looking at this again this weekend and I'll reply then.

jcushman commented 3 years ago

@jgb here's what I'm using successfully in case it helps.

Packages:

Python 3.7.10
Postgres 11.11
Django==3.2.3
pytest==6.0.1
pytest-django==4.3.0
pytest-xdist==1.32.0

Databases:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        ...
    },
    'capdb': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        ...
    },
}

I started with adding just a simple test file before converting my actual tests:

import pytest

@pytest.fixture
def court():
    return Court.objects.create(name='foo')  # comes from capdb

@pytest.fixture
def mailing_list():
    return MailingList.objects.create(email='foo@example.com')  # comes from default

@pytest.mark.parametrize("x", range(10))
@pytest.mark.django_db(databases=['default', 'capdb'])
def test_multi(x, court, mailing_list):
    assert Court.objects.count() == 1
    assert MailingList.objects.count() == 1

Results:

# pytest capdb/tests/test_multi.py
========================================= test session starts =========================================
platform linux -- Python 3.7.10, pytest-6.0.1, py-1.10.0, pluggy-0.13.1
django: settings: config.settings.settings_pytest (from ini)
rootdir: /app, configfile: setup.cfg
plugins: django-4.3.0, forked-1.0.1, flaky-3.6.0, celery-4.3.0, cov-2.9.0, redis-2.0.0, xdist-1.32.0
collected 10 items

capdb/tests/test_multi.py ..........                                                            [100%]

========================================= 10 passed in 4.28s ==========================================

I imagine if you can try an isolated test like that and narrow down the issue it might help bluetech when they get back to working on this. You might also look closely at your fixtures, maybe entirely disable them, and see if the isolated test starts working again -- it wouldn't be that surprising if an old multidb workaround in your fixtures is messing with this.

gonzaloamadio commented 3 years ago

Does this mean that if I have a multi-db project, I cannot use pytest for tests?

I have a legacy DB, and I created an app with some models that correlate with tables in that legacy DB. I have also created some endpoints with Django DRF (managed=False), so no migrations are run. So basically this would be case 1 of the first comment by bluetech.

bluetech commented 3 years ago

@jcushman

FWIW I definitely struggled with the lack of a fixture version

Yes, I'm pretty sure we need some integration with fixtures here.

Your suggestion should be doable; all we really need is to know the list of databases once the test is to be executed.

I think the add_django_db API is not too great, but maybe I need to get used to it a bit.

I'll definitely mull it over. Of course if someone wants to submit a PR with a proposal that would be possible as well.

# no way to use pytest.mark here?

Right, currently there is no way to add marks directly to doctests. This is https://github.com/pytest-dev/pytest/issues/5794. I can't think of any clear way to support it either.

One other idea I had while doing the conversion -- it would be cool if there was some flag I could use to be warned about unused databases

For the multi-db support, pytest-django depends almost entirely on the django.test code, so such a feature would probably have to go through Django. It might be possible to somehow track whether a connection for a database was used during a test, but I'm not sure. There are also bound to be a lot of false-positives (or rather, cases where you want to keep a DB anyway), so would definitely need to be off by default.
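
As a purely hypothetical sketch of such tracking (not something pytest-django does): capture the queries on each connection while the test runs and flag aliases that saw none.

from contextlib import ExitStack

from django.db import connections
from django.test.utils import CaptureQueriesContext

def unused_databases(run_test, aliases):
    # Run a test callable and return the aliases that executed no queries.
    with ExitStack() as stack:
        captures = {
            alias: stack.enter_context(CaptureQueriesContext(connections[alias]))
            for alias in aliases
        }
        run_test()
    return [alias for alias, capture in captures.items() if not capture.captured_queries]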

bluetech commented 3 years ago

@jgb

My answer is pretty much what @jcushman said (thanks!) -- it works here, so we'll need more details to help us narrow down the cause.

bluetech commented 3 years ago

@gonzaloamadio

Does this mean that if I have a multi-db project, I can not use pytest for tests?

It's supposed to be the other way around - previously you couldn't, now you can.

If you configured the legacy database in your DATABASES then it should work. If you tried it and it didn't work, we'd need to know how it failed.

gonzaloamadio commented 3 years ago

Hi @bluetech, I have made a reusable app (let's call it API) that has these unmanaged models. Then (in the same repo) there is another "testapp" that installs this API.

In testapp/settings.py I have 2 databases. One is a legacy DB, the one that will be queried by the unmanaged models in the API app.

DATABASES = {
    'default': {
        'ATOMIC_REQUESTS': True,
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(str(BASE_DIR), 'db.sqlite3'),
         . . . 
    },
    'legacy_db': {
        'ENGINE': 'django.db.backends.postgresql',
        . . . 
    }
}
DATABASE_ROUTERS = ['my-project.testapp.database_routers.DatabaseRouter']

And then I also have a testapp/settings_test.py. In this file, I disable migrations, define sqlite databases, and set managed=True for the models.

from .settings import *
from django.test.runner import DiscoverRunner

class DisableMigrations(object):

    def __contains__(self, item):
        return True
    def __getitem__(self, item):
        return

MIGRATION_MODULES = DisableMigrations()

class UnManagedModelTestRunner(DiscoverRunner):
    """
    Test runner that automatically makes all unmanaged models in your Django
    project managed for the duration of the test run.
    """

    def setup_test_environment(self, *args, **kwargs):
        from django.apps import apps

        self.unmanaged_models = [
            m for m in apps.get_models() if not m._meta.managed
        ]
        for m in self.unmanaged_models:
            m._meta.managed = True
        super(UnManagedModelTestRunner, self).setup_test_environment(
            *args, **kwargs
        )

    def teardown_test_environment(self, *args, **kwargs):
        super(UnManagedModelTestRunner, self).teardown_test_environment(
            *args, **kwargs
        )
        # reset unmanaged models
        for m in self.unmanaged_models:
            m._meta.managed = False

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": os.path.join(BASE_DIR, "db_test.sqlite3"),
        "TEST": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": os.path.join(BASE_DIR, "db_test.sqlite3"),
        },
    },
    'legacy_db': {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": os.path.join(BASE_DIR, "db_legacy_test.sqlite3"),
        "TEST": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": os.path.join(BASE_DIR, "db_legacy_test.sqlite3"),
        },
    },
}

# Set Django's test runner to the custom class defined above
TEST_RUNNER = "testapp.settings_test.UnManagedModelTestRunner"

If I run normal unit tests with "production" settings, it fails as expected, because the relations for the unmanaged models do not exist: ./manage.py test --settings=testapp.settings

If I run using the test settings, it works as expected: ./manage.py test --settings=testapp.settings_test

BUT, if I run using pytest:

    def execute(self, query, params=None):
        if params is None:
            return Database.Cursor.execute(self, query)
        query = self.convert_query(query)
>       return Database.Cursor.execute(self, query, params)
E       django.db.utils.OperationalError: no such table: ingredient

Here is a gist with more code (models, factory, test, settings): https://gist.github.com/gonzaloamadio/14f935d96809299b7f1e9fb88a6e8e94

I put a breakpoint and inspected the DB. As expected, when I run with the unittest suite, the ingredient table is there:

[screenshot: ingredient table present in the test DB]

But when run with pytest, there is no ingredient table:

[screenshot: ingredient table missing from the test DB]

gonzaloamadio commented 3 years ago

One more comment. I have run the unittest suite with the verbose option. This is the output:

❯ ./manage.py test --settings=testapp.settings_test --verbosity=2

Using existing test database for alias 'default' ('db_test.sqlite3')...
Operations to perform:
  Synchronize unmigrated apps: account, admin, core_api, auth, contenttypes, corsheaders, django_extensions, django_filters, messages, rest_framework, runserver_nostatic, sessions, softdelete, staticfiles, test_without_migrations
  Apply all migrations: (none)

Synchronizing apps without migrations:
  Creating tables...
    Creating table auth_permission
    Creating table auth_group
     . . .  more of django core stuff tables
Running migrations:
  No migrations to apply.

Using existing test database for alias 'legacy_db' ('db_legacy_test.sqlite3')...
Operations to perform:
  Synchronize unmigrated apps: account, admin, core_api, auth, contenttypes, corsheaders, django_extensions, django_filters, messages, rest_framework, runserver_nostatic, sessions, softdelete, staticfiles, test_without_migrations
  Apply all migrations: (none)
Synchronizing apps without migrations:
  Creating tables...
    Creating table ingredient                   <--- This is the key.  
    Running deferred SQL...
Running migrations:
  No migrations to apply.

So I found this solution : https://stackoverflow.com/questions/30973481/django-test-tables-are-not-being-created/50849037#50849037

Basically, do in conftest.py what UnManagedModelTestRunner is doing.

This solution worked for me @bluetech

$ cat conftest.py

import pytest
# Set managed=True for unmanaged models. !! Without this, tests will fail because tables won't be created in test db !!
@pytest.fixture(autouse=True, scope="session")
def __django_test_environment(django_test_environment):
    from django.apps import apps

    get_models = apps.get_models

    for m in [m for m in get_models() if not m._meta.managed]:
        m._meta.managed = True

bluetech commented 3 years ago

@gonzaloamadio Right, pytest doesn't consider TEST_RUNNER (it is itself the test runner), so your tweaks are not affecting it. I'm not sure if you were eventually able to do the modification in the conftest.py or not. If not, let me know and I'll try to help.

bepuca commented 3 years ago

So I just stumbled upon this section in the docs. We are currently upgrading a multi-DB Django project from Django 1.11 to Django 3.2, and also upgrading the pytest and pytest-django packages. I was not aware of all these changes, but for us it worked out of the box without any issues. The tests are passing without problems. So thank you for that!

AnderUstarroz commented 2 years ago

Multi DB is supported then? I am facing a strange issue when trying to execute queries using a cursor across multiple DBs.

Am I misusing the fixtures?

peterschwarzdev commented 2 years ago

Same problem here! Django DB fixtures don't work with the cursors generated by:

from django.db import connections

???

mschoettle commented 2 years ago

We ran into a problem when we added a second database to the Django settings. When running pytest, it tried to create a database with a doubled test_ prefix (test_test_<dbname>) for the default database. It only happened to a few other developers, so I could not reproduce it.

We were able to fix it by specifying a specific database name for the test database in the settings (in DATABASES['default']['TEST']['NAME']).
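
A sketch of that fix (connection details hypothetical):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'myapp',
        # Pin the test database name so it is not derived (and prefixed) twice.
        'TEST': {'NAME': 'test_myapp'},
    },
}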

AnderUstarroz commented 1 year ago

What about accessing a non-default DB within a fixture, how can we do that?

I know about the db fixture, which allows access just to the default DB:

@pytest.fixture()
def db_access(db):
    # Here you can access just the "default" db.

raianul commented 1 year ago

What I have done so far - in conftest.py:


TestCase.databases = {"default", "replica"}

However, it creates both test databases, but all of my fixtures execute against default, not the replica. Ultimately my tests fail.

Here is my fixture -

@pytest.fixture(name="test_factory")
def fixture_test_factory():

    def _test(**kwargs):
        return TestModel(name="test", **kwargs)

    return _test

Do I need to make any other changes?

bobflorian commented 1 year ago

Solid multi db support would be a huge win for us. We have a multi-tenant Django app touching a bunch of databases, plus ancillary databases as well.

erkandmp commented 1 year ago

Hey guys, thanks for all of your work on this project!

I found my way to this thread while upgrading some packages and running the test suite:

AssertionError: Database connections to 'xxxx_db' are not allowed in this test. 
Add 'xxxx_db' to pytest_django.fixtures._django_db_fixture_helper.<locals>.PytestDjangoTestCase.databases 
to ensure proper test isolation and silence this failure.

Suggestion: if there were a way to configure globally that access to the non-default database is OK, it would help us a lot. All of our tests are marked as "talks to the database" (i.e. the db fixture fed as an argument, or @pytest.mark.django_db), and trying to get to pytest-django>=4.4.3 would require refactoring multiple thousands of tests for us. 😅
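
As a stopgap, the class-level override mentioned earlier in this thread can be applied globally from conftest.py (a sketch, not an official API):

# conftest.py
from django.conf import settings
from django.test import TestCase, TransactionTestCase

# Allow every configured database alias in every test.
TestCase.databases = TransactionTestCase.databases = set(settings.DATABASES)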

For now I pinned pytest-django==4.2.0.

Thanks again and have a great time!

dannyzhong commented 1 year ago

My case is even more complicated: I have two databases, one in MySQL and one in PostgreSQL, and I'm not sure how to pytest them.

christianbundy commented 1 year ago

Is enable_db_access_for_all_tests meant to support multiple databases? Here's what I'm using:

@pytest.fixture(autouse=True)
def enable_db_access_for_all_tests(db):
    pass

christianbundy commented 1 year ago

I'm unsure how useful this will be for others, but if you're transitioning from a codebase where all of your tests inherit from a standard test class where you set databases = "__all__", or if you use that pattern often, you should know:

lachlancannon commented 1 year ago

Is the current support meant to include support for the MIRROR setting? Everything except that was working for me, and I also couldn't write the test data directly to the second database without permission errors, e.g. User.objects.using('reporting').create(email='whatever@wherever.com')

SHxKM commented 1 year ago

Is there any way where I can apply:

@pytest.mark.django_db(databases=['default'])

To all tests? I have hundreds of tests that use the db via the "magic" db fixture argument:

def test_profile(db):
 ...

Edit: Even after removing the 2nd DB from my dev.py settings file, and even removing references to using(...), pytest is failing. Where is it getting its idea about the 2nd DB from now?
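
For reference, plain pytest can apply a mark to every test in a module via pytestmark, e.g.:

# at module level, applies the mark to every test in the file
import pytest

pytestmark = pytest.mark.django_db(databases=["default"])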

smohsenmohseni commented 1 year ago

Is the current support meant to include support for the MIRROR setting? Everything except that was working for me, and I also couldn't write the test data directly to the second database without permission errors, e.g. User.objects.using('reporting').create(email='whatever@wherever.com')

I have the same issue.

pmaresca commented 10 months ago

Is it possible to know which db a test that specifies multiple dbs is intended to run against? I.e. I've got two dbs and the test runs twice - does the test know which db this run is for?

julianmoji commented 7 months ago

Nice, guys! Everything works. Is there some way to avoid the database flushing?

hannylicious commented 6 months ago

Everything seems to be working for me! Great feature, IMO.

kkrasovskii commented 4 months ago

Hi there all!

Please, could anybody explain how to run a test against a specific database in a multi-database configuration?

I have defined three databases in settings.py: 'default', 'postgres1' and 'postgres2'. The first one is the default Django sqlite3 DB, the second one has read-only access, and the last one has read-write access. I need to check that my model is created in the 'postgres2' database with read-write access. So I wrote the test:

import pytest
from my_app.models import MyModel

@pytest.mark.django_db(databases=['postgres2'])
def test_create_my_model():
    MyModel.objects.create()

If I run the test, I get an error saying that there is no access to the 'postgres1' database with read-only access (yep, there is no connection to postgres1, but I expected it would not be used). What is wrong here?

Thanks in advance!

gonzaloamadio commented 4 months ago

Does this work? https://docs.djangoproject.com/en/2.2/topics/testing/tools/#django.test.TransactionTestCase.databases https://stackoverflow.com/questions/38307523/test-isolation-broken-with-multiple-databases-in-django-how-to-fix-it

Set the databases variable inside the test case.


mschoettle commented 4 months ago

@kkrasovskii I suspect that you need to tell the ORM which DB to use for this model. Either via a DB router (https://docs.djangoproject.com/en/dev/topics/db/multi-db/#automatic-database-routing) or manually (https://docs.djangoproject.com/en/dev/topics/db/multi-db/#manually-selecting-a-database).
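
For illustration, a minimal router sending my_app models to 'postgres2' might look like this (a sketch based on the aliases above):

class MyAppRouter:
    # Route my_app models to 'postgres2'; return None to defer for everything else.
    def db_for_read(self, model, **hints):
        return "postgres2" if model._meta.app_label == "my_app" else None

    def db_for_write(self, model, **hints):
        return "postgres2" if model._meta.app_label == "my_app" else None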

Not sure why it is trying postgres1 instead of default though.

kkrasovskii commented 4 months ago

@gonzaloamadio, @mschoettle, thank you so much for the replies!

The problem doesn't seem to be routing. My apps work fine with the databases; the problem is with tests.

django version 5.0.2, pytest-django version 4.8.0

As I've mentioned earlier, three databases are described in settings.py: sqlite3 as 'default' and two postgres databases ('postgres1', 'postgres2'). When I run the test, there is no connection to 'postgres1'. I rewrote the test the way @gonzaloamadio suggested:

from django.test import TestCase
from my_app.models import MyModel

class MyTestCase(TestCase):
    databases = ['default', 'postgres2']

    def setUp(self):
        MyModel.objects.create()

    def test_something(self):
        pass

DB routing is done so that MyModel is written to the 'postgres2' database.

If I run the pytest command, I get an error that there is no connection to 'postgres1'. In that case I also get this warning:

RuntimeWarning: Normally Django will use a connection to the 'postgres' database to avoid running initialization queries against the production database when it's not needed (for example, when running tests). Django was unable to create a connection to the 'postgres' database and will use the first PostgreSQL database instead.

If I run the same command with the 'postgres1' config removed from settings.py, the test passes. If I run python manage.py test, the test always passes (with and without the 'postgres1' configuration in settings.py).

mschoettle commented 4 months ago

I am not entirely sure if this will fix it, but we had some issues with the test databases and had to add the following to each database connection setting to force using the correct database in tests:

'TEST': {
    'NAME': f'test_{env("DATABASE_NAME")}',
},

dferens commented 4 months ago

One more solution in #1113

lonetwin commented 1 month ago

Hi, I ended up here when searching for a way to use pytest with a memory-based sqlite db backend for tests, in an application where the production db is postgres, since the application is largely db-engine agnostic.

The rationale: in-memory sqlite is blazing fast. The only hiccup is that a couple of my tests do require the postgres db due to limitations in the sqlite engine.

I naively thought this might be sufficient:

# in tests/settings.py (which imports the application settings and overrides test-specific config)
...
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": ":memory:?cache=shared",
    },
    "postgres": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "...",
        ...
    },
}
...

# in tests that invoke queries not supported by sqlite

@pytest.mark.django_db(databases=["postgres"])
def test_complex_querying(db):
    ...

Unfortunately this results in

django.test.testcases.DatabaseOperationForbidden: Database queries to 'default' are not allowed in this test.
Add 'default' to pytest_django.fixtures._django_db_helper.<locals>.PytestDjangoTestCase.databases
 to ensure proper test isolation and silence this failure.

My assumption here is that this is potentially due to the fact that the model being queried (which is actually constructed from a FactoryBoy factory) is somehow registered with the default db.

Now, before I go down the rabbit hole of attempting to address that: is there something else I should be aware of w.r.t. pytest-django's db setup? For instance, IIUC, the db setup is session-scoped, which means that if I create a postgres-specific db fixture then I'd have to override/duplicate all of what is done when requesting the regular db fixture.

Is this correct? Is there a simpler way that I am not aware of?

TBH, the payoff for this "sqlite db with pg when required" approach is significant (integration test time down from 14m to 3m, bar the 2 tests that fail due to the limitations above), so I would really like to pursue this.

Edit: I just thought of an alternate approach: use pytest marks to mark some tests pg-specific and then invoke pytest twice -- once with settings where postgres is the default db, limited to the marked tests, and once with settings where sqlite is the default db, running all the unmarked tests.
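
Spelled out, that two-pass idea might look like this (a sketch; the marker name and settings modules are hypothetical, and the marker would need registering in pytest config):

# in the pg-only tests
import pytest

@pytest.mark.pg_required
@pytest.mark.django_db
def test_complex_querying():
    ...

# two invocations, e.g. in CI (--ds selects the settings module):
#   pytest -m pg_required --ds=tests.settings_pg
#   pytest -m "not pg_required" --ds=tests.settings_sqlite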

Again, is there a simpler way to express this?

SHxKM commented 4 weeks ago

Is the current support meant to include support for the MIRROR setting? Everything except that was working for me, and I also couldn't write the test data directly to the second database without permission errors, e.g. User.objects.using('reporting').create(email='whatever@wherever.com')

@lachlancannon any chance you've found a solution to this? I'm using two DATABASES entries just so I can have a separate connection (timeout) for slower queries, but it seems like pytest is treating it as an entirely different database.

danielmcquillen commented 1 week ago

Wondering what the status of this feature is.

I have two databases, a 'default' and a read-only DB, and would like to be able to have 'default' work as normal in a TestCase, but have the read-only database be connected to an actual, existing db in postgresql, not a dynamically-created db for the test.

(Background: I use the standard approach in Django of using a custom model class for models managed by the read-only DB, e.g.

class CustomExternalModel(models.Model):
    class Meta:
        managed = False
        abstract = True

...and then use a custom database router to know whether a model uses the 'default' or the read only DB.)

However I'm unclear how to configure pytest and pytest-django to set up my TestCase so that it creates a test db normally for 'default' but links directly to the existing read-only database (and doesn't add "test_" to the name or try to build the db dynamically).

I feel the answer might be spread out across multiple responses above but I'm not sure what it is or whether solutions above are still valid.

Thanks for any help! (I'll remove this comment and put somewhere else if there's a support channel I'm missing. In that case, apologies for misplaced question.)

hannylicious commented 1 week ago

Wondering what the status is of this feature is.

AFAIK, it's still working fine.

I have two databases, a 'default' and a read-only DB, and would like to be able to have 'default' work as normal in a TestCase, but have the read-only database be connected to an actual, existing db in postgresql, not a dynamically-created db for the test.

I've always been told that's a big 'no-no', as it defeats a lot of the ideas behind testing - that it should be very isolated, reproducible and rely as little on any outside things as possible (ideally it would be no outside resources, which is why Mocks exist).

Warning: lengthy reply. Skip to end for TL;DR.

It's not always that simple though, I get that. Where I work, we have a use case where we have a few DBs that are not owned by us - and we have unmanaged models mapping the data out - not unlike your case. We end up having pytest spin up a 'fake test db' (similar to the db we don't own), then we have a script that creates the dummy data in that database for the tests we're running (leveraging factories / factory-boy, --no-migrations, and the hypothesis library where wanted/needed). We even have scripts that set up an entire app's worth of data for end-to-end testing, too. That way, we're in control of the data in the db (even if it's randomized) and we know exactly what is going on, where and how.

If you test against a live db with data already in it, you lose that control, the flexibility, the reproducibility, and there is always the potential for something to have changed in the live db beyond your control, which will utterly destroy your test suite and cause problems. Not to mention, there may be instances where you want to set up a 'specific niche case' that doesn't currently exist in the live DB - and if you're testing against the live DB? Good luck. Do you really want to go through all that hassle? Realistically, no. And you wouldn't want to intentionally put potentially bad data into a live db just for the sake of testing.

There is also the undue burden on the systems. Hitting a live DB with requests might have a big impact on performance for other users currently using that db. It might cause other unforeseen issues. What if your test was written in such a way that it somehow manipulated the data from the db and saved it back to the live db? No, thank you. The speed of sending those requests and the slow(er) replies when testing against a live db are also, in my opinion, reason enough to say 'no way'.

These are just some of the many reasons people recommend not testing against a live db. And there are lots more.

There are lots of solutions - you could mock the db response for the unmanaged models (giving you control, reproducibility, etc.), you could spin up randomized data using factory_boy like we do (again, bringing you all the benefits mentioned previously), or you could use fixtures to fill the test db with appropriate data similar to (if not exactly the same as) your real-life db. IMO fixtures are the most 'clunky' way to do it and get old fast - but they literally give you an exact replica of your 'real life db' (complete with the existing data in the db) for your local testing. There are so many ways to spin up the exact same data that is currently in your live db in the test environment using the test dbs that it really should never be a consideration to test against the live db.

Maybe someone will come along with some examples of testing against live dbs and a perfect reason as to why they do it - but I personally have yet to find a single one, honestly. I'm happy to entertain the idea, though - it's just that I personally have never found one.

TL;DR: Don't test against live db's. Spin up your data in a local test_db and let pytest handle the creation/tear-down. Use scripts/libraries/fixtures to replicate your data in the local test db.

danielmcquillen commented 1 week ago

@hannylicious Thanks for your thoughtful response.

Sorry, I should have clarified that the second db is read-only but doesn't need to be the live version. It's just a very static, kind of complex data store that is used as background data in a lot of operations spread widely across the main app. Also, these are more integration tests than unit tests... in the unit tests I think I've done a reasonable job of using patches and avoiding dependencies like this.

In the past I've tried to make a lot of MagicMock objects and patch them here... and there... and dang, over there too... but I found myself spending way too much time trying to figure out where to inject them in different places of the app, and the tests were getting brittle. I also spent a lot of time in tests trying to reconstruct just a part of this very stationary, very non-changing read-only db.

So I wanted to just create one real database that could be used in tests, and then set the tests up to connect to it. This db could act like fixtures that don't need to be loaded (and are already relational!). And yes, a step away from that is your comment about writing scripts/fixtures to populate a portion or a complete representation of it.

But if all of your read-only models are set up as unmanaged and you're using a router to decide between databases for each model, it becomes harder to populate the read-only test db in your tests. Now you need some kind of custom test runner to change things...maybe look for and iterate through this subset of models and change them back to managed, and maybe do some other things to get the test database migrated and ready. Also, your read-only Django models might only model a portion of the target db...yet there might be foundational data not reflected in those (facade-like) Django model classes that's needed in the target db for it to be relationally sound. Setting up the tests to do this makes the tests more brittle and hard to understand when you come back to them, when you really only need the unchanging, read-only db attached and ready for the integration test to run.

But it may be that I'll need to do the above, as it's the proper way. If so, I haven't found a reference implementation of the current best practice for how to do this. Is there a gist or other resource you'd recommend that shows the latest way to handle this? It seems like the decorators have changed, and some of the earlier examples I've found aren't current. I can't seem to find one place that shows how to do this... and my limited understanding prevents me from putting the pieces together in the right way.

Thanks again for any thoughts.

bluetech commented 1 week ago

I agree with what @hannylicious said, but if you still want to do it, I think you can use the TEST setting in DATABASES, see https://docs.djangoproject.com/en/5.0/ref/settings/#test. You want to point it at your real db and tell it not to create or migrate, etc.
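
A hedged sketch of what that might look like (untested; the alias and names are hypothetical -- see the linked TEST docs for the exact keys):

DATABASES = {
    "readonly": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "reference_data",
        "TEST": {
            # Reuse the existing database as-is instead of a test_ copy...
            "NAME": "reference_data",
            # ...and skip running migrations against it (Django 3.1+).
            "MIGRATE": False,
        },
    },
}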

hannylicious commented 1 week ago

Now you need some kind of custom test runner to change things...maybe look for and iterate through this subset of models and change them back to managed, and maybe do some other things to get the test database migrated and ready.

That's not necessarily true. You could do something like this:

class UnmanagedModel(models.Model):
    id = models.IntegerField(db_column='ID', primary_key=True)
    year = models.IntegerField(db_column='year')
    thing = models.CharField(db_column='thing', max_length=100)
    # ...whatever else...

    class Meta(object):
        managed = getattr(settings, 'SOME_ENV_VAR', False)
        db_table = '[database].[databaseTable]'

Given something like that, you can use 'SOME_ENV_VAR' to control whether or not it's a 'managed' model, without the need for extra test runners and things like that. You could add that to your core settings, your specific test settings, or even the command at run time - however you want! Sure, it's a little wonky, but it works amazingly well. Combined with the --no-migrations argument, to avoid it needing migration files (in the event you don't have/run migrations against your unmanaged models), and using FactoryBoy to create huge swaths of 'test data' very simply, it becomes much simpler to spin up fully relational data in just a few lines of code. Depending on how well you set up your factories, in some cases you can write one line and have it set up all the related models in the read-only db as well as in the full-access db.

Even given your use case, I'm not seeing any particular reason or benefit why you should use a real persistent db for the read-only portion of things. IMO, if I were in your position, I would spin up whatever test db as read-only (or not - it doesn't matter, you just need some data for the test, really...) and just use factory_boy or some scripts to create the data I need in there. Even if the data in the database is 'a lot', create a factory for your unmanaged models that contains all expected attributes and creation pieces, and let it manage the creation of the data. You can get as complex or simple as you want! You can use things like SubFactory to have it automatically create all the expected relationships.

Let's assume, for the sake of discussion, I needed some read-only test data about 2 kids in a school system. I need different kids, different schools, different school years, but the same grades recorded in 2 classes - with both kids sharing the same last name of "BRICK" - to recreate a bug that someone brought to our attention that only happens in that very specific occurrence. That might sound tricky and/or painful to set up - but using factory_boy?

ReportCardFactory.create_batch(2, science_grade="A", math_grade="B", student__last_name="BRICK")

Depending on how you set up your factories, it would/could create 2 report cards, tied to 2 different students with the last name "BRICK", at 2 different schools, during different school years, etc., each with an "A" in science and a "B" in math. When you come back to it 6 months from now? Easily readable, easily understood. It would also create any myriad of other attributes/relationships you set up, in a fashion where the data is as randomized or as specific as you want. All from that one line (if you set up the factories to do so). The mix of related models being created could be managed or unmanaged; factory_boy won't care - it's just putting the data into the db, in the related tables where it should be. And yes, it's wildly fast - using an in-memory db, it takes fractions of a second for this data to be set up. Assuming your test is short and sweet, the entire setup, test and teardown would in most cases be way, way faster.
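
For illustration, factories backing that one-liner might look roughly like this (a sketch; the models and fields are hypothetical, using factory_boy's documented API):

import factory
from myapp.models import ReportCard, Student

class StudentFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Student

    last_name = factory.Faker("last_name")

class ReportCardFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = ReportCard

    # student__last_name="BRICK" in the call above overrides the nested field
    student = factory.SubFactory(StudentFactory)
    science_grade = "A"
    math_grade = "B"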

I wrote this article about it: https://hannylicious.com/blog/testing-django/. Yes, the article is out of date. Yes, the article has issues that need correcting and could be clearer. However, the overarching ideas of quickly/easily creating test data for unmanaged models are still relevant. It's also how I personally prefer to solve the issue. In my experience, letting your test suite handle the db side of things is much faster, more maintainable, more manageable, requires fewer resources, has fewer barriers, and gives you access to all the benefits that test libraries offer.

I'm all for learning better ways to do things - so if there are times when a live read-only db would be appropriate, I'd love to hear about it! But from what you're describing? Again, in my opinion, I don't see any reason you can't just spin that data up at the start of your test, using whatever test db you see fit or whatever db your test suite is designed to spin up, and I would still heavily recommend using the test db your suite gives you and letting it do the maintenance of that.