Move db setup to item setup, require item mark for db access

flub commented 12 years ago

Currently the Django database setup happens at pytest session setup, but errors there (e.g. incorrect model) result in py.test internal errors. This issue discusses moving the db setup to item setup with session-wide caching. It also proposes to remove the --no-db option and instead require each tests which uses the database to be marked as such.

Original report:

If you break the model you get an ugly INTERNALERROR. The quick and dirty fix for this is to wrap the setup_databases() call in pytest_sessionstart():

import pytest
...
try:
    old_db_config = runner.setup_databases()
except SystemExit:   # django is harsh
    raise pytest.UsageError('Django DB setup failed')

I'm not sure if there's a better way to resolve this, it does feel a bit odd to use UsageError but it gives a nice result:

~$ py.test app_reef/views_test.py
============================= test session starts ==============================
platform linux2 -- Python 2.7.3 -- pytest-2.2.1
Creating test database for alias 'default'...
Error: One or more models did not validate:
app_reef.eventattr: 'event' has a relation with model Events, which has either not been installed or is abstract.

ERROR: Django DB setup failed
~$

pelme commented 12 years ago

SystemExit does indeed seem a bit harsh. :)

Maybe @hpk42 has an idea on which exception to raise for errors like this?

flub commented 12 years ago

Was just talking on #pylib about this and ronny suggested maybe not doing the database setup at session start.

If you think of it as a funcarg you can still cache it at the session level. Now this would require you to explicitly give a funcarg for each test needing the database so that might not be ideal. But the general idea is the same: in the test setup it is possible to create the database and cache it on the session manually (IIRC the session is reachable from the item).

This would be nicer if it the funcargrequest and items where unified (as is planned) as it could then be a real funcarg then. In the meantime the manual caching on the session is viable though.

I'll see if I can implement something neat to do this (which would also integrate with transaction test cases). In the mean time the hack above seems the best solution.

pelme commented 12 years ago

Instead of a funcarg, a mark could be used.

I think the ideal would be to mark all database tests with pytest.mark.django_db or something like that, to be explicit where database access is required. And unless @pytest.mark.django_db is given, no database access should be allowed (i.e. --no-db behavior). Standard Django TestCase could treated as pytest.mark.django_db automatically (but not SimpleTestCase).

This could also be useful in a multi db setup, we could for instance do pytest.mark.django_db(multi_db=True) the same way Django's standard test case allows specifying multi_db. Instead of transaction_test_case, pytest.mark.django_db(transaction_test_case=True) could be used. Maybe it would simplify things?

This would be good when writing pure unit tests where database access should not happen anyways, and force tests to be explicit when database access is required.

The database setup could then be done in the item setup and cached for the entire session. It would then automatically create the database only when needed, which would be awesome (without having to pass in --no-db -- it would just work out).

This would however be a huge backward incompatible change since all old test cases written with pytest-django would suddenly not be able to access the database. I guess this fork is backward incompatible from the original fork anyways, so it might not be that bad... Thoughts?

flub commented 12 years ago

Yes, I do agree a mark which works as you describe would be great (though your api a bit wordy ;-), no need to stick that close the django names I think). This would solve my lingering issue of always creating the test db even if I'm not testing a part of the code which needs it. But once a test has used the djangodb mark how can you stop other tests from using this database? I guess this isn't too much of an issue. I am also slightly concerned that this will be too much boilerplate compared to now where you can just start writing your test. While it would suite my usecase well it might put off people who want to test just a django app/site and try this out compared to "manage.py test".

As for the funcarg, I don't think it should be dismissed but it should rather be another way of enabling the django db setup. But it can be treated as a separate issue and wait until the items and funcargs have been unified. I haven't got much experience with django yet but I do think some sort of "models" funcarg would seem natural, so you can do e.g. models.MyModel(attr=val); models.User.objects.filter(...) etc in your test code instead of having to import the models by hand (which requires DSM set). And requiring the mark when already asking for the funcarg would seem silly.

As for the backwards compatibility you could enable this with a setting in a conftest.py or pytest.ini if you don't want to break old code.

pelme commented 12 years ago

I agree that it is a bit wordy compared to nothing, but I think that is a good thing in this case. Tests that requires database interaction should ideally be very few - you should be forced to be explicit and think very hard before introducing tests that depends on the database - and then that line won't be too much typing. You can also mark an entire test module/class by doing pytestmark = pytest.mark.django_db at the module level. I think this kind of best practices should be encouraged wherever possible.

It is easy to force tests to not access the database, regardless of whether it is available or not: Carl Meyer describes a technique here: http://www.youtube.com/watch?v=ickNQcNXiS4&feature=player_detailpage#t=1426s. This could then be applied to any test not specifying pytest.mark.django_db. (I very much recommend the entire talk, it is packed with useful stuff. :-))

I use pytest.mark.no_db in my current project and have this in my conftest.py:

def pytest_runtest_setup(item):
    if 'no_db' in item.keywords:
        django.db.backends.util._CursorWrapper = django.db.backends.util.CursorWrapper
        django.db.backends.util.CursorWrapper = Mock(side_effect=RuntimeError('No database access is allowed in this test!'))

def pytest_runtest_teardown(item, nextitem):
    if 'no_db' in item.keywords:
        django.db.backends.util.CursorWrapper = django.db.backends.util._CursorWrapper

This is just an example on how it can be done - I think no_db is the wrong approach, it really should be the other way around.

Regarding people who just want to try out py.test as a test runner and potentially switch from manage.py test: Everything will just work out of the box, since Djangos django.test.TestCase's will be treated in a special way, and will always imply that the database is required. The database would be created for any Django TestCase/TransactionTestCase automatically.

If the default behavior was to be changed, the tests which actually needed the pytest.mark.django_db mark would be very obvious -- they would all fail, and you would immediately get a list of tests that needed to be updated. Sticking pytestmark = pytest.mark.django_db into those tests should be a relatively quick fix. Unless you already have thousands of database dependent tests, which is not Django TestCases, this shouldn't be a big issue. And if you do -- you probably already have way bigger problems. :)

I can also think of an easy workaround - it is probably possible to just add the django_db marker to all test item in a projects conftest.py file, if the old behavior is wanted. If this is possible (which I am sure), it should of course be documented.

Since this is a fork, and it is relatively new, I am not that afraid of breaking backward compatibility right now. Please raise your concerns here if you think this is the wrong way to go.

Please open a new issue and expand on your ideas regarding the funcargs. Funcargs are truly awesome and I haven't wrapped my head around all the possibilites they offer yet. :)

flub commented 12 years ago

Btw, you've convinced me that it's not a bad thing or burdensome to require marking tests as needing the database. So I think what's outlined in this issue will be a nice improvement for the plugin.

But now for some off-topic discussion: right now I'm putting a fair deal of the important logic of an app into methods on the model and it's manager. It was my understanding that this is fairly logical to do for django. But this naturally requires a great deal of tests which require the database to ensure the model logic is correct, while you suggest these tests should be in the minority. Am I taking the wrong approach, or do you just normally expect these tests to be dwarfed by the amount of tests for the rest of the site?

pelme commented 12 years ago

It is a hard problem, and I am afraid I don't have a perfect answer. :) I am learning every day how to write better tests. Some notes:

Make sure all business logic is decoupled from database actions - they can still be methods on your models, just make sure your logic is decoupled from the database queries (See http://www.youtube.com/watch?v=ickNQcNXiS4 at ~14:00)

When writing unit tests, I have found mocking very useful to cut away parts of the system that I do not currently tests - including parts that interact with the database. I use Michael Foords mock library: http://www.voidspace.org.uk/python/mock/

An example: a view might call a model manager method which generates a very complex SQL query and then returns some model objects. To test the view, I would mock the manager method to return a known set of non-saved objects, suitable for the view test. The complex query (model method) must of course be tested against the real database by itself.

To test the view, RequestFactory can be used to invoke the view function directly, rather than using the test client. Everything can then be faked on the request object without database access (i.e. request.user = User(...)) to test when someone is logged in.

When mocking the model method with patch.object / and using autospec=True, mock will make sure that the manager method exists, and that it is called with proper arguments. This catches embarrassing integration errors between the view and the model, while still mocking the method. This should ideally require a complete integration test too to be bullet proof - but this is usually pretty good. It can be argued my code is badly designed when I need to mock, but I find it very useful in situations like that.

What I am talking about here is mostly unit testing, and you will probably need some integration tests too, but IMO this kind of unit testing gets you far. I have ~400 of these kinds of tests in my current project which runs in about 5 seconds). The feedback with unit tests that run at that kind of speed is truly amazing when developing.

Django's documentation and default test capabilities does not really encourage unit testing in this fashion, which I think is a bit sad. RequestFactory is now part of Django, which is great, but does not get very much attention in the docs. A lot of test cases for Django projects have tests that test too much or are not very good designed.

Carl Meyer's (Django core developer) PyCon talk is awesome and is packed with examples on how to test things in a good way.

I might write some docs for pytest-django with how the different funcargs/database markers/mocks can be used together to show some common patterns/examples in a very straightforward way.

flub commented 12 years ago

I'm working on this (will appear in my fork shortly, almost works completely), but while doing this I was wondering if you wanted to keep compatibility with the from pytest_django import transaction_test_case; @transcation_test_case API or not.

It can be made to work, but at the cost of a little bit of extra cruft/effort.

flub commented 12 years ago

Another question is whether the use of funcargs like admin_client should automatically set the djangodb marker. I've done so for the live_server funcarg since that is really obscure otherwise. I think it should be done for admin_client as well.

I've now pushed these changes to my fork, on the lazy_django branch: https://github.com/flub/pytest_django I'll try to add tests, docs and multidb support before creating a pull request. Feel free to comment or even commit improvements to this repo.

BTW: thanks for the long explanation above, it all sounds rather sensible. And Carl Meyer's talk video was already on my to-watch list but now made it to the top, was good to see.

pytest-dev / pytest-django

Move db setup to item setup, require item mark for db access #9