hackoregon / backend-examplar-2018

an example dockerized geo-aware django backend
MIT License

Add Django unit testing back in to repo #82

Open nam20485 opened 6 years ago

nam20485 commented 6 years ago

During the course of the "great travis merge of 2018" I removed some of the fixture and unit testing that Brian had built into this repo, due to errors it was causing in the Travis image build. It needs to be added back in and the problem diagnosed. (Sorry Brian :disappointed:)

BrianHGrant commented 6 years ago

TBH, I am not sure that the testing previously in play was actually a great pattern. Travis does have a Postgres environment available. If we can use this with fixtures and specify test DB variables, we may be on a better track than the test-on-prod option, and not be fronting the AWS costs for a test DB environment. Not sure?

znmeb commented 6 years ago

What version is the Travis PostgreSQL? If it's on Ubuntu 16.04, it's probably the default, which I think is 9.5. If we want 9.6 we'll probably have to add the PGDG repo and apt-get it during the test.

BrianHGrant commented 6 years ago

https://docs.travis-ci.com/user/database-setup/#Using-a-different-PostgreSQL-Version

znmeb commented 6 years ago

Nice! Has PostGIS too.
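Per the linked Travis docs, a hypothetical `.travis.yml` fragment selecting a newer PostgreSQL with PostGIS could look like this (the exact package versions here are an assumption; check the current docs for what's available):

```yaml
addons:
  postgresql: "9.6"
  apt:
    packages:
      - postgresql-9.6-postgis-2.3

before_script:
  # Enable PostGIS in the Travis-provided test database
  - psql -U postgres -c "CREATE EXTENSION IF NOT EXISTS postgis;"
```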

MikeTheCanuck commented 6 years ago

As I ranted in #81, this very much depends on what we're trying to test for - and let's be clear this year around so we're not assuming more than we're getting:

  1. Do we want to verify the database driver?
  2. Do we want to confirm connectivity to our database server?
  3. Do we want to validate the Django code in each API's application (that it's still retrieving the data it expects)?

Each of these could require increasingly specific and complicated configuration, both in the tests themselves and in Travis.

Testing connectivity to a Travis-hosted dummy database will only confirm (1), not (2) or (3). We would do ourselves and all our colleagues on the project teams a service if we could help folks accomplish (3) (and implicitly (2) as well); that way they can continue to concentrate on evolving their API applications, and they can benefit from the real value of continuous integration (CI): catching errant mistakes that create regressions in production.

BrianHGrant commented 6 years ago

So, with the Travis option, we would be loading data into the database as fixtures, not running against an empty database. The tests would be run on this data. I am confused as to how this would not answer question (3), testing the Django code.

On (1): we can assume the drivers work in and of themselves, because Django and the DB drivers are tested upstream.

On (2): this is not really the point of Django unit testing.
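To illustrate the fixture idea in miniature (a framework-free analogy, not this repo's code): fixture rows are loaded into a throwaway database, tests assert against that known data, and production is never touched. The table and data below are hypothetical stand-ins for a Django JSON fixture.

```python
import json
import sqlite3

# Stand-in for a Django JSON fixture file (hypothetical data).
FIXTURE_ROWS = json.loads('[{"id": 1, "severity": "minor"}, {"id": 2, "severity": "fatal"}]')

def load_fixture(conn, rows):
    """Create a throwaway table and load the fixture rows into it."""
    conn.execute("CREATE TABLE crash (id INTEGER PRIMARY KEY, severity TEXT)")
    conn.executemany("INSERT INTO crash VALUES (:id, :severity)", rows)

conn = sqlite3.connect(":memory:")  # ephemeral test database, discarded after the run
load_fixture(conn, FIXTURE_ROWS)
```

Django's test runner does the same thing at a higher level: it creates a scratch database, loads the fixtures named in the test class, runs the tests, and drops the database afterwards.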

MikeTheCanuck commented 6 years ago

I was obviously the more confused one, Brian, sorry. I had forgotten that when running the kinds of tests that require the ability to create a database, that is so the test process can create schema and data as well, so that the round trip can be verified.

On 1) I'm less interested in "does the driver code itself work" and more interested in "has the configuration that allows the app to use the driver been borked somehow"? (for example, did we change the driver being used, or fat-finger the file path)

On 2) I'm similarly interested in "have the connection parameters, and the way we ingest them into the application, been borked somehow"? (for example, did we rename an env var)
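Both of those "borked configuration" cases can be caught by a cheap pre-flight check before any Django test runs. A minimal sketch (the `POSTGRES_*` names mirror the env vars used elsewhere in this thread; the helper itself is hypothetical):

```python
import os

# Connection settings the app expects; names follow the POSTGRES_* convention in this thread.
REQUIRED_DB_VARS = [
    "POSTGRES_NAME",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_HOST",
    "POSTGRES_PORT",
]

def missing_db_vars(environ=None):
    """Return the names of any connection variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_DB_VARS if not environ.get(name)]
```

A CI step can fail fast when `missing_db_vars()` is non-empty, catching a renamed env var before any test that actually touches the database.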

nam20485 commented 6 years ago

So where do verifying that the Django app starts up correctly when the image is run, and verifying that the API endpoints accept connections, fit into these options? Are they covered or subsumed by any of the other testing strategies suggested?

MikeTheCanuck commented 6 years ago

OK, I've been thinking further on the question of "testing scenarios", and I have a few strong opinions:

  1. I am uncomfortable with granting DB-create privilege to any DB account, whether for development, for Travis, or for the Django app in production. That's just a billing disaster waiting to happen if those creds ever fall into the wrong hands.
  2. I still feel it would be great to have something as close to end-to-end integration testing as possible, so that we have the best assurance that the next container deploy is going to result in a working API.
  3. We need to be clear what we are and are not testing for, and feel comfortable that what we're testing is worth the effort, before we start implementing more changes.

This is how I'm thinking of the various testing strategies available to us:

  1. Does the App return 200/OK to trivial HTTP requests? This verifies the combination of gunicorn + Django is cooperating - that gunicorn's configuration is still resulting in a successful web server fronting the API.
  2. Does the App currently have DB connectivity? This verifies that the database driver configuration still works, and that the DB configuration in Django (e.g. env vars) and secrets still work.
  3. Does the App return JSON to the request? This verifies that the database models still match the schema in the database, that the queries sent are still trivially well-formed, and hopefully that the database still has records in the corresponding tables.
  4. Does the App return the expected JSON to the request? This is the toughest layer - verifying things like data types, specific record and column values - what makes the application accurately represent what it was meant to represent.

I would like to see at least (1), plus as many of the additional types of testing as possible. If that means attaching to the production DB for each application, then we should consider how to handle read/write privileges.
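The four levels above can be expressed as progressively stricter checks on an HTTP response. A sketch using plain functions (the helpers and field names are hypothetical; level 2 is omitted because it needs a live database connection rather than a response check):

```python
import json

def passes_level_1(status_code):
    """Level 1: the app answered a trivial request with 200/OK."""
    return status_code == 200

def passes_level_3(body):
    """Level 3: the response body parses as JSON (models and queries still line up)."""
    try:
        json.loads(body)
        return True
    except (TypeError, ValueError):
        return False

def passes_level_4(body, required_keys):
    """Level 4: every JSON record carries the expected fields."""
    try:
        records = json.loads(body)
    except (TypeError, ValueError):
        return False
    return all(required_keys <= set(record) for record in records)
```

For example, `passes_level_4('[{"id": 1, "geom": null}]', {"id", "geom"})` holds, while an HTML error page fails level 3 and everything above it.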

BrianHGrant commented 6 years ago

OK, we are getting fairly far into our season, and moving toward the point where we want some useful tests that verify service as described by @MikeTheCanuck. In the long run, as an organization we do need a real testing solution, one that conforms to how testing occurs in the "real world" and helps teach the volunteers in our program how it is done.

This would include a separation of concerns: whether the database is live, vs. unit testing custom Django code, vs. other concerns.

It also would not rely on human fallibility to ensure that only read-only creds were assigned to the DB, or that folks used "keepdb" or other flags to avoid accidentally deleting or corrupting the prod database during testing; instead, it would enforce these requirements through code.

This all being said, yeah we could accomplish our goals for our current project season through testing on the live DB and using the read-only creds, and I am OK with this.

Currently, on the examplar_updates branch of the transportation_systems_backend repo, I have initiated this test-on-prod pattern using pytest as my test runner. I chose it for the additional configuration abilities it provides, and because nose is deprecated.

To configure:

1. Add `pytest-django` to your common requirements.

2. Create a `pytest.ini` file at the project root like so:

```ini
[pytest]
DJANGO_SETTINGS_MODULE = crash_data_api.settings
python_files = tests.py test_*.py *_tests.py
addopts = --reuse-db
```

3. Add a `conftest.py` file at the root which overrides the test database to point to prod:

```python
import os

import pytest

from crash_data_api import settings


@pytest.fixture(scope='session')
def django_db_setup():
    settings.DATABASES['default'] = {
        'ENGINE': 'django_db_geventpool.backends.postgresql_psycopg2',
        'PASSWORD': os.environ.get('POSTGRES_PASSWORD'),
        'NAME': os.environ.get('POSTGRES_NAME'),
        'USER': os.environ.get('POSTGRES_USER'),
        'HOST': os.environ.get('POSTGRES_HOST'),
        'PORT': os.environ.get('POSTGRES_PORT'),
        'CONN_MAX_AGE': 0,
        'OPTIONS': {
            'MAX_CONNS': 20
        }
    }
```

4. Change the test entrypoint to use the `pytest` command rather than Django's internal test runner:

```shell
pytest
```

Check out my note on testing for more info:

https://github.com/hackoregon/transportation-system-backend/tree/examplar-updates#note-on-testing

nam20485 commented 6 years ago

We need some type of solution for unit testing at this point. I went ahead and implemented this pattern in my hackoregon/disaster-resilience-backend repo. I copied what you have exactly and it seems to be functioning well.

It is successfully running the 4 example unit tests I already had in the app, using the new disaster-resilience DB instance Mike created for me. This DB only allows read-only access, so I think this method satisfies the requirement that we don't allow any write access to the DB, which was a big problem with the old method.

@BrianHGrant I don't know anything about how fixtures work: what is different between this method and the previous one? How does this method get away with not writing anything to the DB where the old method couldn't?

(Adding @bhgrant8 in case you didn't see this.)

nam20485 commented 6 years ago

@MikeTheCanuck Bringing this to Mike's attention. I think we should consider this as a solution for our unit testing needs.