okfn-brasil / jarbas

🎩 API for information and suspicions about reimbursements by Brazilian congresspeople
https://jarbas.serenata.ai/

Remove django simple history #316

Closed cuducos closed 6 years ago

cuducos commented 6 years ago

What is the purpose of this Pull Request?

Given that:

  1. After re-studying the CEAP data, we decided it is unnecessary to keep historical changes in the reimbursement data: forthcoming changes in serenata-toolbox will target reimbursements we were not targeting so far, so historical tracking will no longer be needed;
  2. Loading reimbursements in production was taking up to a week, which was far too long.

This PR intends to:

  1. Considerably speed up the reimbursement import (from a few days to a few minutes);
  2. Remove the need for historical tracking.

What was done to achieve this purpose?

  1. Removed the tools that helped us keep historical changes (django-simple-history and the field available_in_latest_dataset from the Reimbursement model);
  2. Switched the import to bulk_create (which we had dismissed in order to get django-simple-history working).
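
For reviewers less familiar with the idea behind bulk_create, here is a minimal, framework-free sketch of batched importing. It is not the code in this PR: the names batches, import_in_bulk and save_batch, and the default batch size, are hypothetical and chosen only for illustration. In a Django project, save_batch could be something like Reimbursement.objects.bulk_create, so each batch costs one INSERT instead of one query per row.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `rows`."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_in_bulk(
    rows: Iterable[T],
    save_batch: Callable[[List[T]], object],
    size: int = 4096,  # hypothetical default, tune for your database
) -> int:
    """Hand `rows` to `save_batch` one batch at a time; return the row count.

    Assumption: with Django, `save_batch` would be a manager's `bulk_create`
    and `rows` would be unsaved model instances.
    """
    total = 0
    for batch in batches(rows, size):
        save_batch(batch)
        total += len(batch)
    return total
```

Because rows is consumed lazily, the whole dataset never has to sit in memory at once; only one batch does.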

How to test if it really works?

Important: to allow this branch to be deployed, this PR is based on #314, because that is the branch currently used in production. If you prefer to review code in baby steps, it is strongly recommended to start with #314. Otherwise, review and eventually merge this PR, and #314 will be merged automatically.

In order to rebuild the containers locally, temporarily edit docker-compose.yml:

diff --git a/docker-compose.yml b/docker-compose.yml
index 601545a..7e17037 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -2,7 +2,10 @@ version: '3'
 services:

   django:
-    image: datasciencebr/jarbas-backend
+    # image: datasciencebr/jarbas-backend
+    build:
+      context: .
+      dockerfile: Dockerfile
     environment:
       - ALLOWED_HOSTS=localhost,127.0.0.1
       - AMAZON_S3_BUCKET=serenata-de-amor-data
@@ -16,7 +19,10 @@ services:
       - "./contrib/data:/mnt/data"

   tasks:
-    image: datasciencebr/jarbas-backend
+    # image: datasciencebr/jarbas-backend
+    build:
+      context: .
+      dockerfile: Dockerfile
     environment:
       - CELERY_BROKER_URL=amqp://guest:guest@queue/
     depends_on:
@@ -25,7 +31,10 @@ services:
     command: ["celery worker --app jarbas"]

   elm:
-    image: datasciencebr/jarbas-frontend
+    # image: datasciencebr/jarbas-frontend
+    build:
+      context: .
+      dockerfile: Dockerfile-elm

  1. Rebuild the containers;
  2. Run migrations: python manage.py migrate;
  3. Optionally clean up your reimbursements: open python manage.py shell_plus and run Reimbursement.objects.all().delete();
  4. Import reimbursements;
  5. Check that you have reimbursements in the database (for example, by checking http://localhost:8000).

Don't forget to undo changes in docker-compose.yml ; )

Who can help reviewing it?

@anaschwendler @irio @jtemporal

Irio commented 6 years ago

Loaded all the reimbursements, in my local env, in 20 minutes.