Closed cuducos closed 7 years ago
Testing with sample data seems Ok. Ready to Merge.
In PVT @decko told me that actually it wasn't working with the Celery (worker) running in a container. So we still need to fix that before considering merging it. Is that right?
Ok… I'm back to this PR.
I merged the novelties from the `master` branch and studied this PR a bit more to advance it further.
The `master` branch imported the sample in 34s (counting the time to start the Docker containers) and this branch did it in 24s (idem). We were saving 30% of the time, which is great.
Then I moved the serialize part to the async task (42e512a). The importing command took 15s — we're saving more than 50% of time now, yay!
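As a quick sanity check on those percentages (plain arithmetic on the wall-clock numbers quoted above; nothing here is project code):

```python
# Savings relative to the 34s master-branch baseline quoted above
baseline = 34          # master branch, seconds
celery_branch = 24     # this branch, before moving serialize
async_serialize = 15   # after moving serialize to the async task

print(round((1 - celery_branch / baseline) * 100))    # 29 (~30% saved)
print(round((1 - async_serialize / baseline) * 100))  # 56 (more than 50% saved)
```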
From that I think the Celery part is indeed working and this PR is good for code review and, if that's the case, merging… cc @anaschwendler
Many thanks @decko for the tests and the ideas ; )
UPDATE
How to test this PR:
1. Run `python manage.py reimbursements /mnt/data/reimbursements_sample.xz` in the `master` branch, and time it (basically you can run `time python manage.py reimbursements /mnt/data/reimbursements_sample.xz`).
2. Checkout the `cuducos-celery` branch, delete all reimbursements (`python manage.py shell_plus` and then `Reimbursement.objects.all().delete()` might do the job), and time the import again.

If at the end of the process a) the import was quicker in this branch when compared to `master`, and b) reimbursements were actually saved in the database (`python manage.py shell_plus` and then `Reimbursement.objects.count()` might do the job), then it's working as predicted ; )
Hello @cuducos and @decko!
I'll start testing this PR little by little and documenting how (this comment will be edited along the way):
Clone the repository:
$ git clone git@github.com:datasciencebr/jarbas.git
Open the repo folder:
$ cd jarbas
I'll try to run it with Docker; even though the steps were shown without it, I'll try :)
Copy the `.env` file:
$ cp contrib/.env.sample .env
Build and start services:
$ docker-compose up -d
Create the database and apply the migrations:
$ docker-compose run --rm django python manage.py migrate
I'll start testing as @cuducos suggested in the last comment: run `python manage.py reimbursements /mnt/data/reimbursements_sample.xz` in the `master` branch, and time it:
$ time docker-compose run --rm django python manage.py reimbursements /mnt/data/reimbursements_sample.xz
And the result:
Starting jarbas_postgres_1 ...
Starting jarbas_postgres_1 ... done
Starting with 0 reimbursements
Processed: 1000 lines Updated: 0 reimbursements Created: 1000 reimbursements Skip: 0 reimbursements
docker-compose run --rm django python manage.py reimbursements 1.13s user 0.34s system 11% cpu 13.221 total
Ok, now let's test with @cuducos PR:
Checkout to @cuducos branch:
$ git checkout -b cuducos-celery origin/cuducos-celery
Update the branch:
$ git merge master
Delete all reimbursements:
$ docker-compose run --rm django python manage.py shell_plus
> Reimbursement.objects.all().delete()
The result:
>>> Reimbursement.objects.all().delete()
(1000, {'core.Tweet': 0, 'core.Reimbursement': 1000})
$ time docker-compose run --rm django python manage.py reimbursements /mnt/data/reimbursements_sample.xz
The result:
Starting jarbas_queue_1 ...
Starting jarbas_queue_1 ... done
Starting jarbas_postgres_1 ... done
Starting jarbas_tasks_1 ... done
1000 reimbursements scheduled to creation/update
docker-compose run --rm django python manage.py reimbursements 1.11s user 0.26s system 13% cpu 9.829 total
It gave me a difference of about 2 seconds, testing 3 times and taking the average time 🎉
Is it ok @cuducos ? Can I merge?
Nice! Good work, people, congrats!
This is a work in progress, I just created it earlier to have feedbacks. The aim, when complete, is to address part of #217.
What is the purpose of this Pull Request? The purpose of this PR is to make the process of importing reimbursements quicker and without manual steps, as described in #217.
What was done to achieve this purpose? I introduced Celery to perform some tasks asynchronously; that way the heavy work of the import is handled in the background.
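Celery specifics aside, the shape of the change can be sketched with nothing but the standard library: the management command only enqueues raw rows, while a background worker serializes and saves them. All names below (`serialize`, `save`, the field names) are hypothetical placeholders for the project's real serializer and ORM call, not the actual implementation:

```python
import queue
import threading

def serialize(raw):
    # placeholder for the real serialization step
    return {"document_id": raw["id"], "total": float(raw["total"])}

saved = []

def save(data):
    # placeholder for something like Reimbursement.objects.update_or_create(...)
    saved.append(data)

def worker(q):
    while True:
        raw = q.get()
        if raw is None:  # sentinel: no more work
            return
        save(serialize(raw))  # the heavy work happens here, off the command's path

q = queue.Queue()
t = threading.Thread(target=worker, args=(q,))
t.start()

# the management command's loop just enqueues rows and returns quickly
for row in ({"id": "1", "total": "9.99"}, {"id": "2", "total": "5.00"}):
    q.put(row)
q.put(None)
t.join()
print(len(saved))  # 2
```

With Celery the queue and worker are replaced by a broker (RabbitMQ here) and a worker process, so the command finishes as soon as the rows are scheduled.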
How to test if it really works?
1. Install RabbitMQ: `brew install rabbitmq` or `apt-get install rabbitmq-server` should do the job for macOS and any Debian-based Linux such as Ubuntu (the Docker environment is ready, no need to worry about it)
2. `$ python -m pip install -U -r requirements.txt`
3. `$ python manage.py test`
4. `$ celery worker --app jarbas`
5. `$ python manage.py reimbursements <path to reimbursements.xz>`
Who can help reviewing it? @jtemporal @lipemorais @davinirjr @augustogoulart @adrianomargarin
What is out of scope? In #217 I mentioned creating an API endpoint so Rosie can send `POST`s with reimbursement data. I think this is an issue for a new PR once this one is finished, reviewed, stable and merged.
Next steps in this PR

- … `.xz` line (either I screwed up when wiring up Celery, or the serialize step is another bottleneck; if so, a callback in Celery should be able to handle serializing and then the create/update step)

Next steps

- … `docker-compose.yml` after this PR is merged
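The callback idea mentioned under next steps can be sketched generically: a Celery chain such as `serialize.s(row) | create_or_update.s()` pipes one task's result into the next, which in plain Python is just function composition. All names below are hypothetical placeholders, not the project's real tasks:

```python
db = {}

def serialize(row):
    # stand-in for parsing/validating one line of the .xz file
    return {"document_id": row["id"], "total": float(row["total"])}

def create_or_update(data):
    # stand-in for the ORM's update_or_create call
    db[data["document_id"]] = data
    return data["document_id"]

def chain(*steps):
    """Run each step on the previous step's result, like a Celery chain."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

pipeline = chain(serialize, create_or_update)
print(pipeline({"id": "42", "total": "1.50"}))  # 42
```

Splitting the work this way would let the worker handle both the serialization bottleneck and the database write without blocking the importing command.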