cuducos closed this pull request 6 years ago
Hi @cuducos, here is what I plan to do to test this PR:

> Locally, just run the command to import reimbursements and check the database or the UI to make sure it worked.

And I'll leave this part for you to check:

> (personally I haven't tested it, but once it's merged I can run the command in production and check it)
What I did to test this PR:
Clone the repository:
$ git clone git@github.com:datasciencebr/jarbas.git
Open the repo folder:
$ cd jarbas
Checkout to @cuducos branch:
$ git checkout -b cuducos-lazy-rows origin/cuducos-lazy-rows
Update the branch:
$ git merge master
Copy the .env file:
$ cp contrib/.env.sample .env
Build and start services:
$ docker-compose up -d
Create the database and apply the migrations:
$ docker-compose run --rm django python manage.py migrate
$ docker-compose run --rm django python manage.py searchvector
Seed it with sample data:
$ docker-compose run --rm django python manage.py reimbursements /mnt/data/reimbursements_sample.xz
$ docker-compose run --rm django python manage.py companies /mnt/data/companies_sample.xz
$ docker-compose run --rm django python manage.py suspicions /mnt/data/suspicions_sample.xz
$ docker-compose run --rm django python manage.py tweets
Open localhost:8000/dashboard/ in the browser:
And it seems to be working; it looks OK to me :)
**What is the purpose of this Pull Request?**

When trying to import data on the production server, it looks like `rows` was unable to load and convert all data types for the whole 1.6 million lines of the source CSV. At this point the process just freezes after uncompressing the `.xz` file. This PR tries to fix this issue.

**What was done to achieve this purpose?**

The huge CSV is loaded line by line with the native `csv.DictReader`, and all fields are kept in raw/string format when passed to Celery. The conversion of data types then happens line by line in the async/background process (not for the whole CSV at once, as is `rows`' default behavior).

**How to test if it really works?**

Locally, just run the command to import reimbursements and check the database or the UI to make sure it worked. To get an idea of whether this will work on the production server, one might try to run it in a virtual machine with 4 GB of RAM (personally I haven't tested it, but once it's merged I can run the command in production and check it).
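The per-row approach described above can be sketched roughly as follows. This is a minimal, illustrative example, not the actual Jarbas code: the field names and the `convert` step are hypothetical stand-ins for what the Celery task would do in the background.

```python
import csv
import io
import lzma

def stream_rows(xz_bytes):
    """Yield one dict of raw strings per CSV line, never loading the whole file."""
    with lzma.open(io.BytesIO(xz_bytes), mode="rt", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)

def convert(row):
    """Per-row type conversion, standing in for the async task's job."""
    return {**row, "total_net_value": float(row["total_net_value"])}

# Tiny compressed sample standing in for the real reimbursements .xz file
sample = lzma.compress(b"document_id,total_net_value\n42,13.5\n43,99.0\n")

# Each row is converted individually, so memory stays flat regardless of CSV size
converted = [convert(row) for row in stream_rows(sample)]
print(converted[0]["total_net_value"])  # prints 13.5
```

The point of keeping fields as raw strings until the background process handles them is that the importer never needs to hold, or type-convert, 1.6 million rows at once.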
**Who can help reviewing it?**

@anaschwendler @irio