raft-tech / TANF-app

Repo for development of a new TANF Data Reporting System

[bug] OFA unable to export data to csv by record type and fiscal period #3137

Closed ADPennington closed 2 months ago

ADPennington commented 3 months ago

Thank you for taking the time to let us know about the issue you found. The basic rule for bug reporting is that something isn't working the way one would expect it to work. Please provide us with the information requested below and we will look at it as soon as we are able.

Description

OFA typically extracts data for all STTs by record type and fiscal period. For some record types (e.g. T2, T3), this typically includes upwards of 500K records (reference). On 8/8/2024, we attempted to export the latest TANF T2 records for FY2023Q1, which include approximately 500K records, and the process failed.

Action Taken

What I expected to see

a pop-up prompting download of the exported CSV file, such as in the example below:

[screenshot: Screenshot (25)]

What I did see

interface:

[screenshot: Screenshot 2024-08-09 103922]

logs

10:36:18.398: [APP/PROC/WEB.0] [2024-08-09 14:36:18 +0000] [7] [WARNING] Worker with pid 1242 was terminated due to signal 9
10:36:18.405: [APP/PROC/WEB.0] [2024-08-09 14:36:18 +0000] [1419] [INFO] Booting worker with pid: 1419

Signal 9 (SIGKILL) here is consistent with the platform killing the gunicorn worker for exceeding its memory quota while the full queryset was loaded into memory.

Other Helpful Information

andrew-jameson commented 3 months ago

Potential solution(s) coming out of office hours (a rough sketch combining several of these follows the list):

  1. Upon hitting "Go", iterate the queryset with an iterator/paginator instead of loading all rows into memory
  2. Write a flat CSV file to /tmp/, then upload it to an S3 location
  3. Potentially batch the writing of the CSV?
  4. After "Go", redirect to auto-download the file from the S3 link

https://nextlinklabs.com/resources/insights/django-big-data-iteration
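
Below is a minimal sketch of how options 1, 2, and 4 might fit together as a Django admin action. The bucket name (`tdp-export-bucket`), object key, chunk size, and the action itself are illustrative assumptions, not the repo's actual export code:

```python
# Sketch only: stream the queryset, write CSV to /tmp, upload to S3, then
# redirect to a presigned URL so the browser auto-downloads the file.
import csv
import os
import tempfile

import boto3
from django.http import HttpResponseRedirect


def export_csv_action(modeladmin, request, queryset):
    """Admin action: export a large queryset to CSV without exhausting memory."""
    field_names = [f.name for f in queryset.model._meta.fields]

    # (1) Stream rows from the DB in chunks instead of materializing them all.
    with tempfile.NamedTemporaryFile(
        mode="w", newline="", suffix=".csv", dir="/tmp", delete=False
    ) as tmp:
        writer = csv.writer(tmp)
        writer.writerow(field_names)
        for row in queryset.values_list(*field_names).iterator(chunk_size=2000):
            writer.writerow(row)
        tmp_path = tmp.name

    # (2) Upload the finished file to S3. Bucket and key are placeholders.
    s3 = boto3.client("s3")
    key = f"exports/{queryset.model.__name__}.csv"
    s3.upload_file(tmp_path, "tdp-export-bucket", key)
    os.remove(tmp_path)  # /tmp space is limited on the platform

    # (4) Redirect to a short-lived presigned URL to auto-download the file.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "tdp-export-bucket", "Key": key},
        ExpiresIn=600,
    )
    return HttpResponseRedirect(url)
```

Option 3 (batched writes) falls out of `iterator(chunk_size=...)` here: rows are fetched from the database and written a chunk at a time, so peak memory stays proportional to the chunk size rather than to the ~500K-row result set.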

elipe17 commented 3 months ago

Can we implement a class that applies the suggested solutions so it can be reused in other areas of the code base? This large-queryset issue also presented itself while testing #3064 in the qasp environment: python manage.py clean_and_reparse -y 2023 -q Q1. This tries to bring a queryset of ~860k records into memory, which kills the process. See below (a sketch of a reusable helper follows the log):

vcap@9eda436c-8a8f-4409-442c-5963:~$ python manage.py clean_and_reparse -y 2023 -q Q1

You have selected to reparse datafiles for FY 2023 and Q1. The reparsed files will NOT be stored in new indices and the old indices 
These options will delete and reparse (87) datafiles.
Continue [y/n]? y
Previous reparse has exceeded the timeout. Allowing execution of the command.
2024-08-26 20:44:54,995 INFO clean_and_reparse.py::__backup:L47 :  Beginning reparse DB Backup.
Beginning reparse DB Backup.
2024-08-26 20:44:55,000 INFO db_backup.py::get_system_values:L51 :  Using postgres client at: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/
Using postgres client at: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/
2024-08-26 20:44:55,002 INFO db_backup.py::backup_database:L86 :  Executing backup command: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/pg_dump -Fc --no-acl -f /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg -d postgres://u9oc318z26941vlu:p2wtjxap7i30tjpg2gef0hfwv@cg-aws-broker-prodmezsouuuxrb933l.ci7nkegdizyy.us-gov-west-1.rds.amazonaws.com:5432/tdp_db_qasp
Executing backup command: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/pg_dump -Fc --no-acl -f /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg -d postgres://u9oc318z26941vlu:p2wtjxap7i30tjpg2gef0hfwv@cg-aws-broker-prodmezsouuuxrb933l.ci7nkegdizyy.us-gov-west-1.rds.amazonaws.com:5432/tdp_db_qasp
2024-08-26 20:45:39,331 INFO db_backup.py::backup_database:L91 :  Successfully executed backup. Wrote pg dumpfile to /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg
Successfully executed backup. Wrote pg dumpfile to /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg
2024-08-26 20:45:39,344 INFO db_backup.py::backup_database:L101 :  Pg dumpfile size in bytes: 280313953.
Pg dumpfile size in bytes: 280313953.
2024-08-26 20:45:39,344 INFO db_backup.py::upload_file:L173 :  Uploading /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to S3.
Uploading /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to S3.
2024-08-26 20:45:41,768 INFO db_backup.py::upload_file:L186 :  Successfully uploaded /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to s3://cg-178858c2-2794-44ac-a18a-c8f6efe4197a/backup/tmp/reparsing_backup_FY_2023_Q1_rpv6.pg.
Successfully uploaded /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to s3://cg-178858c2-2794-44ac-a18a-c8f6efe4197a/backup/tmp/reparsing_backup_FY_2023_Q1_rpv6.pg.
2024-08-26 20:45:41,771 INFO db_backup.py::main:L326 :  Deleting /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg from local storage.
Deleting /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg from local storage.
2024-08-26 20:45:41,796 INFO backup_db.py::handle:L36 :  Cloud backup/restore job complete.
Cloud backup/restore job complete.
2024-08-26 20:45:41,796 INFO clean_and_reparse.py::__backup:L49 :  Backup complete! Commencing clean and reparse.
Backup complete! Commencing clean and reparse.
2024-08-26 20:45:42,429 INFO clean_and_reparse.py::__delete_associated_models:L156 :  Before summary delete
Before summary delete
2024-08-26 20:45:42,437 INFO clean_and_reparse.py::__delete_associated_models:L158 :  Before delete errors
Before delete errors
2024-08-26 20:45:51,201 INFO clean_and_reparse.py::__delete_associated_models:L160 :  Before delete records
Before delete records
2024-08-26 20:45:51,201 INFO clean_and_reparse.py::__delete_records:L105 :  Deleting model <class 'tdpservice.search_indexes.models.tanf.TANF_T1'>
Deleting model <class 'tdpservice.search_indexes.models.tanf.TANF_T1'>
2024-08-26 20:45:51,440 INFO clean_and_reparse.py::__delete_records:L108 :  total deleted: 863642
total deleted: 863642
2024-08-26 20:45:51,441 INFO clean_and_reparse.py::__delete_records:L111 :  Deleteing from elastic
Deleteing from elastic
Killed
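
A rough sketch of the kind of reusable helper suggested above: iterate any queryset in pk-ordered batches so that neither the CSV export nor clean_and_reparse ever holds ~860k rows in memory at once. The class name, batch size, and delete helper are hypothetical, not existing code:

```python
# Sketch only: a reusable batching iterator for large querysets, plus a
# batched delete built on top of it. Memory use stays flat regardless of
# how many rows the queryset matches.
class ChunkedQuerysetIterator:
    """Yield successive slices of a queryset, keyed on pk to keep memory flat."""

    def __init__(self, queryset, batch_size=5000):
        self.queryset = queryset.order_by("pk")
        self.batch_size = batch_size

    def __iter__(self):
        last_pk = None
        while True:
            qs = self.queryset
            if last_pk is not None:
                # Resume after the last pk seen, rather than using OFFSET,
                # so each batch query stays cheap on large tables.
                qs = qs.filter(pk__gt=last_pk)
            batch = list(qs[: self.batch_size])
            if not batch:
                return
            yield batch
            last_pk = batch[-1].pk


def delete_in_batches(queryset, batch_size=5000):
    """Delete a large queryset batch-by-batch instead of all at once."""
    total = 0
    for batch in ChunkedQuerysetIterator(queryset, batch_size):
        pks = [obj.pk for obj in batch]
        deleted, _ = queryset.model.objects.filter(pk__in=pks).delete()
        total += deleted
    return total
```

Judging by the log, the process was killed while deleting from Elasticsearch, so the same batching would need to apply to that step as well, e.g. issuing the index deletes per batch of pks rather than for the whole record set at once.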