Closed: ADPennington closed this issue 2 months ago.
Potential solution(s) coming out of office hours:
https://nextlinklabs.com/resources/insights/django-big-data-iteration
Can we implement a class that applies the suggested solutions so it can be reused in other areas of the code base? This large queryset issue presented itself while testing #3064 in the qasp environment with python manage.py clean_and_reparse -y 2023 -q Q1, which tries to bring a queryset of ~860k records into memory and kills the process. See the log below and the iteration sketch that follows it:
vcap@9eda436c-8a8f-4409-442c-5963:~$ python manage.py clean_and_reparse -y 2023 -q Q1
You have selected to reparse datafiles for FY 2023 and Q1. The reparsed files will NOT be stored in new indices and the old indices
These options will delete and reparse (87) datafiles.
Continue [y/n]? y
Previous reparse has exceeded the timeout. Allowing execution of the command.
2024-08-26 20:44:54,995 INFO clean_and_reparse.py::__backup:L47 : Beginning reparse DB Backup.
Beginning reparse DB Backup.
2024-08-26 20:44:55,000 INFO db_backup.py::get_system_values:L51 : Using postgres client at: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/
Using postgres client at: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/
2024-08-26 20:44:55,002 INFO db_backup.py::backup_database:L86 : Executing backup command: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/pg_dump -Fc --no-acl -f /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg -d postgres://u9oc318z26941vlu:p2wtjxap7i30tjpg2gef0hfwv@cg-aws-broker-prodmezsouuuxrb933l.ci7nkegdizyy.us-gov-west-1.rds.amazonaws.com:5432/tdp_db_qasp
Executing backup command: /home/vcap/deps/0/apt/usr/lib/postgresql/15/bin/pg_dump -Fc --no-acl -f /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg -d postgres://u9oc318z26941vlu:p2wtjxap7i30tjpg2gef0hfwv@cg-aws-broker-prodmezsouuuxrb933l.ci7nkegdizyy.us-gov-west-1.rds.amazonaws.com:5432/tdp_db_qasp
2024-08-26 20:45:39,331 INFO db_backup.py::backup_database:L91 : Successfully executed backup. Wrote pg dumpfile to /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg
Successfully executed backup. Wrote pg dumpfile to /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg
2024-08-26 20:45:39,344 INFO db_backup.py::backup_database:L101 : Pg dumpfile size in bytes: 280313953.
Pg dumpfile size in bytes: 280313953.
2024-08-26 20:45:39,344 INFO db_backup.py::upload_file:L173 : Uploading /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to S3.
Uploading /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to S3.
2024-08-26 20:45:41,768 INFO db_backup.py::upload_file:L186 : Successfully uploaded /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to s3://cg-178858c2-2794-44ac-a18a-c8f6efe4197a/backup/tmp/reparsing_backup_FY_2023_Q1_rpv6.pg.
Successfully uploaded /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg to s3://cg-178858c2-2794-44ac-a18a-c8f6efe4197a/backup/tmp/reparsing_backup_FY_2023_Q1_rpv6.pg.
2024-08-26 20:45:41,771 INFO db_backup.py::main:L326 : Deleting /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg from local storage.
Deleting /tmp/reparsing_backup_FY_2023_Q1_rpv6.pg from local storage.
2024-08-26 20:45:41,796 INFO backup_db.py::handle:L36 : Cloud backup/restore job complete.
Cloud backup/restore job complete.
2024-08-26 20:45:41,796 INFO clean_and_reparse.py::__backup:L49 : Backup complete! Commencing clean and reparse.
Backup complete! Commencing clean and reparse.
2024-08-26 20:45:42,429 INFO clean_and_reparse.py::__delete_associated_models:L156 : Before summary delete
Before summary delete
2024-08-26 20:45:42,437 INFO clean_and_reparse.py::__delete_associated_models:L158 : Before delete errors
Before delete errors
2024-08-26 20:45:51,201 INFO clean_and_reparse.py::__delete_associated_models:L160 : Before delete records
Before delete records
2024-08-26 20:45:51,201 INFO clean_and_reparse.py::__delete_records:L105 : Deleting model <class 'tdpservice.search_indexes.models.tanf.TANF_T1'>
Deleting model <class 'tdpservice.search_indexes.models.tanf.TANF_T1'>
2024-08-26 20:45:51,440 INFO clean_and_reparse.py::__delete_records:L108 : total deleted: 863642
total deleted: 863642
2024-08-26 20:45:51,441 INFO clean_and_reparse.py::__delete_records:L111 : Deleteing from elastic
Deleteing from elastic
Killed
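A minimal sketch of the kind of reusable helper the office-hours article points toward, assuming a standard Django queryset. The class name, method names, and the 5,000-row default are illustrative and not from the TDP codebase; the point is that the queryset is walked in bounded batches instead of being materialized all at once.

```python
from django.db.models import QuerySet


class LargeQuerysetHelper:
    """Illustrative sketch of a reusable large-queryset helper (not TDP code).

    Follows the batching approach from the linked article: never hold the
    full result set in memory; work in bounded batches instead.
    """

    def __init__(self, queryset: QuerySet, batch_size: int = 5000):
        self.queryset = queryset
        self.batch_size = batch_size

    def iterate(self):
        """Stream rows with a server-side cursor instead of caching them all."""
        return self.queryset.iterator(chunk_size=self.batch_size)

    def delete(self) -> int:
        """Delete in primary-key batches so memory use stays bounded."""
        model = self.queryset.model
        total_deleted = 0
        while True:
            # Fetch only the next batch of primary keys, not full model instances.
            pks = list(self.queryset.values_list("pk", flat=True)[: self.batch_size])
            if not pks:
                return total_deleted
            deleted, _ = model.objects.filter(pk__in=pks).delete()
            total_deleted += deleted
```

Hypothetically, a command like clean_and_reparse could call LargeQuerysetHelper(TANF_T1.objects.filter(...)).delete(), and use iterate() wherever it currently evaluates the full queryset (such as the Elasticsearch cleanup step), so memory stays roughly constant regardless of record count.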
Description
OFA typically extracts data for all STTs by record type and fiscal period. For some record types (e.g. T2, T3), this typically means upwards of 500K records (reference). On 8/8/2024, we attempted to export the latest TANF T2 records for FY2023 Q1, roughly 500K records, and the process failed. A streaming-export sketch is included under Other Helpful Information below.
Action Taken
Apply filters.
Click Go.
What I expected to see
A download prompt for the exported CSV file, such as in the example screenshot attached to the original issue.
What I did see
Interface: [screenshot not included]
Logs: [log output not included]
Other Helpful Information
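One possible mitigation, sketched here as an assumption rather than the project's actual export code: stream the CSV with Django's StreamingHttpResponse and QuerySet.iterator() so rows are written out incrementally instead of materializing ~500K records in memory. The function name, field names, and chunk size are illustrative.

```python
import csv

from django.http import StreamingHttpResponse


class Echo:
    """Pseudo-buffer: csv.writer calls write(); we simply return the formatted row."""

    def write(self, value):
        return value


def stream_queryset_csv(queryset, field_names, filename):
    """Stream a large queryset as a CSV download without building the file in memory."""
    writer = csv.writer(Echo())

    def rows():
        yield writer.writerow(field_names)  # header row
        # values_list + iterator() keeps memory bounded even for ~500K records
        for row in queryset.values_list(*field_names).iterator(chunk_size=2000):
            yield writer.writerow(row)

    response = StreamingHttpResponse(rows(), content_type="text/csv")
    response["Content-Disposition"] = f'attachment; filename="{filename}"'
    return response
```

A hypothetical admin action or view could return stream_queryset_csv(queryset, field_names, "tanf_t2_fy2023_q1.csv") for the filtered T2 queryset; the download prompt then appears immediately and rows stream out as they are fetched, instead of the process being killed while the full export is assembled.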