raft-tech / TANF-app

Repo for development of a new TANF Data Reporting System

3064 - Reparse Meta Model #3126

Closed: elipe17 closed this 3 months ago

elipe17 commented 4 months ago

Summary of Changes

How to Test

cd tdrs-frontend && docker-compose up
cd tdrs-backend && docker-compose up --build
  1. Open http://localhost:3000/ and sign in.
  2. Submit some files.
  3. Execute the reparse command and try to break it.

Deliverables

More details on how the deliverables herein are assessed are included here.

Deliverable 1: Accepted Features

Checklist of ACs:

Deliverable 2: Tested Code

Deliverable 3: Properly Styled Code

Deliverable 4: Accessible

Deliverable 5: Deployed

Deliverable 6: Documented

Deliverable 7: Secure

Deliverable 8: User Research

Research product(s) clearly articulate(s):

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 60.50955% with 62 lines in your changes missing coverage. Please review.

Project coverage is 91.07%. Comparing base (1166030) to head (2226a3e). Report is 1 commit behind head on develop.

| Files | Patch % | Lines |
|---|---|---|
| `...d/tdpservice/search_indexes/models/reparse_meta.py` | 49.35% | 37 missing and 2 partials ⚠️ |
| `...drs-backend/tdpservice/data_files/admin/filters.py` | 51.85% | 13 missing ⚠️ |
| `tdrs-backend/tdpservice/search_indexes/util.py` | 37.50% | 5 missing ⚠️ |
| `tdrs-backend/tdpservice/parsers/parse.py` | 83.33% | 2 missing ⚠️ |
| `tdrs-backend/tdpservice/scheduling/parser_task.py` | 33.33% | 2 missing ⚠️ |
| `tdrs-backend/tdpservice/data_files/admin/admin.py` | 90.00% | 1 missing ⚠️ |
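The percentages in this report can be reproduced from the raw line counts. A small sketch, assuming coverage is hits / (hits + misses + partials) truncated (not rounded) to the displayed precision; that is our reading of the numbers, which it reproduces. The patch's 95 hit lines are inferred from 60.50955% and the 62 uncovered lines, not reported directly.

```python
from decimal import Decimal, ROUND_DOWN


def coverage_pct(hits: int, misses: int, partials: int, places: int = 2) -> Decimal:
    """Percentage of tracked lines that are fully hit, truncated to `places` decimals.

    Partially covered lines count against coverage: hits / (hits + misses + partials).
    """
    total = hits + misses + partials
    pct = Decimal(hits) * 100 / Decimal(total)
    # Truncate toward zero (assumed to match how the report displays values).
    return pct.quantize(Decimal(1).scaleb(-places), rounding=ROUND_DOWN)


# Project totals from the coverage diff: develop (base) vs. #3126 (head).
base = coverage_pct(hits=6987, misses=532, partials=104)   # 7623 lines
head = coverage_pct(hits=7073, misses=587, partials=106)   # 7766 lines

# Patch: 60 missing + 2 partial lines from the table above (62 uncovered).
# The 95 hit lines are an inference: 95 / (95 + 62) is about 60.50955%.
patch = coverage_pct(hits=95, misses=60, partials=2, places=5)

print(base, head, patch)  # 91.65 91.07 60.50955
```

Summing the table's per-file counts (37 + 13 + 5 + 2 + 2 + 1 missing, plus 2 partials) gives the 62 uncovered lines quoted in the summary.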
Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop    #3126      +/-   ##
===========================================
- Coverage    91.65%   91.07%   -0.59%
===========================================
  Files          278      284       +6
  Lines         7623     7766     +143
  Branches       697      711      +14
===========================================
+ Hits          6987     7073      +86
- Misses         532      587      +55
- Partials       104      106       +2
```

| Flag | Coverage Δ | |
|---|---|---|
| dev-backend | `90.84% <60.50%> (-0.67%)` | ⬇️ |
| dev-frontend | `92.60% <ø> (ø)` | |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ | |
|---|---|---|
| `...ata_files/migrations/0013_datafile_reparse_meta.py` | `100.00% <100.00%> (ø)` | |
| `tdrs-backend/tdpservice/data_files/models.py` | `79.72% <100.00%> (+0.13%)` | ⬆️ |
| `...ackend/tdpservice/search_indexes/admin/__init__.py` | `100.00% <100.00%> (ø)` | |
| `...nd/tdpservice/search_indexes/admin/reparse_meta.py` | `100.00% <100.00%> (ø)` | |
| `...arch_indexes/migrations/0030_reparse_meta_model.py` | `100.00% <100.00%> (ø)` | |
| `...ckend/tdpservice/search_indexes/models/__init__.py` | `100.00% <100.00%> (ø)` | |
| `tdrs-backend/tdpservice/settings/common.py` | `99.31% <100.00%> (+<0.01%)` | ⬆️ |
| `tdrs-backend/tdpservice/data_files/admin/admin.py` | `91.42% <90.00%> (ø)` | |
| `tdrs-backend/tdpservice/parsers/parse.py` | `83.66% <83.33%> (-0.02%)` | ⬇️ |
| `tdrs-backend/tdpservice/scheduling/parser_task.py` | `39.02% <33.33%> (-0.45%)` | ⬇️ |
| ... and 3 more | | |

Legend: `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`. Last update 70cd922...2226a3e.
jtimpe commented 3 months ago

I get a duplicate key error and lose all record data when trying to replicate the "double button click" (sequential execution) for which you removed the handling. Just want to highlight the risk there.

[2024-08-09 14:22:13,973: ERROR/ForkPoolWorker-12] Encountered Database exception in parser_task.py: 
web-1           | duplicate key value violates unique constraint "parsers_datafilesummary_datafile_id_880a2f4d_uniq"
web-1           | DETAIL:  Key (datafile_id)=(4) already exists.
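The constraint named in this log implies at most one summary row per datafile, so a second back-to-back parse that blindly inserts a new summary collides with the first. A minimal sketch of that failure and one common remedy, an upsert, using stdlib sqlite3 in place of Postgres; the table is a simplified stand-in borrowed from the log, not the project's schema, and the status values are illustrative. In Django, `QuerySet.update_or_create(...)` is the ORM-level analogue of this upsert.

```python
import sqlite3

# Simplified stand-in for the summary table named in the error above:
# at most one summary row per datafile, enforced by a UNIQUE constraint.
SCHEMA = """
CREATE TABLE parsers_datafilesummary (
    id INTEGER PRIMARY KEY,
    datafile_id INTEGER NOT NULL UNIQUE,
    status TEXT NOT NULL
)
"""


def create_summary(conn, datafile_id, status):
    """Plain INSERT: fails if this datafile already has a summary row."""
    conn.execute(
        "INSERT INTO parsers_datafilesummary (datafile_id, status) VALUES (?, ?)",
        (datafile_id, status),
    )


def upsert_summary(conn, datafile_id, status):
    """INSERT, or UPDATE the existing row for this datafile (SQLite >= 3.24 upsert)."""
    conn.execute(
        "INSERT INTO parsers_datafilesummary (datafile_id, status) VALUES (?, ?) "
        "ON CONFLICT(datafile_id) DO UPDATE SET status = excluded.status",
        (datafile_id, status),
    )


def demo_duplicate():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    create_summary(conn, 4, "Pending")        # first parse of datafile 4
    error = None
    try:
        create_summary(conn, 4, "Rejected")   # second, back-to-back parse
    except sqlite3.IntegrityError as exc:
        error = str(exc)
    upsert_summary(conn, 4, "Rejected")       # the same second parse, as an upsert
    rows = conn.execute(
        "SELECT datafile_id, status FROM parsers_datafilesummary"
    ).fetchall()
    conn.close()
    return error, rows


print(demo_duplicate())
```

This only addresses the duplicate-key symptom; it does not by itself make two overlapping runs safe.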
raftmsohani commented 3 months ago

@elipe17 I am still not clear on why we need a many-to-many relationship, and I would like to avoid one if possible. There are many things that can go wrong with them, and they can leave junk data in the DB. Maybe if you could elaborate more on why a many-to-many is needed, I can be convinced!
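For context on the trade-off: one datafile can be reparsed in several runs, and one run touches many datafiles, which is the shape a many-to-many relationship models. The sketch below uses stdlib sqlite3 with hypothetical table names (not the project's schema) to show a junction table whose foreign keys cascade on delete, so removing a run removes its link rows instead of leaving orphans. Django's auto-created through table for a `ManyToManyField` likewise deletes link rows when either side is deleted through the ORM.

```python
import sqlite3

# Hypothetical, simplified tables; not the project's actual schema.
SCHEMA = """
CREATE TABLE datafile (id INTEGER PRIMARY KEY);
CREATE TABLE reparse_meta (id INTEGER PRIMARY KEY);
-- Junction table: one row per (datafile, reparse run) pair.
CREATE TABLE datafile_reparse_meta (
    datafile_id INTEGER NOT NULL REFERENCES datafile(id) ON DELETE CASCADE,
    reparse_meta_id INTEGER NOT NULL REFERENCES reparse_meta(id) ON DELETE CASCADE,
    PRIMARY KEY (datafile_id, reparse_meta_id)
);
"""


def demo_m2m():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO datafile (id) VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO reparse_meta (id) VALUES (?)", [(10,), (11,)])
    # File 1 is reparsed in two runs; file 2 only in run 10.
    conn.executemany(
        "INSERT INTO datafile_reparse_meta VALUES (?, ?)",
        [(1, 10), (2, 10), (1, 11)],
    )
    files_in_run_10 = [row[0] for row in conn.execute(
        "SELECT datafile_id FROM datafile_reparse_meta "
        "WHERE reparse_meta_id = 10 ORDER BY datafile_id")]
    runs_for_file_1 = [row[0] for row in conn.execute(
        "SELECT reparse_meta_id FROM datafile_reparse_meta "
        "WHERE datafile_id = 1 ORDER BY reparse_meta_id")]
    # Deleting a run cascades to its link rows instead of leaving orphans behind.
    conn.execute("DELETE FROM reparse_meta WHERE id = 10")
    remaining = conn.execute(
        "SELECT datafile_id, reparse_meta_id FROM datafile_reparse_meta").fetchall()
    conn.close()
    return files_in_run_10, runs_for_file_1, remaining


print(demo_m2m())  # ([1, 2], [10, 11], [(1, 11)])
```

A single foreign key from datafile to run could only record the most recent run per file, which is the history the junction table preserves.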

andrew-jameson commented 3 months ago

Both locally and on a11y, I am not getting a finished reparse run. I initially just used -a to gather the dozen or so files I had uploaded, then tried breaking it up by year, with the same results. I waited for the logs to indicate no more parsing was happening before initiating the next run. I used task clean prior to building, so it's completely fresh. a11y also has a fresh DB due to some issues. Will retry against the raft env.

2024-08-15 11:55:21 [2024-08-15 15:55:21,346: INFO/ForkPoolWorker-52] DataFile parsing started for file ADS.E2J.NDM1.TS01
2024-08-15 11:55:21 2024-08-15 15:55:21,426 DEBUG fields.py::parse_value:L47 :  Field: 'tribe_code' at position: [14, 17) is empty.
2024-08-15 11:55:21 2024-08-15 15:55:21,426 DEBUG fields.py::parse_value:L47 :  Field: 'tribe_code' at position: [14, 17) is empty.
2024-08-15 11:55:21 2024-08-15 15:55:21,426 DEBUG parse.py::parse_datafile:L46 :  Datafile has encrypted fields: True.
2024-08-15 11:55:21 2024-08-15 15:55:21,426 DEBUG parse.py::parse_datafile:L47 :  Datafile: {id: 21, filename: ADS.E2J.FTP1.TS06, STT: Alabama (01), S3 location: data_files/2023/Q1/1/Active Case Data/ADS.E2J.FTP1.TS06}, is Tribal: False.
2024-08-15 11:55:21 2024-08-15 15:55:21,426 DEBUG parse.py::parse_datafile:L51 :  Program type: TAN, Section: A.
2024-08-15 11:55:21 2024-08-15 15:55:21,427 INFO parse.py::parse_datafile:L95 :  Preparser Error -> Rpt Month Year is not valid: Submitted reporting year:2020, quarter:Q4 doesn't match file reporting year:2023, quarter:Q1.
2024-08-15 11:55:21 2024-08-15 15:55:21,427 DEBUG parse.py::bulk_create_errors:L155 :  Bulk creating ParserErrors.
2024-08-15 11:55:21 2024-08-15 15:55:21,429 INFO parse.py::bulk_create_errors:L158 :  Created 1/1 ParserErrors.
2024-08-15 11:55:21 2024-08-15 15:55:21,439 INFO parser_task.py::parse:L41 :  Parsing finished for file -> {id: 21, filename: ADS.E2J.FTP1.TS06, STT: Alabama (01), S3 location: data_files/2023/Q1/1/Active Case Data/ADS.E2J.FTP1.TS06} with status Rejected and 1 errors.
2024-08-15 11:55:21 [2024-08-15 15:55:21,439: INFO/ForkPoolWorker-52] Parsing finished for file -> {id: 21, filename: ADS.E2J.FTP1.TS06, STT: Alabama (01), S3 location: data_files/2023/Q1/1/Active Case Data/ADS.E2J.FTP1.TS06} with status Rejected and 1 errors.
2024-08-15 11:55:21 [2024-08-15 15:55:21,441: INFO/ForkPoolWorker-52] Task tdpservice.scheduling.parser_task.parse[22642997-1f01-4f61-bc74-0cef780d0247] succeeded in 0.10376212500000292s: None
elipe17 commented 3 months ago

@andrew-jameson the code that handles tracking failed files (think: an S3 exception we don't catch) or files that exit parsing early due to cat1 errors is in the follow-on PR, since it is required for sequential execution and not for general metadata tracking.
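One way to read why the run above never finishes: if a run only counts files that parse successfully, a file rejected early (like the preparser error in the log) or one that crashes is never counted, so the run never reaches its total. A sketch assuming completion is judged by counting both outcomes; the class and field names are hypothetical and this is not the follow-on PR's code.

```python
from dataclasses import dataclass


@dataclass
class ReparseProgress:
    """Hypothetical per-run counters; not the actual ReparseMeta fields."""
    num_files: int
    files_completed: int = 0
    files_failed: int = 0

    def record_success(self) -> None:
        self.files_completed += 1

    def record_failure(self) -> None:
        # e.g. an uncaught S3 exception, or an early exit on cat1 (preparser) errors
        self.files_failed += 1

    @property
    def is_finished(self) -> bool:
        # Every file must be accounted for, whether it succeeded or failed.
        return self.files_completed + self.files_failed >= self.num_files

    @property
    def is_success(self) -> bool:
        return self.is_finished and self.files_failed == 0


run = ReparseProgress(num_files=3)
run.record_success()
run.record_success()
print(run.is_finished)   # the third file has not been accounted for yet
run.record_failure()     # e.g. the third file is rejected by the preparser
print(run.is_finished, run.is_success)
```

Without the `record_failure` path, the third file would leave the run permanently "in progress", matching the symptom described.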

andrew-jameson commented 3 months ago

Usability change for sysadmins and developers: the Data Files page can be filtered by a ReparseMeta model object.

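For my own notes (in the Django shell, python manage.py shell):

from tdpservice.data_files.models import DataFile
from tdpservice.search_indexes.models.reparse_meta import ReparseMeta

meta7 = ReparseMeta.objects.get(id=7)
datafiles = DataFile.objects.filter(reparse_meta_models=meta7)
# equivalent is ~ meta7.reparse_meta_models.all()
for d in datafiles:
    print(f"{d.stt}:{d.fiscal_year}")

## files associated with meta7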
Arkansas (05):2024 - Q1 (Oct - Dec)
Arkansas (05):2024 - Q1 (Oct - Dec)
Alabama (01):2024 - Q1 (Oct - Dec)
Chippewa-Cree Indians of the Rocky Boy's Reservation (043):2024 - Q1 (Oct - Dec)
Chippewa-Cree Indians of the Rocky Boy's Reservation (043):2024 - Q1 (Oct - Dec)
Chippewa-Cree Indians of the Rocky Boy's Reservation (043):2024 - Q1 (Oct - Dec)
Chippewa-Cree Indians of the Rocky Boy's Reservation (043):2024 - Q1 (Oct - Dec)
Florida (12):2024 - Q1 (Oct - Dec)
Florida (12):2024 - Q1 (Oct - Dec)
Alabama (01):2024 - Q1 (Oct - Dec)
Arkansas (05):2024 - Q1 (Oct - Dec)
Alabama (01):2024 - Q1 (Oct - Dec)
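The listing above has several STTs repeated, so a per-STT tally can help cross-check a run against the file count the command announces before it starts, or spot STTs that contributed several files (for example multiple sections or versions). A minimal stdlib sketch that parses the printed lines exactly as shown; it assumes STT names contain no colon.

```python
from collections import Counter

# The 12 output lines from the notes above (one per DataFile linked to ReparseMeta id 7).
OUTPUT = """\
Arkansas (05):2024 - Q1 (Oct - Dec)
Arkansas (05):2024 - Q1 (Oct - Dec)
Alabama (01):2024 - Q1 (Oct - Dec)
Chippewa-Cree Indians of the Rocky Boy's Reservation (043):2024 - Q1 (Oct - Dec)
Chippewa-Cree Indians of the Rocky Boy's Reservation (043):2024 - Q1 (Oct - Dec)
Chippewa-Cree Indians of the Rocky Boy's Reservation (043):2024 - Q1 (Oct - Dec)
Chippewa-Cree Indians of the Rocky Boy's Reservation (043):2024 - Q1 (Oct - Dec)
Florida (12):2024 - Q1 (Oct - Dec)
Florida (12):2024 - Q1 (Oct - Dec)
Alabama (01):2024 - Q1 (Oct - Dec)
Arkansas (05):2024 - Q1 (Oct - Dec)
Alabama (01):2024 - Q1 (Oct - Dec)
"""


def files_per_stt(text: str) -> Counter:
    """Count 'STT:period' lines per STT (split on the first colon)."""
    counts = Counter()
    for line in text.splitlines():
        if line.strip():
            stt, _period = line.split(":", 1)
            counts[stt] += 1
    return counts


for stt, n in files_per_stt(OUTPUT).most_common():
    print(f"{n:>2}  {stt}")
```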
ADPennington commented 3 months ago

Per standup today, work for #3064 and #3065 is reflected in this PR. I started testing this morning.

ADPennington commented 3 months ago

@elipe17 @andrew-jameson @jtimpe @raftmsohani I'm currently blocked on testing this PR in the qasp environment. I attempted to reparse FY2023 Q1 this morning, and the operation was killed after the backup completed. Evidence below ⬇️

2024-08-24 13:54:44,539 INFO clean_and_reparse.py::__backup:L49 :  Backup complete! Commencing clean and reparse.
Backup complete! Commencing clean and reparse.
Killed


I then tried another quarter, FY2023 Q2, and couldn't proceed:

vcap@fc474368-25ed-4bfd-51b7-c201:~$ python manage.py clean_and_reparse -y 2023 -q Q2

You have selected to reparse datafiles for FY 2023 and Q2. The reparsed files will NOT be stored in new indices and the old indices
These options will delete and reparse (20) datafiles.
Continue [y/n]? y
The latest ReparseMeta model's (ID: 2) timeout_at field is None. Cannot safely execute reparse, please fix manually.

Worth noting that FY23 Q1 has a couple of large files that should generate a lot of errors, so I'd like to see how this operation performs before running it in prod.
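The refusal message above suggests the command inspects the latest ReparseMeta record before starting. A sketch of such a guard, assuming a run is either marked finished or carries a `timeout_at` deadline; the dataclass, its fields other than `timeout_at`, and the decision order are hypothetical, not the command's actual logic. In this model, a manual fix would amount to marking the stuck record finished or giving it a deadline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass
class LatestReparse:
    """Hypothetical snapshot of the most recent reparse metadata record."""
    id: int
    finished: bool
    timeout_at: Optional[datetime]


def can_start_reparse(latest: Optional[LatestReparse], now: datetime) -> Tuple[bool, str]:
    """Decide whether a new clean-and-reparse run may start."""
    if latest is None or latest.finished:
        return True, "No reparse in progress."
    if latest.timeout_at is None:
        # Without a deadline there is no way to tell a slow run from a dead one.
        return False, (
            f"The latest ReparseMeta model's (ID: {latest.id}) timeout_at field is None. "
            "Cannot safely execute reparse, please fix manually."
        )
    if now >= latest.timeout_at:
        return True, "Previous reparse timed out."
    return False, "A reparse is still running; wait for it to finish or time out."


now = datetime(2024, 8, 24, 14, 0)
print(can_start_reparse(LatestReparse(id=2, finished=False, timeout_at=None), now)[1])
print(can_start_reparse(LatestReparse(id=3, finished=False, timeout_at=now + timedelta(hours=1)), now)[0])
```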

elipe17 commented 3 months ago

@ADPennington I updated the meta model in qasp so that you can continue testing. The Killed console output indicates to me that the process was killed for some reason, but I can't go far enough back in the logs to see exactly what happened.

ADPennington commented 3 months ago

@elipe17 latest test notes/questions below ⬇️ I didn't observe anything that needs to be addressed in this ticket; this is mostly for my SA.

elipe17 commented 3 months ago

Quoting @ADPennington's notes:

  • Is there a way to know which source file id(s) are associated with the difference between the deleted/created counts? It looks like the record count is different after reparsing, which will sometimes be the case when validation is updated, but I imagine we'd also want to be able to investigate files to check whether something went wrong.
  • What's the difference between "total # of records initial" and "# of records created"? Is one capturing the number of records in the files vs. the number of records in the DB after reparsing?
  • Are we replacing the records in the DB or adding new records? After reparsing FY23 Q3, I see 6628 TANF T4s and 3314 "new" TANF T4s. I was expecting "new" TANF T4s to equal "all" TANF T4s for this fiscal period. I'm assuming this is because more than one version of the FY23 Q3 file was subject to reparsing? If true, this is another good justification for why we want to control which versions get reparsed (i.e. the most recent 😄).
  • I mentioned this async too, so this is just for reference: it would be helpful for admins to know how to "fix manually" when they observe log entries like the following: The latest ReparseMeta model's (ID: 2) timeout_at field is None. Cannot safely execute reparse, please fix manually.
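The third question suggests the doubled T4 count may come from reparsing more than one version of the same FY23 Q3 submission. A sketch of selecting only the most recent version, assuming each file can be keyed by STT, fiscal year, quarter, and section and carries an integer version; the field names and sample rows are hypothetical, and this thread does not say whether the reparse command does this.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class SubmittedFile(NamedTuple):
    """Hypothetical minimal view of a submitted data file."""
    id: int
    stt: str
    fiscal_year: int
    quarter: str
    section: str
    version: int


def latest_versions(files: Iterable[SubmittedFile]) -> List[SubmittedFile]:
    """Keep only the highest version per (STT, fiscal year, quarter, section)."""
    latest: Dict[Tuple[str, int, str, str], SubmittedFile] = {}
    for f in files:
        key = (f.stt, f.fiscal_year, f.quarter, f.section)
        if key not in latest or f.version > latest[key].version:
            latest[key] = f
    return sorted(latest.values(), key=lambda f: f.id)


files = [
    SubmittedFile(1, "Alabama (01)", 2023, "Q3", "Active Case Data", 1),
    SubmittedFile(2, "Alabama (01)", 2023, "Q3", "Active Case Data", 2),
    SubmittedFile(3, "Alabama (01)", 2023, "Q3", "Closed Case Data", 1),
]
print([f.id for f in latest_versions(files)])  # [2, 3]
```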

@ADPennington, see my responses below :).

  • The record count can differ for files that had not been cat4 validated before. Since records with cat4 errors don't get serialized to the DB, and cat4 is relatively new, we can expect the "num records" fields not to always match one-to-one. We can write a spike ticket to investigate the feasibility of tracking before-and-after record counts for files; this ticket might be a way for us to get that information. As an intermediate step, I have also written this ticket, which adds some more useful fields. Specifically, tracking cat4 errors before and after the reparse will help show whether it makes sense that the record counts have diverged.

ADPennington commented 3 months ago

Per async discussion with @elipe17, #3096 is the ticket intended to capture more details about cat1 and cat4 errors in data file summaries. Linking just for reference to related ideas 😄
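Building on the explanation in this thread that records with cat4 errors are not serialized: if a file's content is unchanged, its drop in stored records should equal its change in cat4-rejected records. A sketch of that check, which could answer which file ids account for a deleted/created mismatch; it assumes counts are of records (not individual errors), and all names and numbers are hypothetical, not the fields proposed in #3096.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FileCounts:
    """Hypothetical per-file numbers captured around one reparse."""
    datafile_id: int
    records_deleted: int       # records removed from the DB before reparsing
    records_created: int       # records serialized by the reparse
    cat4_records_before: int   # records withheld by cat4 in the previous parse
    cat4_records_after: int    # records withheld by cat4 in the reparse


def unexplained_difference(c: FileCounts) -> int:
    """Change in stored records that the change in cat4 rejections does not explain."""
    return (c.records_deleted - c.records_created) - (
        c.cat4_records_after - c.cat4_records_before
    )


def files_to_investigate(counts: Iterable[FileCounts]) -> List[int]:
    return [c.datafile_id for c in counts if unexplained_difference(c) != 0]


counts = [
    FileCounts(1, records_deleted=100, records_created=90, cat4_records_before=0, cat4_records_after=10),
    FileCounts(2, records_deleted=50, records_created=45, cat4_records_before=0, cat4_records_after=2),
]
print(files_to_investigate(counts))  # [2]
```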