As tech lead, I need TDP to detect duplicate records within a file and not store them in the db.

ADPennington commented 10 months ago

Description:

TDP should detect the following types of duplicate records, not store them in the db, and return an error message in the feedback report:

Exact duplicate record: multiple non-header/non-trailer records in a file that are exactly the same.
Partial duplicate record: multiple non-header/non-trailer records that share the same values for some key data elements. (This definition varies by record type.)
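
To make the two definitions concrete, here is a minimal sketch in Python. It assumes each parsed record is a dict of string fields, and the `key_fields` tuple is purely hypothetical; the real key data elements vary by record type and are defined in the linked logic, not here.

```python
def find_duplicates(records, key_fields):
    """Flag records that duplicate an earlier record in the same file.

    records: list of dicts, one per non-header/non-trailer record.
    key_fields: hypothetical tuple of field names that define a
        partial duplicate for this record type (an assumption).
    Returns a list of (index, kind), where kind is "exact" or "partial".
    """
    seen_exact = set()
    seen_partial = set()
    flagged = []
    for i, rec in enumerate(records):
        # An exact duplicate matches on every field.
        exact_key = tuple(sorted(rec.items()))
        # A partial duplicate matches on the key fields only.
        partial_key = tuple(rec[f] for f in key_fields)
        if exact_key in seen_exact:
            flagged.append((i, "exact"))
        elif partial_key in seen_partial:
            flagged.append((i, "partial"))
        seen_exact.add(exact_key)
        seen_partial.add(partial_key)
    return flagged
```

Every exact duplicate also matches on the key fields, so the exact check runs first and the partial check only catches records that differ somewhere outside the key.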

Logic with examples by section and record type can be found here

- In section 1 files, exact and partial duplicate records are detected, not stored in db, and error message included in the feedback report.
[ ] In section 2 files, exact and partial duplicate records are detected, not stored in db, and error message included in the feedback report.
[ ] In section 3 files, partial duplicate T6 records are detected, not stored in db, and error message included in the feedback report.
[ ] In section 4 files, partial duplicate T7 records are detected, not stored in db, and error message included in the feedback report.
[ ] duplicate detection logic is applied consistently to TANF, SSP, and Tribal TANF files.
[ ] Testing Checklist has been run and all tests pass
[ ] README is updated, if necessary

Acceptance Criteria: Create a list of functional outcomes that must be achieved to complete this issue

[ ] Spike - investigate an approach, decide how to split remaining work into multiple tickets

Tasks: Create a list of granular, specific work items that must be completed to deliver the desired outcomes of this issue

[ ] Data structure to store all records (rpt_month_year/case_number) during parsing
- potential memory risk
- possibly store a hash of record_type/rpt_month_year/case_number
[ ] Check each record against the data structure for duplicates
- possible second task to revert any unwanted records
[ ] Include performance testing (?)
[ ] Run Testing Checklist and confirm all tests pass
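
One way the hash idea in the tasks above could look, as a rough sketch rather than a proposed implementation: keep a fixed-size digest of record_type/rpt_month_year/case_number in a set, so memory grows with the number of distinct keys and not with record size. The class name `SeenKeys` and the 16-byte digest size are assumptions made for illustration.

```python
import hashlib


class SeenKeys:
    """Tracks digests of record keys instead of the records themselves."""

    def __init__(self):
        self._digests = set()

    @staticmethod
    def _digest(record_type, rpt_month_year, case_number):
        # Join the key parts with a separator so ("T1", "2", "3")
        # and ("T12", "", "3") cannot produce the same input string.
        raw = "|".join((str(record_type), str(rpt_month_year), str(case_number)))
        return hashlib.blake2b(raw.encode("utf-8"), digest_size=16).digest()

    def check_and_add(self, record_type, rpt_month_year, case_number):
        """Return True if the key was seen before; otherwise remember it."""
        digest = self._digest(record_type, rpt_month_year, case_number)
        if digest in self._digests:
            return True
        self._digests.add(digest)
        return False
```

A digest can collide in principle, so a very rare false "duplicate" is possible. At 16 bytes the chance is negligible for realistic file sizes, but it is a trade-off worth noting alongside the performance testing task.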

Notes: Add additional useful information, such as related issues and functionality that isn't covered by this specific issue, and other considerations that will be helpful for anyone reading this

Relevant to TANF, Tribal TANF, and SSP section 1 and 2 files
Consider a separate cache class, similar to cat4, that acts before records are stored or added to the bulk_create object
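
A rough sketch of what such a cache class might look like, to help the discussion. The name `PreStoreCache`, its methods, and the error text are all hypothetical. It also assumes, purely for illustration, that every record sharing a duplicated key is rejected, including the first one seen; whether the first occurrence should be kept is still an open design decision.

```python
class PreStoreCache:
    """Holds parsed records until they are known not to be duplicates."""

    def __init__(self):
        self._pending = {}      # key -> record waiting to be stored
        self._rejected = set()  # keys already found to be duplicated
        self.errors = []        # messages for the feedback report

    def add(self, key, record, line_number):
        """Queue a record, or reject it if its key was seen before."""
        if key in self._rejected:
            self.errors.append(f"Line {line_number}: duplicate record for {key}")
            return
        if key in self._pending:
            # Revert the earlier record so it is never stored (assumption).
            del self._pending[key]
            self._rejected.add(key)
            self.errors.append(f"Line {line_number}: duplicate record for {key}")
            return
        self._pending[key] = record

    def drain(self):
        """Return the records that are safe to store and clear the queue."""
        records = list(self._pending.values())
        self._pending.clear()
        return records
```

In this sketch the list returned by `drain()` is what would be handed to bulk_create, so duplicates never reach the db and nothing has to be deleted afterwards.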

Supporting Documentation: Please include any relevant log snippets/files/screen shots


Open Questions: Please include any questions or decisions that must be made before beginning work or to confidently call this issue complete


robgendron commented 6 months ago

This relates to #2948.

elipe17 commented 6 months ago

Removed "Spike" from the title since the research turned into an implementation. Also assigned points.

robgendron commented 5 months ago

3 points remaining.

raft-tech / TANF-app

As tech lead, I need TDP to detect duplicate records within a file and not store them in the db. #2795