Description: Provide a brief background and justification for this issue
Some STTs submit files with 200K+ parser errors, which take a long time to process (and often get stuck in a "Pending" state*). Even in scenarios where processing completes, the generated error report cannot be downloaded.
A review of the logs when clicking download shows something like the following:
2024-06-06T15:09:58.84-0400 [APP/PROC/WEB/0] ERR [2024-06-06 19:09:58 +0000] [14] [WARNING] Worker with pid 174 was terminated due to signal 9
2024-06-06T15:09:58.85-0400 [APP/PROC/WEB/0] ERR [2024-06-06 19:09:58 +0000] [175] [INFO] Booting worker with pid: 175
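Signal 9 often indicates the worker process was killed by the platform, e.g., for exceeding its memory limit, which would fit a report built by loading all 200K+ parser errors into memory at once. If that turns out to be the root cause, one mitigation worth exploring is streaming the report row by row. The sketch below is illustrative only: it assumes a Django view, a hypothetical `ParserError` model (field names invented), and CSV output rather than whatever format the current report uses.

```python
import csv

from django.http import StreamingHttpResponse


class Echo:
    """Pseudo-buffer: write() returns the value so csv.writer output can be streamed."""
    def write(self, value):
        return value


def stream_error_report(request, file_id):
    # ParserError and its fields are hypothetical stand-ins for the real parser error model.
    errors = ParserError.objects.filter(file_id=file_id).iterator(chunk_size=2000)
    writer = csv.writer(Echo())

    def rows():
        yield writer.writerow(["row_number", "field_name", "error_message"])
        for e in errors:
            # Rows are generated lazily, so the full report never lives in memory at once.
            yield writer.writerow([e.row_number, e.field_name, e.error_message])

    response = StreamingHttpResponse(rows(), content_type="text/csv")
    response["Content-Disposition"] = f'attachment; filename="error_report_{file_id}.csv"'
    return response
```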
Relatedly, OFA staff cannot feasibly support STTs who inquire about this because:
- in some cases, the processing status captured in DAC is inconsistent (see Supporting Documentation, bullet #2)
- accessing file-specific parsing errors from DAC takes a long time when there are many errors, and the parsing error table is not filterable or exportable
- the DIGIT team and sys admin groups do not have access to frontend error reports to attempt to reproduce the problem described
Acceptance Criteria: Create a list of functional outcomes that must be achieved to complete this issue
[ ] error reports can be accessed by STT users regardless of parser error count
[ ] Testing Checklist has been run and all tests pass
[ ] README is updated, if necessary
Tasks: Create a list of granular, specific work items that must be completed to deliver the desired outcomes of this issue
[ ] identify root cause of worker termination
[ ] resolve issue preventing access to error reports with large error counts
[ ] Run Testing Checklist and confirm all tests pass
Notes: Add additional useful information, such as related issues and functionality that isn't covered by this specific issue, and other considerations that will be helpful for anyone reading this
*In reviewing the logs during processing, I've observed two types of Elasticsearch-related errors when the file is stuck: `elasticsearch.exceptions.ConnectionTimeout` and `elasticsearch.exceptions.TransportError: TransportError(429, '429 Too Many Requests /_bulk')`
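If the 429s and timeouts are coming from the bulk indexing path during parsing, one possible mitigation is to index in smaller batches with retries and backoff. The sketch below uses elasticsearch-py's `streaming_bulk` helper; the batch size, retry settings, and the `index_parsed_records`/`actions` names are illustrative assumptions, not the current implementation.

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk


def index_parsed_records(es: Elasticsearch, actions):
    # streaming_bulk sends documents in smaller batches and, with max_retries set,
    # retries documents the cluster rejects with 429 Too Many Requests, backing off
    # between attempts instead of failing the whole parse run.
    for ok, item in streaming_bulk(
        es,
        actions,
        chunk_size=200,        # docs per bulk request; tune down if 429s persist
        max_retries=5,         # retry 429-rejected docs instead of raising
        initial_backoff=2,     # seconds; doubles on each retry up to max_backoff
        raise_on_error=False,  # collect failures rather than aborting mid-file
    ):
        if not ok:
            print(f"failed to index document: {item}")
```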
Supporting Documentation: Please include any relevant log snippets/files/screen shots
test file: SSP_active_anon.txt
Open Questions: Please include any questions or decisions that must be made before beginning work or to confidently call this issue complete
- Is this a memory-related issue? (Signal 9 typically means the worker was killed by the OS/platform, e.g., for exceeding its memory limit.)
- The legacy system (TDRS) was set up to stop processing after 500 errors were detected. This wasn't great because STTs wouldn't know what kinds of errors were present in the rest of the file. Should we consider a similar approach, where we stop processing after a certain point but report where in the file processing stopped? (See the sketch after this list.)
- Should the system check for cat 3 errors if a cat 2 error is detected?
- Is a separate ticket needed to address easier access to parsing errors for OFA staff users (DIGIT, sys admin, OFA admin)?
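As a rough illustration of the error-cap idea raised above (stop after a threshold, but tell the STT where parsing stopped), here is a minimal sketch. `MAX_ERRORS`, `parse_datafile`, and `validate_line` are hypothetical names, and the threshold and reporting format would need to be decided before any implementation.

```python
MAX_ERRORS = 500  # illustrative threshold, not an agreed value


def parse_datafile(lines):
    """Stop collecting errors after MAX_ERRORS and record where parsing stopped."""
    errors = []
    stopped_at = None
    for line_number, line in enumerate(lines, start=1):
        errors.extend(validate_line(line, line_number))  # validate_line is hypothetical
        if len(errors) >= MAX_ERRORS:
            stopped_at = line_number
            errors.append(
                f"Error limit of {MAX_ERRORS} reached; processing stopped at line {line_number}. "
                "Remaining lines were not checked."
            )
            break
    return errors, stopped_at
```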