raft-tech / TANF-app

Repo for development of a new TANF Data Reporting System

S3 buckets contain fewer datafiles than DAC #3147

Closed elipe17 closed 3 months ago

elipe17 commented 3 months ago

Thank you for taking the time to let us know about the issue you found. The basic rule for bug reporting is that something isn't working the way one would expect it to work. Please provide us with the information requested below and we will look at it as soon as we are able.

Description

The number of files listed in the DAC does not match the number of files listed in S3. For a large portion of the datafiles in the admin console, the download link does not work: S3 returns an "invalid key" error, indicating either that the path the DAC has saved for the datafile is invalid or that the file no longer exists in S3. Based on some preliminary testing, it appears to be the latter and the file is no longer in S3.
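
For reference, a minimal sketch of how one could confirm whether a key the DAC has saved still exists in S3, assuming boto3 credentials for the target bucket; the bucket and key below are hypothetical placeholders:

# check_key.py - confirm whether a DAC-saved key still exists in S3
# (sketch; bucket and key are hypothetical placeholders)
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

bucket = "{bucket_name}"  # from the service key creds
key = "tdp-backend-staging/data_files/2023/Q1/37/Aggregate Data/example.txt"  # hypothetical

try:
    s3.head_object(Bucket=bucket, Key=key)
    print("key exists in S3")
except ClientError as e:
    if e.response["Error"]["Code"] == "404":
        print("key no longer exists in S3")
    else:
        raise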

Action Taken

What I expected to see

What I did see

Other Helpful Information

The files below list the objects S3 reports it is tracking for each environment. The last line in each file shows the number of files the DAC for that environment thinks it has. prod_files.txt staging_files.txt develop_files.txt

reitermb commented 3 months ago

Still attempting to reproduce, pending office hours

raftmsohani commented 3 months ago

During office hours on Fri 16, we did the following investigation:

  1. A query on the S3 staging bucket showed files that were missing a version_id. In one instance, the file "test_staging.txt" was repeated 80 times, both with and without a version_id.
  2. A query from the backend shell on staging showed the same files (the number of files was less than what existed in the bucket(s)).

ACTIONS:

  1. Write a script to pull all file versions from the bucket using the AWS CLI.
  2. Write a Python script to query all DataFiles, including version_id and original_filename.
  3. Compare the two lists and find the discrepancies (rough sketch below).
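
A rough sketch of what the comparison in step 3 could look like, assuming the AWS CLI output from step 1 is saved as JSON and the backend query from step 2 is exported as one key per line; the file names are hypothetical:

# compare_listings.py - rough sketch of ACTION 3 (file names are hypothetical)
import json

# step 1 output: aws s3api list-object-versions --bucket {bucket_name} > s3_versions.json
with open("s3_versions.json") as f:
    s3_keys = {v["Key"] for v in json.load(f).get("Versions", [])}

# step 2 output: one S3 key per line, exported from the backend DataFile query
with open("dac_datafiles.txt") as f:
    dac_keys = {line.strip() for line in f if line.strip()}

print("In DAC but missing from S3:", sorted(dac_keys - s3_keys))
print("In S3 but not tracked by DAC:", sorted(s3_keys - dac_keys))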

NOTE:

raftmsohani commented 3 months ago

scripts:

# extracts S3 creds. First login to target env
export SERVICE_INSTANCE_NAME=tdp-datafiles-{environment}  # environment=prod, staging, dev
export KEY_NAME=mo-dev-df

#cf create-service-key "${SERVICE_INSTANCE_NAME}" "${KEY_NAME}"
export S3_CREDENTIALS=$(cf service-key "${SERVICE_INSTANCE_NAME}" "${KEY_NAME}" | tail -n +2)

export AWS_ACCESS_KEY_ID=$(echo "${S3_CREDENTIALS}" | jq -r '.access_key_id')
export AWS_SECRET_ACCESS_KEY=$(echo "${S3_CREDENTIALS}" | jq -r '.secret_access_key')
export BUCKET_NAME=$(echo "${S3_CREDENTIALS}" | jq -r '.bucket')
export AWS_DEFAULT_REGION=$(echo "${S3_CREDENTIALS}" | jq -r '.region')

# s3 command examples

aws s3api list-object-versions --bucket {bucket_name} > objects_versions_staging.txt
aws s3api list-objects --bucket {bucket_name} --query 'Contents[].{Key: Key}'
aws s3api get-object --bucket {bucket_name} --key "tdp-backend-prod/data_files/2023/Q1/37/Aggregate Data/Section_3_Q1FFY2023_text.txt" {outfile}
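
For the backend side (action 2 above), a sketch of what the query from the Django shell on staging might look like; version_id and original_filename are the fields referenced in this thread, while the model import path and the file field name are assumptions:

# run from the backend shell: python manage.py shell
# (sketch - model path and the `file` field name are assumptions)
from tdpservice.data_files.models import DataFile

qs = DataFile.objects.all().order_by("pk")
print("DAC DataFile count:", qs.count())
for df in qs:
    # df.file.name is assumed to be the key the DAC saved for S3
    print(df.pk, df.original_filename, df.version_id, df.file.name)
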
jtimpe commented 3 months ago

Tested out in develop, staging, prod

For these reasons, we don't have a way to identify the cause or a fix. We can recommend that OFA introduce redundancy in file uploads (store the file separately in another location, re-enable the Titan SFTP upload, keep a copy of the S3 bucket, etc.)
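
As an illustration of the "copy of the S3 bucket" option, a minimal sketch of server-side copying every object in the datafiles bucket to a separate backup bucket; the bucket names are hypothetical placeholders:

# backup_copy.py - sketch of one redundancy option (bucket names are hypothetical)
import boto3

s3 = boto3.resource("s3")
source = s3.Bucket("tdp-datafiles-prod-bucket")
backup_bucket = "tdp-datafiles-prod-backup"

for obj in source.objects.all():
    # server-side copy; no download/upload round trip
    s3.Object(backup_bucket, obj.key).copy({"Bucket": source.name, "Key": obj.key})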

robgendron commented 3 months ago

Not pursuing redundancies; spinning up new tickets to pivot on work. Given the new direction, the team has deemed this ticket able to be closed (8.23 cross-product sync). @ADPennington