Use Cloud HQ to forward all emails in the folder to another Gmail account. A third-party forwarding tool is needed because Gmail's bulk forward only sends each email as an attachment rather than as an inline forwarded body.
Set the env var PRE_LOAD_MBOX=real_data/backfill-202405.mbox
docker compose -f docker-compose.dev.yml up --build
Run python3 -m sage and verify the emails can be parsed.
Download a CSV of the account's full transaction history to validator/real_data/Huntington_checking_220601-240527.csv
Validate the DB data against the CSV data: python3 -m validator Huntington_checking_220601-240527.csv 5-2024
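The validation step compares two collections of transactions and reports what is missing on either side. The sketch below shows one way such a comparison could work. It is not the validator module's actual code: the function name diff_transactions, the (date, amount) row shape, and the sample values are all made up for illustration. It treats rows as multisets so that two identical charges on the same day are not collapsed into one.

```python
from collections import Counter
from decimal import Decimal


def diff_transactions(csv_rows, db_rows):
    """Return (csv_only, db_only): rows present on one side but not the other.

    Rows are compared as multisets, so a duplicate on one side that has
    only a single match on the other side is still reported.
    """
    csv_counts = Counter(csv_rows)
    db_counts = Counter(db_rows)
    # Counter subtraction keeps only the positive remainders.
    csv_only = sorted((csv_counts - db_counts).elements())
    db_only = sorted((db_counts - csv_counts).elements())
    return csv_only, db_only


# Hypothetical sample data: (ISO date, amount) tuples.
csv_rows = [
    ("2024-05-02", Decimal("-12.50")),
    ("2024-05-02", Decimal("-12.50")),  # same-day duplicate charge
    ("2024-05-10", Decimal("100.00")),
]
db_rows = [
    ("2024-05-02", Decimal("-12.50")),
    ("2024-05-10", Decimal("100.00")),
    ("2024-05-11", Decimal("-3.00")),
]

csv_only, db_only = diff_transactions(csv_rows, db_rows)
print("CSV rows not in DB:", csv_only)
print("DB records not in CSV:", db_only)
```

A clean backfill would leave both lists empty.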
Production Backfill
Use Cloud HQ to forward all emails in the folder to the MX.
SSH to the host, activate the venv, run python3 -m sage, and verify that all 47 emails were processed.
A detour ...
Taking a detour to work on issue #56 (Test email parsing exceptions and errors), which should help with the backfilled email errors.
Another detour...
Taking another detour to work on issue #143 (Create a data validator module), which should help with validating the backfills. There's no point in loading hot garbage into sage.
DONE WHEN
Emails for the month parse ✔
2024-05-24 22:35:58.965 | INFO | __main__:main:98 - Total Messages in Batch = 47
2024-05-24 22:35:58.966 | INFO | __main__:main:99 - {'retrieved': 47, 'unparsed': 2, 'processed': 45}
2024-05-24 22:35:58.967 | INFO | __main__:main:100 - DONE
Transactions for the month are validated ✔
(.venv) kfike@cutie:~/Projects/sage$ export ISDEV=True && python3 -m validator Huntington_checking_220601-240527.csv 5-2024
2024-06-18 09:48:45.250 | INFO | __main__:main:36 - STARTING VALIDATION
2024-06-18 09:48:45.253 | INFO | __main__:main:41 - Getting data from validation CSV file: Huntington_checking_220601-240527.csv
2024-06-18 09:48:45.256 | INFO | __main__:main:48 - Getting DB transaction data from 2024-05-01 to 2024-05-30.
2024-06-18 09:48:45.278 | INFO | __main__:diff_csv_and_db_data:208 - Total CSV rows: 10
2024-06-18 09:48:45.278 | INFO | __main__:diff_csv_and_db_data:210 - Total DB records: 10
2024-06-18 09:48:45.279 | INFO | __main__:main:57 - CSV rows not in DB:
2024-06-18 09:48:45.279 | INFO | __main__:main:63 - DB records not in CSV:
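The batch summary line in the first log above can be checked mechanically instead of by eye. The sketch below is a hypothetical helper, not part of sage: it assumes the summary is always the trailing Python-style dict on the log line, and it only confirms that the retrieved count equals processed plus unparsed.

```python
import ast


def parse_batch_summary(log_line):
    """Extract the trailing {...} dict from a summary log line."""
    start = log_line.index("{")
    # literal_eval safely parses the dict literal without executing code.
    return ast.literal_eval(log_line[start:])


def summary_is_consistent(summary):
    """True when every retrieved message is counted as processed or unparsed."""
    return summary["retrieved"] == summary["processed"] + summary["unparsed"]


line = (
    "2024-05-24 22:35:58.966 | INFO | __main__:main:99 - "
    "{'retrieved': 47, 'unparsed': 2, 'processed': 45}"
)
summary = parse_batch_summary(line)
print(summary, summary_is_consistent(summary))
```

For the run above, the counts add up, with 2 of the 47 retrieved messages reported as unparsed.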
Steps
Local Testing
unzip 202405.zip
mv 202405/Takeout/Mail/backfill-202405.mbox /home/kfike/Projects/sage/docker/mailserver/test_data/real_data
PRE_LOAD_MBOX=real_data/backfill-202405.mbox
docker compose -f docker-compose.dev.yml up --build
python3 -m sage and verify the emails can be parsed
validator/real_data/Huntington_checking_220601-240527.csv
python3 -m validator Huntington_checking_220601-240527.csv 5-2024
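Before pointing PRE_LOAD_MBOX at the Takeout export, it can be worth confirming that the mbox file holds the expected number of messages. The sketch below uses Python's standard mailbox module. The helper names count_mbox_messages and first_subjects are made up for illustration and are not part of sage.

```python
import itertools
import mailbox


def count_mbox_messages(path):
    """Return the number of messages in the mbox file at `path`."""
    # create=False raises mailbox.NoSuchMailboxError if the file is missing,
    # instead of silently creating an empty mailbox.
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def first_subjects(path, limit=5):
    """Return the Subject headers of the first `limit` messages."""
    box = mailbox.mbox(path, create=False)
    try:
        return [msg["Subject"] for msg in itertools.islice(box, limit)]
    finally:
        box.close()
```

For example, calling count_mbox_messages on the moved backfill-202405.mbox and comparing the result with the batch total in the sage log gives a quick check that nothing was lost in the export.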