kesyog / crossword

Scraping personal NYT crossword stats

Apache License 2.0

33 stars 6 forks

📓 Minimize re-requests by storing data for unsolved and aided puzzles #3

Closed by kesyog 3 years ago

kesyog commented 3 years ago

Issues addressed

Data for unsolved puzzles and for puzzles where cheats were used was being unnecessarily re-requested on each program run.

Each puzzle has a unique, constant puzzle id on the NYT servers, and the date<->puzzle id mapping for these puzzles was being re-requested on each program iteration.
Data for unsolved puzzles needs to be queried on each program iteration to see if they have been newly solved. Puzzles where cheats were used were treated the same as unsolved even though their status won't change in subsequent runs.

Summary of changes

Store puzzle id and cheat usage status in the database
Load cached puzzle id and cheat usage information from the database and only send requests for new data:
- Puzzle ids for puzzle dates that weren't already in the database or existed without a puzzle id (e.g. because an older version of the database was used)
- Solve stats for dates that either didn't exist in the database or that existed but were unsolved
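
The selection logic above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `CachedPuzzle`, `RequestPlan`, and `plan_requests` are hypothetical names, dates are plain strings, and the sketch assumes that a puzzle marked as cheated is final and never needs its solve stats re-queried.

```rust
use std::collections::BTreeMap;

/// Hypothetical cached row for one puzzle date (field names are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct CachedPuzzle {
    /// `None` when the row came from an older database without puzzle ids.
    pub puzzle_id: Option<u32>,
    pub solved: bool,
    pub cheated: bool,
}

/// The dates that still need a request of each kind.
#[derive(Debug, Default, PartialEq)]
pub struct RequestPlan {
    pub need_puzzle_id: Vec<String>,
    pub need_solve_stats: Vec<String>,
}

/// Decide which requests to send, given what is already cached.
pub fn plan_requests(dates: &[&str], cache: &BTreeMap<String, CachedPuzzle>) -> RequestPlan {
    let mut plan = RequestPlan::default();
    for &date in dates {
        match cache.get(date) {
            // Unknown date: both the puzzle id and the solve stats are needed.
            None => {
                plan.need_puzzle_id.push(date.to_string());
                plan.need_solve_stats.push(date.to_string());
            }
            Some(entry) => {
                // Row exists but has no puzzle id (e.g. older database format).
                if entry.puzzle_id.is_none() {
                    plan.need_puzzle_id.push(date.to_string());
                }
                // Assumption: cheated puzzles are final, so only puzzles that
                // are unsolved and not cheated are queried again.
                if !entry.solved && !entry.cheated {
                    plan.need_solve_stats.push(date.to_string());
                }
            }
        }
    }
    plan
}
```

With a plan like this, a date that is already solved (or cheated) and has a stored puzzle id produces no requests at all on later runs.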

Tangential changes

Refactor code into smaller, better-organized modules
Reduce overuse of methods where simple helper functions would do
Separate the database abstraction a little more cleanly so that it could more easily be turned into a trait later. This would allow other backing stores besides CSV files.
Add a very small amount of unit testing
Print out the total number of requests made when the program is finished running
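
For the database abstraction mentioned above, a trait could look something like the sketch below. Nothing here is in the PR itself: `PuzzleStore`, `PuzzleRecord`, `MemoryStore`, and `is_pending` are hypothetical names, and the in-memory store only stands in for a real backing store such as the CSV file.

```rust
use std::collections::HashMap;

/// Hypothetical record stored per puzzle date.
#[derive(Debug, Clone, PartialEq)]
pub struct PuzzleRecord {
    pub puzzle_id: Option<u32>,
    /// `None` while the puzzle is unsolved.
    pub solve_time_secs: Option<u32>,
    pub cheated: bool,
}

/// Hypothetical storage interface; a CSV-backed store would be one implementor.
pub trait PuzzleStore {
    fn get(&self, date: &str) -> Option<&PuzzleRecord>;
    fn upsert(&mut self, date: &str, record: PuzzleRecord);
}

/// In-memory implementation, useful for unit tests.
#[derive(Default)]
pub struct MemoryStore {
    records: HashMap<String, PuzzleRecord>,
}

impl PuzzleStore for MemoryStore {
    fn get(&self, date: &str) -> Option<&PuzzleRecord> {
        self.records.get(date)
    }

    fn upsert(&mut self, date: &str, record: PuzzleRecord) {
        self.records.insert(date.to_string(), record);
    }
}

/// Code written against the trait works with any backing store.
/// A date is pending if it is unknown, or unsolved and not cheated.
pub fn is_pending<S: PuzzleStore>(store: &S, date: &str) -> bool {
    match store.get(date) {
        None => true,
        Some(record) => record.solve_time_secs.is_none() && !record.cheated,
    }
}
```

Keeping the fetch logic generic over the trait would also make it straightforward to unit test without touching the filesystem.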

Results

The total number of requests made for my data after the initial program run went down ~3x 🚀