andrewc-moj closed this 6 months ago
Hey Thomas, I appreciate this isn't your area any more, but I'd value some feedback on this PR as it stands. It's not quite ready to merge, because the main script still tries to update the 'app database'.
Running the script
You can run the amended 'main' script with no trouble at all like this: `python python_scripts/main.py -e dev --scrape_date yyyy-mm-dd`
This writes the (modified) parquet files to a new path, `db/dev`, which a new database, `matrix_db_dev`, will pick up.
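For reference, here is a minimal sketch of how a command line like the one above could be parsed and mapped to the output locations. It assumes `-e` has a long form `--env` limited to `dev`/`prod`, and that prod follows the same `db/<env>` and `matrix_db_<env>` pattern as dev; neither is confirmed by the PR, and `parse_args` and `output_locations` are hypothetical names, not the script's actual functions.

```python
import argparse
from datetime import date
from pathlib import PurePosixPath


def parse_args(argv=None):
    """Parse the CLI flags used in the example command (long form --env is an assumption)."""
    parser = argparse.ArgumentParser(description="Scrape Matrix data and write parquet files.")
    parser.add_argument("-e", "--env", choices=["dev", "prod"], required=True,
                        help="Target environment.")
    # date.fromisoformat rejects anything that isn't yyyy-mm-dd, so bad dates fail early.
    parser.add_argument("--scrape_date", type=date.fromisoformat, required=True,
                        help="Scrape date in yyyy-mm-dd format.")
    return parser.parse_args(argv)


def output_locations(env: str) -> tuple[str, str]:
    """Return (parquet output path, database name) for an environment.

    Hypothetical convention: extends the PR's db/dev -> matrix_db_dev to other envs.
    """
    return str(PurePosixPath("db") / env), f"matrix_db_{env}"


if __name__ == "__main__":
    args = parse_args()
    path, db_name = output_locations(args.env)
    print(f"Writing {args.scrape_date} data to {path} for database {db_name}")
```

Restricting `--env` with `choices` means a typo such as `-e prd` fails at parse time rather than silently writing to an unexpected path.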
Defining the new database
The new database is defined in `python_scripts/database_builder_v2.py`.
My plan was to modify this script to take the environment (dev or prod) as an argument and set up the database accordingly.
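One way to make the builder environment-aware is to derive every name and location from a single `env` value, so dev and prod can never be mixed within one run. The sketch below assumes that layout; `DatabaseSpec`, `database_spec`, the root path and the table names are all hypothetical and are not taken from `database_builder_v2.py`.

```python
from dataclasses import dataclass, field

VALID_ENVS = ("dev", "prod")


@dataclass(frozen=True)
class DatabaseSpec:
    """Everything the builder needs to know about one environment's database."""
    name: str
    base_path: str
    table_paths: dict[str, str] = field(default_factory=dict)


def database_spec(env: str, root: str, tables: list[str]) -> DatabaseSpec:
    """Build a hypothetical database definition for `env` under `root`."""
    if env not in VALID_ENVS:
        raise ValueError(f"env must be one of {VALID_ENVS}, got {env!r}")
    # Same db/<env> layout the main script writes parquet files to.
    base = f"{root.rstrip('/')}/db/{env}"
    return DatabaseSpec(
        name=f"matrix_db_{env}",
        base_path=base,
        # One folder per table, e.g. .../db/dev/bookings/
        table_paths={table: f"{base}/{table}/" for table in tables},
    )


if __name__ == "__main__":
    spec = database_spec("dev", "s3://example-bucket/", ["bookings"])
    print(spec)
```

Keeping the spec as plain data also makes it easy to unit-test the naming rules without touching a real catalogue.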
Changes made
I've added a comment against every change so you know why I'm making it.
Specific areas for feedback
There are a few things I'd like your view on:
- mojap_metadata: how can I use it to impose datatypes, so that I can remove the dependency on dataengineeringutils (now archived)?
- Any other feedback on the conventions you'd expect people to follow in data engineering would be welcome.
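To illustrate what "imposing datatypes" from metadata means, here is a stdlib-only sketch. It does not call mojap_metadata itself (its converters would be the place to look for generating real Arrow or Glue schemas). It only borrows the idea of describing columns as `{"name": ..., "type": ...}` dicts, and then casts each raw record to those types. The column names and the `CASTERS`/`impose_types` helpers are hypothetical.

```python
from datetime import date
from typing import Any, Callable

# Column metadata as a list of {"name", "type"} dicts; names here are hypothetical.
BOOKING_COLUMNS = [
    {"name": "booking_id", "type": "int64"},
    {"name": "booking_type", "type": "string"},
    {"name": "start_date", "type": "date64"},
    {"name": "is_cancelled", "type": "bool"},
]


def _to_bool(value: Any) -> bool:
    """Strict bool parsing: bool("false") would be True, so don't use bool() directly."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in {"true", "1", "yes"}:
        return True
    if text in {"false", "0", "no"}:
        return False
    raise ValueError(f"cannot interpret {value!r} as bool")


# Map each metadata type name to a function that converts a raw value.
CASTERS: dict[str, Callable[[Any], Any]] = {
    "int64": int,
    "float64": float,
    "string": str,
    "date64": lambda v: v if isinstance(v, date) else date.fromisoformat(str(v)),
    "bool": _to_bool,
}


def impose_types(record: dict[str, Any], columns: list[dict[str, str]]) -> dict[str, Any]:
    """Return a new record containing only the declared columns, cast to their types.

    Missing or None values stay None; undeclared keys in `record` are dropped.
    """
    typed: dict[str, Any] = {}
    for col in columns:
        name, col_type = col["name"], col["type"]
        if col_type not in CASTERS:
            raise KeyError(f"no caster for type {col_type!r} (column {name!r})")
        value = record.get(name)
        typed[name] = None if value is None else CASTERS[col_type](value)
    return typed
```

Failing loudly on an unknown type or an unparseable value surfaces schema drift in the API response at load time, instead of letting mistyped columns reach the parquet files.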
Laurence requested to see all booking types from Matrix.
Test notebook to view results from the API call