This issue outlines the task of creating regular snapshots of the Verifier Alliance database and distributing them in Parquet format. The database is currently hosted on Google Cloud SQL.
Goals
Create periodic snapshots of the Verifier Alliance database.
Distribute these snapshots as Parquet files for potential downstream processing and analysis.
Discussion
[ ] Snapshotting Approach: Decide how to snapshot the Cloud SQL database. Two candidate options (suggested by GPT) are:
Use BigQuery federated queries (EXTERNAL_QUERY) together with the EXPORT DATA statement (preview) to export the data to Cloud Storage in Parquet format. (related Stack Overflow question)
Develop a solution using Cloud Functions or Cloud Dataflow to extract data from Cloud SQL, convert it to Parquet format, and write it to Cloud Storage.
[ ] Distribution Needs: Determine where the Parquet files will be published (e.g., a Cloud Storage bucket).
[ ] Scheduling: Define the scheduling requirements for creating these snapshots (daily, weekly, etc.).
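The first snapshotting option could look roughly like the sketch below. It is a minimal sketch, assuming a BigQuery connection to the Cloud SQL instance has already been created; the connection ID, bucket name, and table name are hypothetical placeholders, and the helper `export_statement` is illustrative rather than existing tooling. It builds one `EXPORT DATA` statement per table that reads through `EXTERNAL_QUERY` and writes Parquet files under a date-stamped prefix, so each scheduled run (daily, weekly, etc.) lands in its own folder.

```python
from datetime import date

# Hypothetical placeholders: the actual project, connection and bucket
# names are not specified in this issue.
CONNECTION_ID = "my-project.us.verifier-alliance-sql"
BUCKET = "verifier-alliance-snapshots"


def export_statement(table: str, snapshot_date: date,
                     connection_id: str = CONNECTION_ID,
                     bucket: str = BUCKET) -> str:
    """Build a BigQuery EXPORT DATA statement that reads one Cloud SQL table
    via EXTERNAL_QUERY and writes it to Cloud Storage as Parquet."""
    # EXPORT DATA expects exactly one '*' wildcard in the destination URI;
    # BigQuery fills it in with a shard number for each output file.
    uri = f"gs://{bucket}/{snapshot_date.isoformat()}/{table}/*.parquet"
    # Note: the table name is interpolated without escaping, so it should
    # come from a fixed, trusted list of tables.
    inner_query = f"SELECT * FROM {table}"
    return (
        "EXPORT DATA OPTIONS(\n"
        f"  uri='{uri}',\n"
        "  format='PARQUET',\n"
        "  overwrite=true\n"
        ") AS\n"
        f"SELECT * FROM EXTERNAL_QUERY('{connection_id}', '{inner_query}');"
    )


if __name__ == "__main__":
    # "contracts" is an assumed table name used only for illustration.
    print(export_statement("contracts", date(2024, 1, 15)))
```

The generated statement could be run from the BigQuery console or with `bq query --use_legacy_sql=false`. Some PostgreSQL column types may need explicit casts in the inner query before BigQuery can map them; this sketch has not been checked against the actual database schema.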