[ ] A Cloud Function named extract-opa-properties to fetch the OPA Properties dataset and upload it to a Cloud Storage bucket named musa509s24_${team}_raw_data, in a folder named opa_properties/
[ ] A Cloud Function named prepare-opa-properties to prepare the file in gs://musa509s24_${team}_raw_data/opa_properties/ for backing an external table. The new file should be stored in JSON-L format in a bucket named musa509s24_${team}_prepared_data, in a file named opa_properties/data.jsonl. All field names should be lowercased.
[ ] A Cloud Function named load-opa-properties that creates or updates an external table named source.opa_properties with the fields in gs://musa509s24_${team}_prepared_data/opa_properties/data.jsonl, and creates or updates an internal table named core.opa_properties that contains all the fields from source.opa_properties in addition to a new field named property_id set equal to the value of parcel_number.
[ ] A Workflow named data-pipeline to run each function in order.
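
As a rough illustration of the prepare step in the checklist above, here is a minimal Python sketch of the conversion itself. It assumes the raw file is a CSV, which the task does not state. The helper names `csv_to_jsonl` and `prepared_blob_path` are hypothetical, and reading from and writing to Cloud Storage is left out.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text to JSON-L, lowercasing every field name.

    Assumption: the raw OPA Properties file is a well-formed CSV with a header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Lowercase the keys only; values are passed through unchanged.
        lines.append(json.dumps({key.lower(): value for key, value in row.items()}))
    # One JSON object per line, each terminated by a newline.
    return "".join(line + "\n" for line in lines)


def prepared_blob_path(team: str) -> tuple[str, str]:
    """Return the (bucket name, object name) for the prepared file."""
    return f"musa509s24_{team}_prepared_data", "opa_properties/data.jsonl"
```

In a Cloud Function, the returned string would be uploaded to the bucket and object name given by `prepared_blob_path`. All values stay strings here, so any type conversion would have to be added separately or handled when the table is defined.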
All ingest processes in this project will follow this general pattern:
1. Extract the raw data into the musa509s24_${team}_raw_data bucket.
2. Prepare the data and store it in the musa509s24_${team}_prepared_data bucket.
3. Create or update an external table in the source dataset based on the data in gs://musa509s24_${team}_prepared_data.
4. Create or update a table in the core dataset that has at least one additional column added, named property_id.

Your SQL commands should each be stored in their own files (e.g. source_phl_opa_properties.sql and core_phl_opa_properties.sql), but should be run from a Cloud Function as part of your pipeline. For an example, see the run_sql
Cloud Function code at https://github.com/musa-5090-spring-2024/course-info/tree/main/week08/explore_phila_data/run_sql

Acceptance Criteria: see the checklist at the top.
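
The SQL-file convention described above could look roughly like the following Python sketch. The SQL text is an assumption about what the two files might contain, not the course's reference solution. The templates are inlined as strings here instead of being read from the .sql files, and `render_sql` is a hypothetical helper that fills in the `${team}` placeholder.

```python
from string import Template

# Hypothetical contents of the "source" SQL file: an external table backed by
# the prepared JSON-L file. With no column list, BigQuery infers the schema.
SOURCE_SQL = Template("""\
CREATE OR REPLACE EXTERNAL TABLE source.opa_properties
OPTIONS (
  format = 'JSON',
  uris = ['gs://musa509s24_${team}_prepared_data/opa_properties/data.jsonl']
);
""")

# Hypothetical contents of the "core" SQL file: every source field plus
# property_id, set equal to parcel_number.
CORE_SQL = Template("""\
CREATE OR REPLACE TABLE core.opa_properties AS
SELECT parcel_number AS property_id, *
FROM source.opa_properties;
""")


def render_sql(template: Template, team: str) -> str:
    """Fill in the ${team} placeholder and return the SQL text to run."""
    return template.substitute(team=team)
```

Inside the load function, each rendered statement would then be submitted to BigQuery, for example with the `google-cloud-bigquery` client via `bigquery.Client().query(sql).result()`. Running the source statement before the core one matters, since the core table selects from the external table.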