nimbly-dev / nyctripdata_project

Project to learn Data Engineering from: https://github.com/DataTalksClub/data-engineering-zoomcamp

DATAENG-1: Get a successful pipeline run for populating yellow_cab tripdata sets in the Populate Tripdata Pipeline #1

Closed by nimbly-dev 6 days ago

nimbly-dev commented 1 week ago

Currently, a pipeline run with the parameters below fails. Fix the pipeline so that it populates the yellow_cab datasets.

Endpoint: http://localhost:6789/api/pipeline_schedules/5/pipeline_runs/51372ce952da4ce4bc70cbe37eda0ff2

Parameters:

{
  "pipeline_run": {
    "variables": {
      "dev_limit_rows": -1,
      "end_month": 12,
      "end_year": 2022,
      "start_month": 1,
      "start_year": 2021,
      "pipeline_run_name": "populate_yellowtripdata_2021_2022",
      "spark_mode": "cluster",
      "tripdata_type": "yellow_cab_tripdata",
      "data_loss_threshold": "very_strict",
      "overwrite_enabled": true
    }
  }
}
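
To reproduce the failing run from a script, here is a minimal sketch using only the Python standard library. It assumes the endpoint accepts a POST with the JSON body above and returns JSON, which this issue does not confirm. The helper names (`build_payload`, `months_in_range`, `trigger_run`) are hypothetical and not part of the project. `months_in_range` only illustrates how many months the requested range spans; how the pipeline actually iterates over them is not stated here.

```python
import json
import urllib.request

# Endpoint copied from this issue; the schedule id and token are specific to that local setup.
TRIGGER_URL = (
    "http://localhost:6789/api/pipeline_schedules/5/"
    "pipeline_runs/51372ce952da4ce4bc70cbe37eda0ff2"
)


def build_payload() -> dict:
    """Return the run variables exactly as listed in this issue."""
    return {
        "pipeline_run": {
            "variables": {
                "dev_limit_rows": -1,
                "end_month": 12,
                "end_year": 2022,
                "start_month": 1,
                "start_year": 2021,
                "pipeline_run_name": "populate_yellowtripdata_2021_2022",
                "spark_mode": "cluster",
                "tripdata_type": "yellow_cab_tripdata",
                "data_loss_threshold": "very_strict",
                "overwrite_enabled": True,
            }
        }
    }


def months_in_range(start_year: int, start_month: int,
                    end_year: int, end_month: int) -> list:
    """List (year, month) pairs from start to end, both inclusive."""
    months = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def trigger_run(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST the payload as JSON (assumed request shape) and decode the JSON reply."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the local server running, calling `trigger_run(TRIGGER_URL, build_payload())` would submit the same request. Nothing is sent until that call is made.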

When done, attach evidence of the populated lakehouse, stage, and production DBs, together with the successful pipeline build.

nimbly-dev commented 6 days ago

MRL https://github.com/nimbly-dev/nyctripdata_project/pull/8

nimbly-dev commented 6 days ago

Merged.