nimbly-dev / nyctripdata_project

Project to learn Data Engineering from: https://github.com/DataTalksClub/data-engineering-zoomcamp

DATAENG-3: Add an option for a Python script that automatically populates the trip datasets on Production #3

Open nimbly-dev opened 1 week ago

nimbly-dev commented 1 week ago

Create PySpark Python scripts for the following datasets:

  1. yellow_cab_tripdata
  2. green_cab_tripdata
  3. fhv_tripdata

Features:

  1. Accepts year and month range parameters
  2. Can be submitted to the Spark standalone cluster
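
A minimal sketch of how such a script could accept a year/month range, under stated assumptions rather than as the final implementation. The flag names (`--dataset`, `--start`, `--end`, `--input-template`, `--output-path`), the helper names, and the Parquet output are all hypothetical; the actual production target is not specified in this issue. PySpark is imported inside `main()` so the date helpers can be checked without a Spark installation.

```python
import argparse
from datetime import date

# Dataset names taken from the list in this issue.
DATASETS = ("yellow_cab_tripdata", "green_cab_tripdata", "fhv_tripdata")


def parse_year_month(value):
    """Parse 'YYYY-MM' into a date pinned to the first of the month."""
    year, month = value.split("-")
    return date(int(year), int(month), 1)


def month_range(start, end):
    """Return every (year, month) pair from start to end, inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def build_parser():
    parser = argparse.ArgumentParser(description="Populate tripdata (dev only)")
    parser.add_argument("--dataset", choices=DATASETS, required=True)
    parser.add_argument("--start", type=parse_year_month, required=True)
    parser.add_argument("--end", type=parse_year_month, required=True)
    # Hypothetical path template; "{year}" and "{month}" are filled per month.
    parser.add_argument("--input-template", required=True)
    # Hypothetical destination; the real production store may differ.
    parser.add_argument("--output-path", required=True)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName(f"populate_{args.dataset}").getOrCreate()
    for year, month in month_range(args.start, args.end):
        path = args.input_template.format(year=year, month=f"{month:02d}")
        df = spark.read.parquet(path)
        # Append each month's rows to the same output location.
        df.write.mode("append").parquet(args.output_path)
    spark.stop()
```

With a call to `main()` added at the bottom of the file, the script could be submitted with something like `spark-submit --master spark://<host>:7077 populate_tripdata.py --dataset yellow_cab_tripdata --start 2023-01 --end 2023-03 ...`, where the file name and master host are placeholders.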

Planned workflow:

nimbly-dev commented 6 days ago

This is just an alternative way to populate the production tripdata, for dev purposes only. Running the Tripdata Pipeline is the recommended approach.