NYC.gov has changed all their files to Parquet. The csv files are no longer available through the provided S3 links.
The new link is https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.parquet
But it requires some additional processing to follow a long. This mostly applies to video DE Zoomcamp 1.2.2 - Ingesting NY Taxi Data to Postgres, but it may pop up in other places throughout the course.
First
pip install pyarrow
Then convert the parquet to pandas:
import pyarrow.parquet as pq
trips = pq.read_table('yellow_tripdata_2021-01.parquet')
df = trips.to_pandas()
Finally, run this command and wait. It will take awhile then return a number when it is finished.
df.to_sql(name='yellow_taxi_data', con=engine, if_exists='replace', chunksize=100000)
Alternatively, the .csv files could be added to the repo with links to those instead.
NYC.gov has changed all their files to Parquet. The csv files are no longer available through the provided S3 links. The new link is https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2021-01.parquet But it requires some additional processing to follow a long. This mostly applies to video DE Zoomcamp 1.2.2 - Ingesting NY Taxi Data to Postgres, but it may pop up in other places throughout the course.
First
pip install pyarrow
Then convert the parquet to pandas:
Finally, run this command and wait. It will take awhile then return a number when it is finished.
df.to_sql(name='yellow_taxi_data', con=engine, if_exists='replace', chunksize=100000)
Alternatively, the .csv files could be added to the repo with links to those instead.