jsoma / data-studio-projects


Which borough tips best? Yellow Cab. January Edition. #144

Open tsp2123 opened 6 years ago

tsp2123 commented 6 years ago

Please complete all of the following sections, or the ghost of Joseph Pulitzer will spookily dance around your issue! A completed version of this template can be found at https://github.com/jsoma/data-studio-projects/issues/1

Pitch

I wanna see whether people tip their cab drivers differently in each borough.

Summary

I gathered taxi cab data from the TLC website for the month of January and ran a bunch of aggregate stats on it, creating a new column for tip percentages and adding borough names to each fare.
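A minimal sketch of those two steps (tip percentage column, borough names per fare), assuming the 2017 TLC yellow-cab column names (`payment_type`, `fare_amount`, `tip_amount`, `PULocationID`, `DOLocationID`) and the TLC taxi zone lookup file (`LocationID`, `Borough`). The function names and the two-row inline files are hypothetical stand-ins. One caveat worth checking against the TLC data dictionary: `tip_amount` only records credit-card tips, so this sketch keeps card payments (`payment_type` 1) by default; otherwise cash fares would show up as 0% tips.

```python
import csv
import io

def load_zone_lookup(csv_file):
    """Map each taxi-zone LocationID to its borough name."""
    return {int(row["LocationID"]): row["Borough"] for row in csv.DictReader(csv_file)}

def add_tip_pct_and_boroughs(trips, zones, credit_card_only=True):
    """Yield copies of trip rows with tip_pct and borough names added.

    Column names follow the 2017 TLC yellow-cab files (an assumption).
    Trips with a non-positive fare are skipped so the division is safe.
    """
    for trip in trips:
        # tip_amount only captures credit-card tips (payment_type "1"),
        # so cash fares would otherwise look like 0% tips.
        if credit_card_only and trip["payment_type"] != "1":
            continue
        fare = float(trip["fare_amount"])
        if fare <= 0:
            continue
        row = dict(trip)
        row["tip_pct"] = 100 * float(trip["tip_amount"]) / fare
        row["pickup_borough"] = zones.get(int(trip["PULocationID"]), "Unknown")
        row["dropoff_borough"] = zones.get(int(trip["DOLocationID"]), "Unknown")
        yield row

# Tiny inline stand-ins for the real zone lookup and trip files.
zone_file = io.StringIO(
    "LocationID,Borough,Zone,service_zone\n"
    "132,Queens,JFK Airport,Airports\n"
    "161,Manhattan,Midtown Center,Yellow Zone\n"
)
trip_file = io.StringIO(
    "tpep_pickup_datetime,payment_type,fare_amount,tip_amount,PULocationID,DOLocationID\n"
    "2017-12-24 10:00:00,1,10.00,2.00,161,132\n"
    "2017-12-05 09:00:00,2,8.00,0.00,161,161\n"
)
zones = load_zone_lookup(zone_file)
rows = list(add_tip_pct_and_boroughs(csv.DictReader(trip_file), zones))
print(rows[0]["tip_pct"], rows[0]["pickup_borough"], rows[0]["dropoff_borough"])  # 20.0 Manhattan Queens
```

The generator yields one row at a time, so it can feed any later aggregation without building a second full copy of the data.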

Details

Possible headline(s):

Big Tipper?

Data set(s): http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

Code repository:

Possible problems/fears/questions: I need to run a few more queries on these. It would be nice to find a way to group tip percentages by fare distance and to organize the data in other ways. The dataset is about 1 GB and is a lot to work with, so I'm wondering whether there is a way to handle large datasets more efficiently without having to drop rows.
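One common answer to the "without dropping rows" question is to stream the file and keep only running totals, so memory depends on the number of groups rather than the number of trips. A minimal sketch using only the standard library; the column names assume the 2017 TLC layout, and the file path in the comment is hypothetical.

```python
import csv
import io
from collections import defaultdict

def streaming_mean_tip_pct(csv_file, group_col="PULocationID"):
    """Mean tip percentage per group, reading one row at a time.

    Memory grows with the number of groups, not the number of rows, so
    every row is used without loading the whole file at once.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        fare = float(row["fare_amount"])
        if fare <= 0:
            continue  # avoid dividing by a zero or refunded fare
        totals[row[group_col]] += 100 * float(row["tip_amount"]) / fare
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Inline sample; with real data you would pass an open file instead, e.g.
# open("yellow_tripdata_2017-12.csv", newline="")  (hypothetical path).
sample = io.StringIO(
    "fare_amount,tip_amount,PULocationID\n"
    "10.00,2.00,161\n"
    "20.00,2.00,161\n"
    "50.00,5.00,132\n"
)
print(streaming_mean_tip_pct(sample))  # {'161': 15.0, '132': 10.0}
```

Running sums work for means and counts; a median needs every value in a group (or an approximation). If you stay in pandas, `pandas.read_csv` accepts `usecols` (load only the columns you need) and `chunksize` (iterate over the file in pieces), which gives a similar effect.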

Work so far

The data set isn't that interesting. People tip pretty decently across the board. But maybe it would be cool to see whether people tip differently during the holiday season compared to regular days?

Here are some graphs:

[Screenshot, 2018-07-11 12:27:06 pm]

[Screenshot, 2018-07-11 12:26:45 pm]

Checklist

This checklist must be completed before you submit your draft.

linleysanders commented 6 years ago

I feel like you're in a good place with this data set. I'm interested to see the larger conclusion: whether people are more generous during the holiday season, or depending on the distance of their trip.

tsp2123 commented 6 years ago

I have updates! They're really just a bunch of new graphs from the data. I reoriented my data to look at December. I defined the holiday season as December 24 to 31; the rest I labeled "Workday." Here are some graphs and code!
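The holiday/workday split described above can be expressed as a one-line rule. A minimal sketch, assuming the pickup time decides the label and that timestamps use the `YYYY-MM-DD HH:MM:SS` format of the 2017 TLC files; the function name is hypothetical.

```python
from datetime import datetime

def season_label(pickup_datetime):
    """'Holiday' for December 24-31 pickups, 'Workday' for everything else."""
    ts = datetime.strptime(pickup_datetime, "%Y-%m-%d %H:%M:%S")
    return "Holiday" if ts.month == 12 and ts.day >= 24 else "Workday"

for stamp in ("2017-12-23 23:59:59", "2017-12-24 00:00:00", "2017-12-31 23:30:00"):
    print(stamp, season_label(stamp))
# 2017-12-23 23:59:59 Workday
# 2017-12-24 00:00:00 Holiday
# 2017-12-31 23:30:00 Holiday
```

Labeling each row this way, instead of splitting the file into two subsets, keeps both seasons in one table for later grouping.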

Here's how people tip by borough:

[Screenshot, 2018-07-16 11:40:18 am]

[Screenshot, 2018-07-16 11:40:29 am]

Here's how people tip during the Holiday Season vs the Regular Season:

[Screenshot, 2018-07-16 11:40:42 am]

That's pretty even. And kinda uninteresting.

But who tips best during the holidays?

[Screenshot, 2018-07-16 11:41:21 am]

There's some crazy numbers for maximum tips. Let's look at that.

[Screenshot, 2018-07-16 11:42:08 am]

What do maximum tips look like during the holidays?

[Screenshot, 2018-07-16 11:42:28 am]

[Screenshot, 2018-07-16 11:42:38 am]

How many people tip more than 100 percent during the holidays?

[Screenshot, 2018-07-16 11:43:19 am]

I'll fix these graphs as soon as I have an interesting style sheet. For now, pardon the generic graphs.

Palarisk commented 6 years ago

Really interesting data and analysis.

Of course, you'll need to add titles and other labels so that readers can understand the story each graph is telling just by looking at it.

It would also be interesting to compare, say, the top 10% or 25% of tips, since the single maximum tip can be a bit misleading (it's interesting too, but maybe mention it in the story rather than in the graph).
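A minimal sketch of the "top 10% or 25%" comparison: sort the tip percentages, keep the largest fraction, and report where that slice starts and its average. The function name and the sample values are hypothetical; a plain sort is used instead of interpolated quantiles so the cutoff is always a real observed value.

```python
import statistics

def top_share_summary(tip_pcts, top_fraction):
    """Return (cutoff, mean) for the largest `top_fraction` of tip percentages.

    top_fraction=0.10 gives the top 10%, 0.25 the top 25%. At least one
    value is always kept so tiny samples still return something.
    """
    if not tip_pcts:
        raise ValueError("need at least one tip percentage")
    ordered = sorted(tip_pcts, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))
    top = ordered[:k]
    return top[-1], statistics.mean(top)

tips = [float(p) for p in range(1, 21)]  # made-up tip percentages 1..20
print(top_share_summary(tips, 0.10))  # (19.0, 19.5)
print(top_share_summary(tips, 0.25))  # (16.0, 18.0)
```

Run per borough (or per season), this gives a "big tipper" figure that one extreme fare cannot dominate.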

It would also be interesting to see the percentage of people who tip, e.g., under 10%, 10-20%, 20-30%, and so on.
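The binning suggested above can be sketched with `bisect`, which finds the bin for each tip percentage. The edges and labels here are illustrative assumptions (a tip of exactly 10% counts as "10-20%"), and the function name is hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges are illustrative; a tip of exactly 10% lands in "10-20%".
BIN_EDGES = [10, 20, 30]
BIN_LABELS = ["under 10%", "10-20%", "20-30%", "30% and up"]

def tip_bins(tip_pcts):
    """Share of trips (in percent) falling into each tip-percentage bin."""
    if not tip_pcts:
        return {label: 0.0 for label in BIN_LABELS}
    counts = Counter(BIN_LABELS[bisect_right(BIN_EDGES, p)] for p in tip_pcts)
    return {label: 100 * counts[label] / len(tip_pcts) for label in BIN_LABELS}

print(tip_bins([0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 40.0]))
# {'under 10%': 25.0, '10-20%': 25.0, '20-30%': 25.0, '30% and up': 25.0}
```

Adding more edges (for example 50 and 100) extends the table without changing the function.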

Great job!

Katerinavts commented 6 years ago

This is a very interesting topic, especially because the taxi business in NYC is threatened by companies such as Uber and Lyft. I would narrow your focus to a specific time of the year (i.e., the holidays) and try to see who keeps this business alive. In other words: who still prefers to hail a taxi rather than take an Uber, who tips the most, etc. Then I would add some more context to your graphs so that non-New Yorkers can immediately grasp how the business works.

Good work overall.

jsoma commented 6 years ago
playfairbot commented 6 years ago

Howdy! I'm a little robot, checking in on your project.

You need some feedback; let me summon @castorsia and @adrianblanco for you.

It looks like we need to fix up your update a little bit! Edit it by clicking the pencil in the top right-hand corner. It requires:

Maybe you just didn't use the template? If so, edit your comment, paste the template in, and then fill it out.

tsp2123 commented 6 years ago

Final

Project visuals/text

Hope you like the graphs and look forward to hearing your feedback.

1) Let's start by examining the average tips grouped by pickup borough.

[Chart: December 2017, median tip by pickup borough]

What if I group by drop-off borough instead?

[Chart: December 2017, median tip by drop-off borough]
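The chart file names in this post say "median," so this sketch groups by pickup or drop-off borough and takes the median tip percentage, which also resists the extreme tips discussed further down. It assumes each row already carries hypothetical `tip_pct`, `pickup_borough` and `dropoff_borough` fields; the trips are made up.

```python
import statistics
from collections import defaultdict

def median_by(rows, key):
    """Median tip_pct for each value of `key`, e.g. pickup or drop-off borough."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["tip_pct"])
    return {group: statistics.median(values) for group, values in groups.items()}

rows = [  # made-up trips with tip_pct already computed
    {"pickup_borough": "Manhattan", "dropoff_borough": "Queens", "tip_pct": 20.0},
    {"pickup_borough": "Manhattan", "dropoff_borough": "Manhattan", "tip_pct": 10.0},
    {"pickup_borough": "Queens", "dropoff_borough": "Manhattan", "tip_pct": 0.0},
]
print(median_by(rows, "pickup_borough"))   # {'Manhattan': 15.0, 'Queens': 0.0}
print(median_by(rows, "dropoff_borough"))  # {'Queens': 20.0, 'Manhattan': 5.0}
```

The same function covers both charts; only the grouping key changes.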

2) Okay, now let's look at whether average tips were higher or lower during the holidays compared to regular days (grouped by pickup borough).

[Chart: December 2017, median tip by type of day (holiday vs. non-holiday)]

3) Alright, so New Yorkers tip about the same whether or not it's a holiday. But let's check out how each borough tips during the holidays and compare it to the non-holidays.

[Chart: December 2017 non-holiday, median tip by pickup borough]

[Chart: December 2017 holiday, median tip by pickup borough]

[Chart: December 2017 non-holiday, median tip by drop-off borough]

[Chart: December 2017 holiday, median tip by drop-off borough]

4) There are some crazy numbers hiding in this data set. Let's look at the maximum tip percentages people have left.

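[Chart: December 2017 non-holiday, maximum tip percentage by pickup borough]

[Chart: December 2017 holiday, maximum tip percentage by pickup borough]

[Chart: December 2017 non-holiday, maximum tip percentage by drop-off borough]

[Chart: December 2017 holiday, maximum tip percentage by drop-off borough]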
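A maximum percentage on its own hides the scale: a huge percentage on a tiny fare may be only a few dollars. A minimal sketch that keeps the whole winning trip per borough so the dollar amounts travel with the percentage; the field names and trips are hypothetical.

```python
def max_tip_by(rows, key="pickup_borough"):
    """For each group, the trip with the highest tip_pct, plus its dollar amounts."""
    best = {}
    for row in rows:
        group = row[key]
        if group not in best or row["tip_pct"] > best[group]["tip_pct"]:
            best[group] = row
    return {
        group: {"tip_pct": r["tip_pct"], "tip": r["tip_amount"], "fare": r["fare_amount"]}
        for group, r in best.items()
    }

rows = [  # made-up trips
    {"pickup_borough": "Manhattan", "tip_pct": 50.0, "tip_amount": 5.0, "fare_amount": 10.0},
    {"pickup_borough": "Manhattan", "tip_pct": 900.0, "tip_amount": 45.0, "fare_amount": 5.0},
    {"pickup_borough": "Queens", "tip_pct": 20.0, "tip_amount": 10.0, "fare_amount": 50.0},
]
print(max_tip_by(rows))
```

The returned tip and fare values are what you would print inside or next to each bar.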

5) Those are some crazy numbers. Let's look at how many people tipped above 100 percent.

[Chart: tips above 100 percent, by pickup borough]

[Chart: tips above 100 percent, by drop-off borough]
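Counting tips above 100 percent is easier to compare across boroughs when each count comes with its share of that borough's trips. A minimal sketch; the field names, the function name and the trips are hypothetical.

```python
from collections import Counter

def above_threshold(rows, key="pickup_borough", threshold=100.0):
    """(count, percent of trips) per group with tip_pct strictly above threshold.

    `rows` must be a list (it is scanned twice), not a one-shot iterator.
    """
    totals = Counter(row[key] for row in rows)
    over = Counter(row[key] for row in rows if row["tip_pct"] > threshold)
    return {group: (over[group], 100 * over[group] / totals[group]) for group in totals}

rows = [  # made-up trips
    {"pickup_borough": "Manhattan", "tip_pct": 150.0},
    {"pickup_borough": "Manhattan", "tip_pct": 20.0},
    {"pickup_borough": "Queens", "tip_pct": 50.0},
    {"pickup_borough": "Queens", "tip_pct": 10.0},
]
print(above_threshold(rows))  # {'Manhattan': (1, 50.0), 'Queens': (0, 0.0)}
```

A borough with many trips will almost always have the largest raw count, so the percentage is the fairer comparison.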

Details

Hi all, here's my final project. A few notes before I begin: while the datasets above looked at January, I decided to switch my focus to December 2017. The purpose of this project is now to see whether New Yorkers tip better or worse around the holiday season. I've arbitrarily defined the holiday season as December 24 to December 31.

I began by looking at January 2017, and I intended to do this exercise for an entire year's worth of data, but that would have been more than 12 GB. Instead, I narrowed it to December 2017 and compared the holiday season with the rest of the month. I know I could have started the holiday season with Hanukkah, or perhaps the holiday cheer lasts all of December, but problems running code on larger datasets had me narrowing my scope.

Headline:

How do New Yorkers Tip Yellow Cab Drivers?

Code repository:

Final data set(s): http://www.nyc.gov/html/tlc/html/technology/industry_reports.shtml

What did you find to be the most difficult part of this project?

I encountered several problems. For whatever reason, a simple function took nearly an hour to run on my dataset, which cost me a great deal of time. This may point to larger issues with my computer. Deciding how to filter my data was another problem. Most of these graphs were created by subsetting my initial dataset: I had one dataset for what I considered "Holidays" and one for what I considered "Non Holidays," and I ran my summary stats on each separately. This let me group by borough while also separating the season I was looking at. But it didn't let me graph holidays and non-holidays side by side, which I had thought I could do in Illustrator.
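One way around the two-subset problem is to group on (borough, season) in a single pass and write one table with a Holiday column and a Workday column, which is the shape a grouped, side-by-side bar chart usually expects. A minimal sketch, assuming each row already has hypothetical `pickup_borough`, `season` ("Holiday"/"Workday") and `tip_pct` fields; the trips are made up.

```python
import csv
import io
import statistics
from collections import defaultdict

def side_by_side_medians(rows, key="pickup_borough"):
    """CSV text with one row per borough and a median tip_pct for each season.

    Grouping on (borough, season) in one pass replaces the two separate
    subsets, so both seasons end up on the same row.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row[key], row["season"])].append(row["tip_pct"])
    boroughs = sorted({borough for borough, _ in groups})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([key, "Holiday", "Workday"])
    for borough in boroughs:
        cells = []
        for season in ("Holiday", "Workday"):
            values = groups.get((borough, season))
            cells.append(statistics.median(values) if values else "")  # blank if no trips
        writer.writerow([borough, *cells])
    return out.getvalue()

rows = [  # made-up trips
    {"pickup_borough": "Manhattan", "season": "Holiday", "tip_pct": 20.0},
    {"pickup_borough": "Manhattan", "season": "Holiday", "tip_pct": 10.0},
    {"pickup_borough": "Manhattan", "season": "Workday", "tip_pct": 18.0},
    {"pickup_borough": "Queens", "season": "Workday", "tip_pct": 12.0},
]
print(side_by_side_medians(rows))
```

Because both seasons share a row, the chart tool only needs one data import instead of two graphs lined up by hand.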

Are you satisfied with what you produced? Is there anything you would like to change or improve?

My Illustrator chops don't pass muster, which leads to another problem: as you can see, the graphs have different artboard sizes because I couldn't figure out how to set a single artboard size for every graph. I'm not too happy with the aesthetic choices in this project.

Checklist

adrianblanco commented 6 years ago

Whoa! You are almost done! Since the boroughs are so big, would it be possible to get more detailed information by zip code or smaller areas? Now that you have the big picture, it would be great to go further into this topic.
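The 2017 trip files identify pickups and drop-offs by taxi-zone ID (`PULocationID`/`DOLocationID`) rather than zip code, so taxi zones are the smaller areas that come with the data for free; zip codes would need an extra mapping. A minimal sketch, assuming a `zone_names` dict built from the `Zone` column of the TLC zone lookup and rows that already carry a hypothetical `tip_pct` field; the trips are made up.

```python
import statistics
from collections import defaultdict

def median_by_zone(trips, zone_names, location_col="PULocationID"):
    """Median tip_pct per taxi zone rather than per borough.

    zone_names maps LocationID -> zone name (from the zone lookup's
    'Zone' column); IDs missing from it are grouped as 'Unknown'.
    """
    groups = defaultdict(list)
    for trip in trips:
        zone = zone_names.get(trip[location_col], "Unknown")
        groups[zone].append(trip["tip_pct"])
    return {zone: statistics.median(values) for zone, values in groups.items()}

zone_names = {161: "Midtown Center", 132: "JFK Airport"}  # two-entry stand-in
trips = [  # made-up trips
    {"PULocationID": 161, "tip_pct": 20.0},
    {"PULocationID": 161, "tip_pct": 10.0},
    {"PULocationID": 132, "tip_pct": 12.0},
]
print(median_by_zone(trips, zone_names))  # {'Midtown Center': 15.0, 'JFK Airport': 12.0}
```

Zones with only a handful of trips will produce noisy medians, so it may help to report the trip count per zone alongside each figure.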

castorsia commented 6 years ago

Very interesting! And pertinent. Maybe put the dollar value inside the bars of the 'biggest tip' chart to communicate the scale of the tip more directly?