Open tsp2123 opened 6 years ago
I feel like you're in a good place with this data set. I'm interested to see the larger conclusion on whether people are more generous during the holiday season or based on the distance of their trip.
I have updates! Which are really just a bunch of new graphs from a dataset. I reoriented my data to look at December. I defined the holiday season as Dec 24 to Dec 31. The rest I defined as Workday. Here are some graphs and code!
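A minimal sketch of the Holiday/Workday split described above, in plain Python. The timestamp format (`2017-12-25 08:15:00`) and the function name `season` are assumptions for illustration, not the code actually used in this project.

```python
from datetime import date, datetime

# Window taken from the update above: Dec 24 to Dec 31, inclusive (year assumed to be 2017).
HOLIDAY_START = date(2017, 12, 24)
HOLIDAY_END = date(2017, 12, 31)


def season(pickup: str) -> str:
    """Label a pickup timestamp as 'Holiday' or 'Workday'."""
    day = datetime.strptime(pickup, "%Y-%m-%d %H:%M:%S").date()
    return "Holiday" if HOLIDAY_START <= day <= HOLIDAY_END else "Workday"


print(season("2017-12-25 08:15:00"))
print(season("2017-12-23 23:59:59"))
```

Keeping the label as one column on every ride (rather than splitting into two datasets) makes it easier to compare the two groups later.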
Here's how people tip by borough:
Here's how people tip during the Holiday Season vs the Regular Season:
That's pretty even. And kinda uninteresting.
But let's see who tips best during the holidays?
There's some crazy numbers for maximum tips. Let's look at that.
Let's see what it's like for the holidays and maximum tips?
Let's see how many people tip more than 100 percent during the holidays?
I'll fix these graphs as soon as I have an interesting style sheet. For now, pardon the generic graphs.
Really interesting data and analysis.
Of course, you'll need to add titles and other labels so that a reader can understand the story the graph is telling just by looking at it.
It would also be interesting to compare the top 10% or 25% of tips, since the single maximum tip can be a bit misleading (it's still interesting, but maybe just mention it in the story rather than in the graph).
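One possible way to look at the biggest tips as a group instead of the single maximum, sketched in plain Python. The helper name `top_share_mean` and the sample numbers are made up for illustration.

```python
import statistics


def top_share_mean(tip_pcts, share):
    """Mean of the largest `share` fraction of tip percentages (0.10 = top 10%)."""
    ranked = sorted(tip_pcts, reverse=True)
    keep = max(1, int(len(ranked) * share))  # always keep at least one ride
    return statistics.mean(ranked[:keep])


# Made-up tip percentages, including one extreme outlier.
sample = [0, 10, 15, 18, 20, 20, 22, 25, 30, 400]
print(top_share_mean(sample, 0.10))
print(top_share_mean(sample, 0.25))
```

With only ten rides the top 10% is still just the outlier, but on a month of data the group average should be much less sensitive to one odd fare.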
It would also be interesting to see the percentages of people who tip, e.g., under 10%, 10-20%, 20-30%, and so on.
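A rough sketch of that banding in plain Python. The band edges follow the suggestion above; lumping everything from 30% up into one band is an assumption, as are the function names and sample values.

```python
from collections import Counter


def bucket(tip_pct):
    """Assign a tip percentage to a 10-point band; 30% and above share one band."""
    if tip_pct < 10:
        return "under 10%"
    if tip_pct < 20:
        return "10-20%"
    if tip_pct < 30:
        return "20-30%"
    return "30% and up"


def bucket_shares(tip_pcts):
    """Return the share of rides (in percent) that falls into each band."""
    counts = Counter(bucket(p) for p in tip_pcts)
    total = len(tip_pcts)
    return {band: 100 * n / total for band, n in counts.items()}


sample = [0, 5, 12, 18, 20, 22, 25, 35]  # made-up values
print(bucket_shares(sample))
```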
Great job!
This is a very interesting topic, especially because the taxi business in NYC is threatened by companies such as Uber and Lyft. I would narrow your focus to a specific time of the year (i.e. the holidays) and try to see who keeps this business alive. In other words, who still prefers to hail a taxi rather than take an Uber, who tips the most, etc. Then I would add some more context to your graphs, so that non-New Yorkers can immediately grasp how the business works.
Good work overall.
Howdy! I'm a little robot, checking in on your project.
You need some feedback, let me summon @castorsia, @adrianblanco for you
It looks like we need to fix up your update a little bit! Edit it by clicking the pencil in the top right-hand corner. It requires:
Maybe you just didn't use the template? If not, edit your comment, cut and paste the template in, and then fill it out.
Hope you like the graphs and look forward to hearing your feedback.
1) Let's start by examining the average tips grouped by pickup borough
What about if I grouped by Drop Off borough?
2) Okay, now let's look at whether average tips were higher or lower during the holidays compared to regular days. (Grouped by Pickup Borough.)
3) Alright, so New Yorkers tip equally whether it's a holiday or not. But let's check out how each borough tips during the holidays and compare it to the non-holidays.
4) There are some crazy numbers hiding in this data set. Let's look at the maximum percentage tip people have issued.
5) Those are some crazy numbers. Let's look at how many people tipped above 100 percent.
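For reference, a small sketch of how the over-100-percent count could be computed from a list of tip percentages. The helper name `share_above` and the sample values are hypothetical; this is not the code behind the graph.

```python
def share_above(tip_pcts, threshold=100):
    """Return (count, percent of rides) whose tip percentage exceeds `threshold`."""
    over = sum(1 for p in tip_pcts if p > threshold)
    return over, 100 * over / len(tip_pcts)


sample = [15, 20, 101, 250, 100, 18, 22, 19]  # made-up values
count, pct = share_above(sample)
print(count, pct)
```

A tip of exactly 100 percent is not counted here; switch to `>=` if it should be.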
Hi all, here's my final project. A few notes before I begin. While the datasets above looked at January, I decided to switch my focus to December 2017. The purpose of this project is now to see whether New Yorkers tip better or worse around the holiday season. I've decided to arbitrarily define the holiday season as December 24 to December 31.
I began by looking at January 2017, and my intention was to do this exercise for the entire year's worth of data, but that would have been more than 12 gigs of data. Instead I decided to look at December 2017. I reoriented this project to look at how people tip in the holiday season vs. the non-holiday season. I arbitrarily decided that December 24 to 31 is the holiday season in December. I know I could have started with Hanukkah, or perhaps the holiday cheer lasts all December? But problems with running code on larger datasets had me narrowing my scope.
Headline:
How do New Yorkers Tip Yellow Cab Drivers?
Code repository:
Final data set(s): http://www.nyc.gov/html/tlc/html/technology/industry_reports.shtml
I encountered several problems. For whatever reason it took nearly an hour for a simple function to run on my dataset, causing me to lose a great deal of time. I think this may be a larger issue with my computer. Deciding how to filter my data was another problem. Many of these graphs were created by subsetting my initial dataset. So I had one dataset for what I considered "Holidays" and one for what I considered "Non Holidays", and I ran my summary stats on each of these separately. This allowed me to group by borough but also categorize the season I was looking at. But it didn't allow me to then graph holidays / non-holidays side by side, which was something I thought I could do in Illustrator.
My Illustrator chops don't pass muster, which leads to another problem: as you can see, the graphs have different artboard sizes, because I couldn't figure out how to set a single artboard size for every graph. I'm not too happy with the aesthetic choices of this project.
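One alternative to two separate subsets, sketched in plain Python: keep a season label on every ride and group by borough and season together, so both averages land in the same row and can be plotted side by side. The tuple layout, function name, and numbers below are assumptions for illustration.

```python
from collections import defaultdict


def mean_tip_by_borough_and_season(rides):
    """rides: iterable of (borough, season, tip_pct).

    Returns {borough: {season: mean tip_pct}}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (borough, season) -> [running total, count]
    for borough, season, tip_pct in rides:
        cell = sums[(borough, season)]
        cell[0] += tip_pct
        cell[1] += 1
    table = defaultdict(dict)
    for (borough, season), (total, n) in sums.items():
        table[borough][season] = total / n
    return dict(table)


# Made-up rides for illustration only.
rides = [
    ("Manhattan", "Holiday", 20.0),
    ("Manhattan", "Holiday", 10.0),
    ("Manhattan", "Workday", 18.0),
    ("Queens", "Holiday", 12.0),
    ("Queens", "Workday", 16.0),
]
for borough, row in mean_tip_by_borough_and_season(rides).items():
    print(borough, row)
```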
Whoa! You are almost done! As the boroughs are so big, would it be possible to have more detailed information by ZIP code or smaller areas? Now that you have the global perspective, it would be great to go further into this topic.
Very interesting! And pertinent. Maybe putting the $$$ value in the 'biggest tip' chart in the bars in order to communicate the scale of the tip more directly?
Please complete all of the following sections, or the ghost of Joseph Pulitzer will spookily dance around your issue! A completed version of this template can be found at https://github.com/jsoma/data-studio-projects/issues/1
Pitch
I wanna see whether people tip their cab drivers differently in each borough.
Summary
I gathered taxi cab data from the TLC website for the month of January and ran a bunch of aggregated stats on the data, creating a new column for tip percentages and adding borough names to each fare.
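A minimal sketch of the tip-percentage calculation. It assumes the percentage is the tip divided by the fare (the summary does not say which base was used), and the function name `tip_percent` is hypothetical.

```python
def tip_percent(tip_amount, fare_amount):
    """Tip as a percentage of the fare; None when the fare is zero or negative."""
    if fare_amount <= 0:
        return None  # a percentage is undefined without a positive fare
    return 100 * tip_amount / fare_amount


print(tip_percent(2.0, 10.0))
print(tip_percent(5.0, 0.0))
```

Guarding against zero fares avoids division errors, and very small fares may be worth a look too, since they can produce very large percentages.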
Details
Possible headline(s):
Big Tipper?
Data set(s): http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
Code repository:
Possible problems/fears/questions: I need to run a few more queries on these. It would be nice to find a way to group tip percentages by fare distance and other ways of organizing the data. The dataset is about 1 gig and is a lot to work with so I'm wondering if there is a way to optimize large datasets without having to get rid of rows.
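On the large-file question, one option is to stream the CSV row by row and keep only running totals, so no rows are dropped and the whole file never has to sit in memory. This is a sketch in plain Python; the column names `fare_amount`, `tip_amount`, and `PULocationID` are assumptions about the file layout, and the demo data is made up.

```python
import csv
import io
from collections import defaultdict


def mean_tip_pct_by_key(lines, key_column):
    """Stream CSV rows and average the tip percentage per value of `key_column`."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of tip percentages, ride count]
    for row in csv.DictReader(lines):
        fare = float(row["fare_amount"])
        if fare <= 0:
            continue  # skip rides where a percentage is undefined
        entry = totals[row[key_column]]
        entry[0] += 100 * float(row["tip_amount"]) / fare
        entry[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Tiny in-memory stand-in for the real file.
demo = io.StringIO(
    "PULocationID,fare_amount,tip_amount\n"
    "161,10.0,2.0\n"
    "161,20.0,2.0\n"
    "7,8.0,0.0\n"
)
print(mean_tip_pct_by_key(demo, "PULocationID"))
```

With a real download you would pass `open(path, newline="")` instead of the `io.StringIO` stand-in.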
Work so far
The data set isn't as interesting as I'd hoped. People tip pretty decently across the board. But maybe it would be cool to see whether people tip differently during the holiday season compared to regular days?
Here's some graphs:
Checklist
This checklist must be completed before you submit your draft.
[Project]
in the title