The titanic dataset contains information about passengers on the Titanic, including the amount they paid for their fare and whether or not they survived (this is a subset of the full data available). Let's investigate whether there is an association between the fare that a passenger paid (Fare) and whether or not they survived (Survived, which equals 0 if the passenger died and 1 if they survived):
Calculate the difference in mean fare between those who survived and those who died. Which group paid a higher average fare?
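The exercise does not name a language or tool. As one possibility, here is a minimal Python sketch of the mean-difference calculation using only the standard library. It assumes the data are loaded as a list of dicts with Fare and Survived keys. That layout, the function name mean_fare_difference, and the toy values are illustrative assumptions, not the real titanic data.

```python
from statistics import mean

def mean_fare_difference(rows):
    """Return mean fare of survivors minus mean fare of non-survivors.

    Assumes each row is a dict with a numeric 'Fare' and a 'Survived'
    value of 1 (survived) or 0 (died).
    """
    survived = [r["Fare"] for r in rows if r["Survived"] == 1]
    died = [r["Fare"] for r in rows if r["Survived"] == 0]
    return mean(survived) - mean(died)

# Made-up toy rows for illustration only; NOT the real titanic data.
toy = [
    {"Fare": 10.0, "Survived": 0},
    {"Fare": 20.0, "Survived": 0},
    {"Fare": 30.0, "Survived": 1},
    {"Fare": 50.0, "Survived": 1},
]

diff = mean_fare_difference(toy)
print(diff)  # 40.0 - 15.0 = 25.0
```

A positive result means survivors paid more on average; a negative result means those who died did.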
Calculate the difference in median fare between those who survived and those who died.
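The median version differs only in the summary function. The sketch below, under the same assumed list-of-dicts layout (split_fares is a hypothetical helper name, and the fares are made up), also prints the mean difference. This shows how a single extreme fare can pull the mean far more than the median, which is worth keeping in mind when comparing the two answers.

```python
from statistics import mean, median

def split_fares(rows):
    """Split fares into (survived, died) lists; Survived is assumed to be 1 or 0."""
    survived = [r["Fare"] for r in rows if r["Survived"] == 1]
    died = [r["Fare"] for r in rows if r["Survived"] == 0]
    return survived, died

# Made-up toy rows, NOT real titanic data; one survivor paid an extreme fare.
toy = [{"Fare": f, "Survived": 0} for f in (8.0, 10.0, 12.0)] + [
    {"Fare": f, "Survived": 1} for f in (15.0, 25.0, 500.0)
]

survived, died = split_fares(toy)
median_diff = median(survived) - median(died)  # 25.0 - 10.0 = 15.0
mean_diff = mean(survived) - mean(died)        # 180.0 - 10.0 = 170.0
print(f"median difference: {median_diff}, mean difference: {mean_diff}")
```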
Create side-by-side box plots of fares by survival. Now that you can see the spread of the data, do the mean/median differences seem relatively small or large?
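A box plot is drawn from a five-number summary (minimum, first quartile, median, third quartile, maximum), with points beyond 1.5 × IQR from the quartiles usually marked as outliers. The plotting-library-free sketch below computes those numbers per group so you can see what each box represents. The helper names and toy values are hypothetical, and different software may compute quartiles with slightly different interpolation rules.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

def outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3 (a common box-plot rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Made-up toy fares, NOT real titanic data.
died = [8.0, 10.0, 12.0, 14.0, 16.0]
survived = [15.0, 20.0, 25.0, 30.0, 500.0]

for label, fares in (("died", died), ("survived", survived)):
    print(label, five_number_summary(fares), "outliers:", outliers(fares))
```

Comparing the gap between group medians with the width of each box (the IQR) is one way to judge whether the differences you computed are small or large relative to the spread.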
Create overlapping histograms of fares by survival (you'll have to delete or comment out your box plot code before you try to make a histogram). Does this provide any additional information?
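Overlapping histograms are only comparable if both groups use the same bin edges. The sketch below counts each group into shared bins and prints a rough text histogram. The edges, helper name, and fares are illustrative assumptions. In a plotting library such as matplotlib, the usual approach is to pass the same bins to each hist call and use a partial alpha so the bars overlap visibly.

```python
from bisect import bisect_right

def bin_counts(values, edges):
    """Count values in half-open bins [edges[i], edges[i+1]).

    The last bin also includes its right edge; values outside the
    edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        if v == edges[-1]:
            i = len(edges) - 2  # put the top edge into the last bin
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

# Made-up toy fares, NOT real titanic data; shared edges for both groups.
edges = [0, 10, 20, 30, 40, 50]
died = [8.0, 10.0, 12.0, 14.0, 16.0]
survived = [15.0, 22.0, 25.0, 38.0, 45.0]

died_counts = bin_counts(died, edges)
survived_counts = bin_counts(survived, edges)
for lo, hi, d, s in zip(edges, edges[1:], died_counts, survived_counts):
    print(f"{lo:>3}-{hi:<3} died {'#' * d:<6} survived {'#' * s}")
```

Unlike a box plot, the binned counts show the shape of each distribution, for example whether one group has several peaks or a long right tail.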