swirldev / swirl_courses

:mortar_board: A collection of interactive courses for the swirl R package.
http://swirlstats.com

Swirl- R Programming Environment - Data Manipulation - Titanic Data #384

Closed provedup closed 6 years ago

provedup commented 6 years ago

Hi, I am doing an R course that uses Swirl. I am in chapter 12 of the R Programming Environment - Data Manipulation. I am stuck on the final problem about the Titanic Survivors. I began with code from the previous question, which creates the first data frame.

titanic_4 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150), 
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50",
                                 "Over 50")))
head(titanic_4)

# After the previous question, you should have transformed the `titanic`
# data to look like this:
#
##   Survived Pclass   Age     Sex      agecat
##          0      3    22    male    15 to 50
##          1      1    38  female    15 to 50
##          1      3    26  female    15 to 50
##          1      1    35  female    15 to 50
##          0      3    35    male    15 to 50
##          0      1    54    male     Over 50
#
# Add one or more `dplyr` or `tidyr` functions to the pipe chain in 
# the code at the bottom of the script to change the `titanic` 
# dataset. The first six lines of the final `titanic_4` dataset 
# should look like the following example, with the number of
# passengers, number of survivors, and percent survival stratified
# by passenger class, age category, and sex. Be sure to use the 
# same column names as shown in the example output. 
#
## Pclass   agecat    Sex      N     survivors   perc_survived
## <int>   <fctr>    <chr>   <int>     <int>         <dbl>
##   1    Under 15  female     2         1        50.000000
##   1    Under 15    male     3         3       100.000000
##   1    15 to 50  female    70        68        97.142857
##   1    15 to 50    male    72        32        44.444444
##   1    Over 50   female    13        13       100.000000
##   1    Over 50     male    26         5        19.230769

To solve this problem, I created this code:

titanic_4 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150), 
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50",
                                 "Over 50"))) %>%
  group_by(Pclass, agecat, Sex) %>%
  summarize(N = n(), survivors = sum(Survived)) %>%
  mutate(perc_survived = sprintf("%.6f", 
                                 ((survivors / N) * 100.000000)))

head(titanic_4)

Which gives this output:

# A tibble: 6 x 6
# Groups:   Pclass, agecat [3]
  Pclass   agecat    Sex     N survivors perc_survived
   <int>   <fctr>  <chr> <int>     <int>         <chr>
1      1 Under 15 female     2         1     50.000000
2      1 Under 15   male     3         3    100.000000
3      1 15 to 50 female    70        68     97.142857
4      1 15 to 50   male    72        32     44.444444
5      1  Over 50 female    13        13    100.000000
6      1  Over 50   male    26         5     19.230769

The above output is wrong because the last column (perc_survived) is a character (<chr>) instead of a double (<dbl>).

To solve this, I tell R to change the type to numeric with the as.numeric function.

titanic_4 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150), 
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50",
                                 "Over 50"))) %>%
  group_by(Pclass, agecat, Sex) %>%
  summarize(N = n(), survivors = sum(Survived)) %>%
  # as.numeric() has to wrap sprintf(); the other way round the column stays <chr>
  mutate(perc_survived = as.numeric(sprintf("%.6f", 
                                            (survivors / N) * 100.000000)))
head(titanic_4)

which creates this output:

# A tibble: 6 x 6
# Groups:   Pclass, agecat [3]
  Pclass   agecat    Sex     N survivors perc_survived
   <int>   <fctr>  <chr> <int>     <int>         <dbl>
1      1 Under 15 female     2         1      50.00000
2      1 Under 15   male     3         3     100.00000
3      1 15 to 50 female    70        68      97.14286
4      1 15 to 50   male    72        32      44.44444
5      1  Over 50 female    13        13     100.00000
6      1  Over 50   male    26         5      19.23077

The new problem is that the output is rounded to 5 digits after the decimal place instead of 6. I have tried every combination I can find, but I have not been able to tell R to keep 6 decimals when it converts from character to numeric.

I'm stuck and need some guidance from a generous person. Thank you, Andrew
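
A possible explanation, offered as a sketch rather than as the course's official answer: a double does not store a number of decimal places, so converting "97.142857" with as.numeric() cannot keep six of them. Only the printed form changes, and base R prints 7 significant digits by default. The names `x`, `as_text` and `as_number` below are made up for illustration.

```r
# Hypothetical value: first-class women aged 15 to 50 (68 survivors out of 70).
x <- 68 / 70 * 100

# sprintf() returns text, which is why the column showed up as <chr>.
as_text <- sprintf("%.6f", x)
class(as_text)                 # "character"

# Converting back gives a double; printing uses 7 significant digits by default.
as_number <- as.numeric(as_text)
class(as_number)               # "numeric"
print(as_number)               # 97.14286

# The stored value still carries the extra digit; ask for more digits to see it.
print(as_number, digits = 8)   # 97.142857
```

If this reading is right, the sprintf() step is unnecessary and `100 * survivors / N` already gives the <dbl> column the exercise asks for. In recent tibble versions the number of digits shown is, as far as I know, controlled by the `pillar.sigfig` option and not by the data itself.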

dsilvadeepal commented 6 years ago

Try adding ungroup() at the end

Ezra08 commented 4 years ago

> Try adding ungroup() at the end

Where at the end do I put ungroup()?

YT-er commented 4 years ago

>> Try adding ungroup() at the end

> Where at the end do I put ungroup()

titanic_4 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150), 
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50",
                                 "Over 50"))) %>%
  group_by(Pclass, agecat, Sex) %>%
  summarize(N = n(),
            survivors = sum(Survived == 1),
            perc_survived = 100 * survivors / N) %>% ungroup()
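
Why ungroup() might matter can be seen on a small example. This is a sketch under assumptions: `toy` is made-up data standing in for the course's `titanic` data frame, and the idea that the swirl checker expects an ungrouped result is a guess based on the "# Groups: Pclass, agecat [3]" line in the output earlier in the thread.

```r
library(dplyr)

# Made-up toy data, not the real Titanic data set.
toy <- data.frame(
  Pclass   = c(1, 1, 1, 2, 2, 2),
  Sex      = c("female", "male", "male", "female", "female", "male"),
  Survived = c(1, 0, 1, 1, 0, 0)
)

grouped <- toy %>%
  group_by(Pclass, Sex) %>%
  summarize(N = n(), survivors = sum(Survived))

# summarize() peels off only the last grouping variable (Sex),
# so the result is still grouped by Pclass.
group_vars(grouped)      # "Pclass"

# ungroup() returns a plain tibble with no grouping left.
plain <- grouped %>% ungroup()
group_vars(plain)        # character(0)
```

Placing ungroup() as the last step of the pipe chain, as in the code above, leaves the values unchanged and only removes the leftover grouping.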

Ronak0796 commented 4 years ago

I also have an issue. When I submit my code, I get an error.

titanic_4 <- titanic %>% 
  select(Survived, Pclass, Age, Sex) %>%
  filter(!is.na(Age)) %>%
  mutate(agecat = cut(Age, breaks = c(0, 14.99, 50, 150), 
                      include.lowest = TRUE,
                      labels = c("Under 15", "15 to 50",
                                 "Over 50"))) %>%
  group_by(Pclass, agecat, Sex) %>%
  summarize(N = n(),
            survivors = sum(Survived),
            perc_survived = survivors / N * 100)

When I submit this code, I get this:

Sourcing your script...

Error in source(e$script_temp_path, encoding = "UTF-8") : 
  C:\Users\Dell\AppData\Local\Temp\Rtmpwxfko6/step_4_titanic.R:43:0: unexpected end of input
41: labels = c("Under 15", "15 to 50",
42: "Over 50"))) %>%

Can someone help me here?
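
One reading of that error, hedged because the saved script file itself is not shown: the quoted lines 41 and 42 end in `%>%` and the parser reaches the end of the file at line 43, so the steps after that pipe may not have been in the file that swirl sourced. The sketch below reproduces the same message with a made-up one-line snippet (`dangling` is hypothetical, not the poster's script).

```r
# A pipe chain whose last line ends in %>% with nothing after it.
dangling <- "x <- c(1, 2, 3) %>%"

# parse() only checks syntax, so no packages are needed to reproduce the error.
msg <- tryCatch(
  {
    parse(text = dangling)
    "parsed without error"
  },
  error = function(e) conditionMessage(e)
)
msg   # contains "unexpected end of input"
```

If that is the cause, saving the script so that the group_by() and summarize() lines follow the last `%>%` should clear the parse error.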