lecy / foundations-of-data-science-for-the-public-sector

Lecture notes and labs for an introductory data science course for public policy and nonprofit management students

Simpler Code for Lab 04? #4

Open aehende1 opened 6 years ago

aehende1 commented 6 years ago

In Lab 04, Questions 6 and 7, I'm trying to use the which.max() function to report the name of the neighborhood with the greatest value specified in each question.

Currently, I am doing the following:

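dat.Vacant <- group_by( .data = dat.NhoodVal, Nhood )

# keep only vacant buildings, then count rows in each neighborhood group
# (count( dat.Vacant, VacantBuil = T ) would overwrite VacantBuil with TRUE
#  and count every parcel, not just the vacant ones)
vacancies <- count( filter( dat.Vacant, VacantBuil == "Y" ) )
vacancies

which.max( vacancies$n )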

I then use the which.max() result to look up the matching row of the count() output by hand to find the neighborhood. Is there a simpler way, perhaps a nested function call, that just prints the name of the neighborhood?

pkshen commented 6 years ago

Sorry, I have a question: where is Lab 04? I didn't see it on Blackboard.

lecy commented 6 years ago

You don't need to do it for the solution, but here is how I would go about it:

First, count vacant properties by neighborhood. Your solution works fine. You might also try some other dplyr recipes: filter to vacant properties and then count by neighborhood, group by neighborhood and then sum the vacant properties, or count every neighborhood/vacancy combination and then keep only the vacant rows.

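library( dplyr )

# Recipe 1: keep vacant properties, then count them by neighborhood
dat %>% 
  filter( VacantBuil == "Y" ) %>%
  group_by( Nhood ) %>%
  summarize( count=n() ) %>% 
  arrange( desc( count ) )

# Recipe 2: group by neighborhood, then sum the vacant flags in each group
dat %>% 
  group_by( Nhood ) %>%
  summarize( vacant = sum( VacantBuil == "Y", na.rm=T ) ) %>%
  arrange( desc( vacant ) ) 

# Recipe 3: count every neighborhood/vacancy combination, then keep the vacant rows
dat %>% 
  count( Nhood, VacantBuil ) %>%
  filter( VacantBuil == "Y" ) %>%
  arrange( desc( n ) )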

Each of the dplyr recipes above returns a tibble sorted so that the neighborhood with the most vacant properties is in the first row. You can then select its name as:

vac.table <- dat %>% ...   # ... = your solution, e.g. one of the recipes above
vac.table$Nhood[1] %>% as.character()
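The as.character() call is there because Nhood may be stored as a factor: read.csv() and data.frame() converted text columns to factors by default before R 4.0.0 (stringsAsFactors = TRUE). Indexing a factor still returns a factor that carries every level, while as.character() returns only the name. A minimal illustration, with made-up neighborhood names:

```r
# Hypothetical neighborhood names stored as a factor
nhood <- factor( c( "Downtown", "Eastwood", "Westvale" ) )

nhood[1]                  # still a factor; printing also lists all levels
as.character( nhood[1] )  # just the name, as a character string
```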

If the table is not already sorted, you can select the name with which.max() instead, something like this:

# assumes the count column is named vacant, as in the second recipe above
vac.table$Nhood[ which.max( vac.table$vacant ) ]
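For the original question (printing just the name), the counting and the lookup can also be nested into one expression. This is a base R sketch that swaps in table() for the dplyr counting step, run on hypothetical toy data whose column names match the lab's: table() counts the vacant rows per neighborhood, which.max() returns the position of the largest count with its name attached, and names() pulls out that name. which.max() reports only the first maximum, so in a tie it silently picks whichever neighborhood comes first in the table's alphabetical order; the last two lines list all tied neighborhoods instead.

```r
# Hypothetical toy data; only the column names Nhood and VacantBuil
# (with "Y" marking a vacant building) come from the lab data
dat <- data.frame(
  Nhood      = c( "Eastwood", "Eastwood", "Downtown", "Downtown", "Westvale" ),
  VacantBuil = c( "Y", "Y", "Y", "Y", "N" ),
  stringsAsFactors = FALSE
)

# Vacant rows only (which() drops NA flags), counted per neighborhood,
# then the name attached to the largest count
top.nhood <- names( which.max( table( dat$Nhood[ which( dat$VacantBuil == "Y" ) ] ) ) )
top.nhood

# which.max() keeps only the first maximum, so check for ties separately
vac.counts <- table( dat$Nhood[ which( dat$VacantBuil == "Y" ) ] )
names( vac.counts )[ vac.counts == max( vac.counts ) ]
```

Here Downtown and Eastwood both have two vacant buildings, so the one-liner returns "Downtown" while the tie check returns both names.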