lecy / foundations-of-data-science-for-the-public-sector

Lecture notes and labs for an introductory data science course for public policy and nonprofit management students

Error Message in Selector Vector #3

aehende1 opened this issue 6 years ago (status: open)

aehende1 commented 6 years ago

In Lab02, on question 2, I am attempting to create a subset of the data called tax.exempt that only includes tax-exempt properties. The code I have been using is:
tax.exempt <- dat$LandUse == c("Vacant Land","Parking","Schools","Parks","Cemetery","Religious","Recreation","Community Services","Utilities")

I have been receiving the same warning messages repeatedly:

Warning messages:
1: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
2: In ==.default(dat$LandUse, c("Vacant Land", "Parking", "Schools",  :
  longer object length is not a multiple of shorter object length

I did not receive any warnings with the following code:

tax.exempt <- dat$LandUse == c("Vacant Land","Parking","Schools","Parks","Cemetery","Religious")

This leads me to believe the issue is with the number of categories I am attempting to include. Anyone have any insights or advice?

lecy commented 6 years ago

You are stumbling upon an important nuance, and a useful shortcut operator.

Recall that when you write logical statements you need to use logical operators. Since you are creating a compound statement (your group is made up of elements that can meet any of several criteria), you need to decide whether to combine the conditions with the AND operator (&) or the OR operator (|).
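To see why the choice matters, here is a minimal sketch using a small made-up vector (land_use is a hypothetical stand-in for dat$LandUse, not the lab data). Joining two equality tests on the same variable with AND can never be TRUE, because a single value cannot equal two different strings at once; OR flags every element that matches at least one of them.

```r
# Hypothetical toy vector standing in for dat$LandUse
land_use <- c("Parking", "Residential", "Schools", "Commercial", "Parks")

# AND: no single element can equal "Parking" and "Schools" at the same time,
# so this is FALSE for every element
both <- land_use == "Parking" & land_use == "Schools"

# OR: TRUE whenever an element matches at least one of the two categories
either <- land_use == "Parking" | land_use == "Schools"

print(both)    # FALSE FALSE FALSE FALSE FALSE
print(either)  # TRUE FALSE TRUE FALSE FALSE
```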

Since a property belongs in your group if it matches any one of the categories, the correct operator is OR:

tax.exempt <- dat$LandUse == "Vacant Land" | dat$LandUse == "Parking" | dat$LandUse == "Schools"
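Extended to all nine categories from the question, the OR version might look like the sketch below. The small data frame is a hypothetical stand-in for the lab data (only the LandUse column is mimicked, stored as a factor so that levels() would work on it). The last two lines show one common way to use the resulting selector vector: sum() counts the TRUE rows, and indexing with it keeps only the matching rows.

```r
# Hypothetical stand-in for the lab data; only LandUse is mimicked here
dat <- data.frame(
  LandUse = c("Parking", "Single Family", "Schools", "Commercial",
              "Parks", "Religious", "Vacant Land", "Two Family"),
  stringsAsFactors = TRUE
)

# One == test per category, joined with | (OR).
# A trailing | tells R the expression continues on the next line.
tax.exempt <- dat$LandUse == "Vacant Land" | dat$LandUse == "Parking" |
  dat$LandUse == "Schools" | dat$LandUse == "Parks" |
  dat$LandUse == "Cemetery" | dat$LandUse == "Religious" |
  dat$LandUse == "Recreation" | dat$LandUse == "Community Services" |
  dat$LandUse == "Utilities"

sum(tax.exempt)                  # number of tax-exempt rows
dat[tax.exempt, , drop = FALSE]  # keep only the tax-exempt rows
```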

When all of the groups are categories of the same variable, there is a shortcut that saves you from chaining OR statements and repeating dat$LandUse over and over: the %in% operator.

tax.exempt <- dat$LandUse %in% c("Vacant Land","Parking","Schools")
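As a quick check that the shortcut really is equivalent, the sketch below builds the selector both ways on a hypothetical toy data frame and compares them with identical(). The name exempt.categories is illustrative only.

```r
# Hypothetical stand-in for the lab data
dat <- data.frame(
  LandUse = c("Parking", "Single Family", "Schools", "Commercial", "Parks"),
  stringsAsFactors = TRUE
)

exempt.categories <- c("Vacant Land", "Parking", "Schools")

# Long form: one == test per category, joined with |
via.or <- dat$LandUse == "Vacant Land" | dat$LandUse == "Parking" |
  dat$LandUse == "Schools"

# Shortcut: for each row, %in% asks "is this value anywhere in the set?"
via.in <- dat$LandUse %in% exempt.categories

identical(via.or, via.in)  # TRUE
```

One small difference worth knowing: if a row is NA, == returns NA for it, while %in% returns FALSE, so the %in% selector never contains missing values.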

You were very close! The only change is to use %in% instead of == when comparing against several values.
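Here is a minimal sketch of what == does when the right-hand side is a vector, using a made-up 12-element vector (the lab data has a different length). == pairs the two vectors position by position and recycles the shorter one, so each row is compared with only one category. With six categories, 12 divides evenly and R stays silent even though the answer is wrong; with nine categories it does not, and R warns.

```r
# Hypothetical 12-element vector standing in for dat$LandUse
land_use <- rep(c("Vacant Land", "Schools", "Commercial"), times = 4)

six  <- c("Vacant Land", "Parking", "Schools", "Parks", "Cemetery", "Religious")
nine <- c(six, "Recreation", "Community Services", "Utilities")

# == recycles 'six': row 1 vs "Vacant Land", row 2 vs "Parking", ...,
# row 7 vs "Vacant Land" again. 12 is a multiple of 6, so no warning.
wrong <- land_use == six
right <- land_use %in% six

sum(wrong)  # 2: only rows that happen to line up with their category
sum(right)  # 8: every "Vacant Land" and "Schools" row

# 12 is not a multiple of 9, so here R also warns
msg <- tryCatch(land_use == nine, warning = function(w) conditionMessage(w))
msg  # "longer object length is not a multiple of shorter object length"
```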

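The warnings come from how == works on vectors: it compares element by element, recycling the shorter vector to match the longer one. So dat$LandUse == c(...) compares row 1 with "Vacant Land", row 2 with "Parking", and so on, starting over after the last category. R warns when the longer length is not a multiple of the shorter one. With six categories there was no warning because the number of rows happens to be a multiple of six, but the result was still wrong. %in% avoids this because it checks each row against the whole set. Separately, be sure the categories are spelled exactly as they appear in the data. For example, if your data was read in so that "Parking" includes a trailing space, you would need to type it as "Parking " to match. You can check the allowable categories with: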

levels( dat$LandUse )
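The sketch below shows how a stray trailing space silently breaks the match, using a hypothetical factor rather than the lab data. The trimws() cleanup at the end is one possible fix, offered as an assumption rather than something the lab requires; it strips leading and trailing whitespace from the level labels.

```r
# Hypothetical factor where one category was read in with a trailing space
land_use <- factor(c("Parking ", "Schools", "Parking ", "Parks"))

levels(land_use)         # reveals the trailing space in "Parking "
land_use %in% "Parking"  # FALSE FALSE FALSE FALSE: "Parking " != "Parking"

# Possible fix (assumption): trim whitespace from the level labels themselves
levels(land_use) <- trimws(levels(land_use))
land_use %in% "Parking"  # TRUE FALSE TRUE FALSE
```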