hblackburn / R4Econ

Shared resources for Econ Research Assistants working in R
MIT License

How to select/drop/keep variables whose names have the same string patterns? #11

Open sunnyxinyiliu opened 5 years ago

sunnyxinyiliu commented 5 years ago

Let the data frame in use be called "test.dat".

Say we want to keep only the variables called "quantity_1", "quantity_2", ..., "quantity_100", and these are the only vars with "quantity" in their names in the data. In Stata we could just do "keep quantity*", for example. But in R it's cumbersome to retype "quantity" 100 times.

Instead, we can use select() with matches() from dplyr:

final.test.dat <- test.dat %>% select(matches("quantity")) # this keeps only the vars whose names contain "quantity"

To drop these vars instead of keeping them, add a minus sign in front of "matches":

final.test.dat <- test.dat %>% select(-matches("quantity_"))
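As a minimal, self-contained sketch of the keep and drop cases: the toy data frame below is hypothetical and only stands in for the real "test.dat", and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for test.dat (column names are assumptions)
test.dat <- data.frame(
  id         = 1:3,
  quantity_1 = c(10, 20, 30),
  quantity_2 = c(1, 2, 3),
  value_1    = c(5, 6, 7)
)

# Keep only the columns whose names match the regular expression "quantity"
kept <- test.dat %>% select(matches("quantity"))

# Drop those same columns and keep everything else
dropped <- test.dat %>% select(-matches("quantity"))

names(kept)
names(dropped)
```

Note that matches() is case-insensitive by default (ignore.case = TRUE), so a column named "Quantity_1" would be selected as well.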

To drop vars with several name patterns (say, ones with "quantity" or "value" in their names), join the patterns with "|". matches() treats its argument as a regular expression, so "|" means "or".

For example:

droplist <- "rationcard_no|name|location_id|secc|panchayat|village|address|tolla|qtr_plot|holding|status|mobile|printFlag|application|activity|Flag|created_by|printed_by|rationcardno"

final.test.dat <- test.dat %>% select(-matches(droplist))
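If the list of patterns is long, one option is to keep them in a character vector and build the "|" string with paste(). This is just a sketch: the toy data and the shortened pattern list are hypothetical, chosen to keep the example small.

```r
library(dplyr)

# Hypothetical toy data (column names are assumptions for illustration)
test.dat <- data.frame(
  rationcard_no = c("A1", "B2"),
  village       = c("X", "Y"),
  printFlag     = c(TRUE, FALSE),
  quantity_1    = c(10, 20),
  value_1       = c(5, 6)
)

# Patterns to drop, kept as a vector so they are easy to read and edit
patterns <- c("rationcard_no", "village", "Flag")

# Collapse into one regular expression: "rationcard_no|village|Flag"
droplist <- paste(patterns, collapse = "|")

# Drop every column whose name matches any of the patterns
final.test.dat <- test.dat %>% select(-matches(droplist))

names(final.test.dat)
```

Because each pattern matches anywhere in the name, "Flag" alone already covers "printFlag". Short patterns such as "name" can also catch more columns than intended, so it is worth checking names(final.test.dat) afterwards.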

NickCH-K commented 5 years ago

The | trick is cool - I didn't know that one!

In general for this, looking up help(select_helpers) in the tidyselect package (or in the tidyverse package) is a good place to start. The "quantity_" example might be better done with starts_with rather than matches so you don't get unexpected matches.
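To illustrate the point about unexpected matches, here is a small sketch comparing the two helpers. The data frame and the column "total_quantity_sold" are hypothetical, made up to show a name that contains "quantity_" without starting with it.

```r
library(dplyr)

# Hypothetical toy data (column names are assumptions for illustration)
test.dat <- data.frame(
  quantity_1          = 1:2,
  quantity_2          = 3:4,
  total_quantity_sold = 5:6,
  price               = 7:8
)

# matches() finds the pattern anywhere in the name,
# so total_quantity_sold is selected too
by_matches <- names(select(test.dat, matches("quantity_")))

# starts_with() only selects names that begin with the prefix
by_prefix <- names(select(test.dat, starts_with("quantity_")))

# An anchored regular expression gives the same result with matches()
by_anchor <- names(select(test.dat, matches("^quantity_[0-9]+$")))

by_matches
by_prefix
by_anchor
```

starts_with() is the simpler choice when the variables share a literal prefix. The anchored regex is useful when the rest of the name should also follow a pattern, such as digits only.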

hblackburn commented 5 years ago

Awesome! Thanks Sunny!