gdscrapeR is an R package that scrapes company reviews from Glassdoor using a single function: get_reviews. It returns a data frame holding the text data, which can be further prepped for text analytics learning projects.
Install the latest version from GitHub:
install.packages("devtools")
devtools::install_github("mguideng/gdscrapeR")
library(gdscrapeR)
The URL to scrape the awesome SpaceX company will be: www.glassdoor.com/Reviews/SpaceX-Reviews-E40371.htm.
Pass the company number to the get_reviews function. The company number is a string representing a company's unique ID. To find it, navigate to the company's Glassdoor reviews page and look in the URL for the characters between "Reviews-" and ".htm" (usually an "E" followed by digits).
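The lookup described above can also be done in code. This is a minimal sketch in base R; `extract_company_num` is a hypothetical helper written for illustration and is not part of gdscrapeR.

```r
# Hypothetical helper (not part of gdscrapeR): pull the company number
# out of a Glassdoor reviews URL.
extract_company_num <- function(url) {
  # Capture the characters between "Reviews-" and ".htm"
  m <- regmatches(url, regexec("Reviews-(.+?)\\.htm", url, perl = TRUE))[[1]]
  # m[1] is the full match, m[2] the captured group; NA if nothing matched
  if (length(m) < 2) NA_character_ else m[2]
}

extract_company_num("www.glassdoor.com/Reviews/SpaceX-Reviews-E40371.htm")
# [1] "E40371"
```

The returned string can then be passed as the companyNum argument.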
# Create data frame of: Date, Summary, Rating, Title, Pros, Cons, Helpful
df <- get_reviews(companyNum = "E40371")
This will scrape the following variables: date, summary, rating, title, pros, cons, and helpful count.
Use regular expressions to clean and extract additional variables and then export:
#### REGEX ####
# Package
library(stringr) # pattern matching functions
# Add: PriKey (uniquely identify rows 1 to N, sorted from first to last review by date)
df$rev.pk <- as.numeric(rownames(df))
# Extract: Year, Status, Position, Location
df$rev.year <- as.numeric(sub(".*, ","", df$rev.date))
df$rev.stat <- str_extract(df$rev.title, ".+?(?= Employee -)") # str_extract returns a character vector (str_match would return a matrix)
df$rev.pos <- str_replace_all(df$rev.title, ".* Employee - |\\sin .*|\\s$", "")
df$rev.loc <- sub(".*\\sin ", "", df$rev.title)
df$rev.loc <- ifelse(df$rev.loc %in%
                       (grep("Former Employee|Current Employee|^+$", df$rev.loc, value = TRUE)),
                     "Not Given", df$rev.loc)
# Clean: Pros, Cons, Helpful
df$rev.pros <- gsub("&amp;", "&", df$rev.pros) # decode HTML-escaped ampersands
df$rev.cons <- gsub("&amp;", "&", df$rev.cons)
df$rev.helpf <- as.numeric(gsub("\\D", "", df$rev.helpf))
#### EXPORT ####
write.csv(df, "df-results.csv", row.names = FALSE)
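To see what the title patterns do, here is a small sketch that runs them on two made-up titles. The title strings are assumptions about the shape the patterns expect, not real scraped data, and base R functions stand in for the stringr calls so the snippet runs without extra packages.

```r
# Made-up titles for illustration only; real Glassdoor titles may differ.
titles <- c("Current Employee - Engineer in Hawthorne, CA",
            "Former Employee - Technician")

# Status: text before " Employee -" (same pattern as above, via a lookahead)
status <- regmatches(titles, regexpr(".+?(?= Employee -)", titles, perl = TRUE))

# Position: strip the status prefix and any trailing " in <location>"
position <- gsub(".* Employee - |\\sin .*|\\s$", "", titles, perl = TRUE)

# Location: text after " in "; titles without a location are left unchanged,
# so they still contain the status and are flagged as "Not Given"
location <- sub(".*\\sin ", "", titles)
location <- ifelse(grepl("Former Employee|Current Employee", location),
                   "Not Given", location)

status    # "Current" "Former"
position  # "Engineer" "Technician"
location  # "Hawthorne, CA" "Not Given"
```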
gdscrapeR was made for learning purposes. Analyze the unstructured text, extract relevant information, and transform it into useful insights.
If you find this package useful, feel free to star :star: it. Thanks for visiting :heart:.
Uses the rvest and purrr packages to make it easy to scrape company reviews into a data frame.
It helps to know how rvest and purrr work. For more on this, see the "Known limitations" section of the demo write-up: "Scrape Glassdoor Company Reviews in R Using the gdscraper Package".
Contact: imlearningthethings at gmail.