trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters
Add a parallel option #47

Open trinker opened 7 years ago

trinker commented 7 years ago

A parallel option that runs sentiment and sentiment_by on multiple cores

trinker commented 6 years ago

Dump everything out to temporary .rds files and read them back in on the cluster nodes... add a library arg
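A minimal sketch of that file-based hand-off, assuming all workers run on the same machine and can read the master's temporary directory. The `chunks` data and the `nchar()` call are stand-ins for the real text and for the sentiment call; only base R and the parallel package are used.

```r
library(parallel)

# Stand-in data: two small chunks of text (hypothetical, for illustration only).
chunks <- list(c("a b", "c"), c("d e f"))

# Write each chunk to its own temporary .rds file on the master.
paths <- vapply(chunks, function(ch) {
    f <- tempfile(fileext = ".rds")
    saveRDS(ch, f)
    f
}, character(1))

cl <- makeCluster(2)

# Each worker receives only a file path, reads its chunk, and returns a result.
# nchar() stands in for the real per-chunk work.
res <- parLapply(cl, paths, function(p) nchar(readRDS(p)))

stopCluster(cl)
unlink(paths)  # clean up the temporary files
```

Passing paths rather than data keeps the objects sent over the sockets small, at the cost of extra disk I/O.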

trinker commented 6 years ago

Initial attempts lead to an error on Windows. parallel seems to be using an old version of R and throws an error about Rcpp being the wrong version. I fixed this by putting a newer version of R on the path, but now there is an error related to sentimentr which suggests an old version is still being used??? Maybe need to remove all R from path??

if (!require("pacman")) install.packages("pacman")
pacman::p_load(sentimentr, parallel, textshape, dplyr)

chunk_size <- 1e5

dat <- combine_data() %>%
    {.[rep(seq_len(nrow(.)), 100),]} %>%
    sample_n(nrow(.)) %>%
    split_index({inds <- chunk_size * 1:round(nrow(.)/chunk_size, 0); inds[inds < nrow(.)]})

tic <- Sys.time()

cl <- makeCluster(mc <- getOption("cl.cores", detectCores() - 2))

clusterEvalQ(cl, {})

parLapply(cl, dat, function(x){


    senti_dat <- sentimentr::get_sentences(x)
    senti_dat <- sentimentr::sentiment_by(senti_dat)

    outfile <- sprintf('data/file_%s.rds', sample(1:100000, 1))
    saveRDS(senti_dat, outfile)

})

Sys.time() - tic

Results in:

Error in checkForRemoteErrors(val) : 
  6 nodes produced errors; first error: 'get_sentences' is not an exported object from 'namespace:sentimentr'
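
One way to check the suspicion that the workers pick up a different R or an older sentimentr is to ask each node directly. The sketch below uses only base R and parallel; the names `worker_info` and `master_info` are illustrative, and a cluster of 2 nodes is an arbitrary choice.

```r
library(parallel)

# Start a small socket (PSOCK) cluster.
cl <- makeCluster(2)

# Ask every worker which R it runs, where it looks for packages, and
# which sentimentr version (if any) it would load.
worker_info <- clusterEvalQ(cl, {
    list(
        r_version  = as.character(getRversion()),
        lib_paths  = .libPaths(),
        sentimentr = tryCatch(
            as.character(utils::packageVersion("sentimentr")),
            error = function(e) NA_character_  # package not found on this worker
        )
    )
})

stopCluster(cl)

# The same information for the master session, for comparison.
master_info <- list(
    r_version  = as.character(getRversion()),
    lib_paths  = .libPaths(),
    sentimentr = tryCatch(
        as.character(utils::packageVersion("sentimentr")),
        error = function(e) NA_character_
    )
)

str(worker_info[[1]])
str(master_info)
```

If the worker's `r_version`, `lib_paths`, or `sentimentr` entry differs from the master's, the nodes are loading packages from a different library than the interactive session.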

trinker commented 6 years ago

Is either of the following a better way to run parallel code:

An OS-independent solution is needed. Reinvestigate the available solutions and reach out to the R community for current best practices.
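
As one possible starting point, and not necessarily the approach sentimentr should adopt, socket clusters from the parallel package run on Windows, macOS, and Linux alike. The helper names `load_packages` and `par_chunk_apply` below are hypothetical and not part of sentimentr; the `packages` argument plays the role of the "library arg" mentioned earlier.

```r
library(parallel)

# Hypothetical helper: attach the requested packages on a worker.
load_packages <- function(pkgs) {
    for (p in pkgs) library(p, character.only = TRUE)
    invisible(NULL)
}

# Hypothetical helper: apply `fun` to each chunk on a socket cluster,
# attaching `packages` on every worker first.
par_chunk_apply <- function(chunks, fun, cores = 2L, packages = character()) {
    cl <- makeCluster(cores)              # socket cluster, the default type
    on.exit(stopCluster(cl), add = TRUE)  # always shut the workers down
    clusterCall(cl, load_packages, packages)
    parLapply(cl, chunks, fun)
}

# Small made-up example input, split into two chunks.
chunks <- split(
    c("I like it.", "I hate it.", "It is fine.", "Not bad at all."),
    rep(1:2, each = 2)
)

# Only run the sentiment example when sentimentr is installed.
if (requireNamespace("sentimentr", quietly = TRUE)) {
    res <- par_chunk_apply(
        chunks,
        function(x) sentimentr::sentiment_by(sentimentr::get_sentences(x)),
        packages = "sentimentr"
    )
}
```

Results come back to the master as a list with one element per chunk, so nothing needs to be written to disk unless the output is too large to hold in memory.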

trinker commented 6 years ago

Here's where I ask the R community:

bkmgit commented 3 years ago

