trinker / qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
http://cran.us.r-project.org/web/packages/qdap/index.html
175 stars 44 forks

Sentence length restriction?! #245

Closed elesse closed 6 years ago

elesse commented 6 years ago

Hi guys, for the purpose of a text mining project, I am using qdap to process big text files (over 200 MB). Unfortunately, I have noticed that the function "replace_contraction" stops at the beginning of the files without giving any error message. Could you please tell me how I could fix this issue? The files I am using can be downloaded from here: https://tinyurl.com/y8w7hhjf I pre-process the files as follows:

cluster <- makeCluster(4)
registerDoParallel(cluster)
start_time <- Sys.time()
crps <- readtext("*.txt") #read text files
object.size(crps)
crps <- iconv(crps, "UTF-8", "ASCII", sub="") #exclude non-ASCII characters
object.size(crps)
save(crps, file = "crps.RData")
corpus <- replace_contraction(crps)
object.size(corpus)
end_time <- Sys.time()
time_corpus <- end_time - start_time
stopCluster(cluster)
registerDoSEQ()

Thanks a lot.

elesse commented 6 years ago

Hi! Actually, the problem was solved after reshaping the corpus. Cheers,
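
The thread does not say what "reshaping the corpus" involved. One possible reading, offered only as a sketch, is that each file was broken into many short strings (for example one per line) so that the function never receives a single very large string. The helpers `to_lines` and `process_in_chunks` below are hypothetical names and are not part of qdap. The qdap call is guarded so the snippet still runs where qdap is not installed.

```r
# Hypothetical helper (not part of qdap): turn a few very long strings
# into many short ones, one vector element per line of text.
to_lines <- function(x) {
  unlist(strsplit(x, "\n", fixed = TRUE), use.names = FALSE)
}

# Hypothetical helper (not part of qdap): apply a text-processing
# function to a character vector in batches of `chunk_size` elements
# and return the results in the original order.
process_in_chunks <- function(x, fun, chunk_size = 10000L) {
  stopifnot(is.character(x), is.function(fun), chunk_size >= 1)
  if (length(x) == 0L) return(character(0))
  # Batch index for every element: 1, 1, ..., 2, 2, ...
  batch <- ceiling(seq_along(x) / chunk_size)
  pieces <- lapply(split(x, batch), fun)
  unlist(pieces, use.names = FALSE)
}

# Usage with qdap, assuming the package is available.
if (requireNamespace("qdap", quietly = TRUE)) {
  docs <- c("I can't stay.\nShe's here.", "We don't know.")
  cleaned <- process_in_chunks(to_lines(docs),
                               qdap::replace_contraction,
                               chunk_size = 2L)
  print(cleaned)
}
```

With the data from the original post, the input would presumably be the text column of the object returned by `readtext` (for example `crps$text`) rather than the whole object. That is an assumption about the setup, not something the thread confirms.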