covars_make_all returns NAs for baselines

kmunger commented 6 years ago

When I run covars_make_all on Hansard speeches, 29 of the 33 measures are returned correctly, but the 4 measures related to word rarity come back as NA.

However, when I run covars_make_baselines on the same corpus, these 4 measures are computed correctly.

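library(dplyr)          # filter()
library(quanteda)       # corpus(), ntoken()
library(sophistication) # covars_make_all()

setwd("C:/Users/kevin/Dropbox/Benoit_Spirling_Readability/hansard_data/")
files <- list.files()

## initialize
all_files <- read.csv(files[2], stringsAsFactors = FALSE)
# keep only Conservative and Labour speeches
restricted <- filter(all_files, party == "Conservative" | party == "Labour")
# keep speakers with more than 10 speeches (counted over the full file)
speakers <- all_files$speaker
tab <- table(speakers)
speakers_morethan10 <- names(tab[tab > 10])
restricted <- filter(restricted, speaker %in% speakers_morethan10)

# drop speeches of 10 tokens or fewer
restricted <- restricted[which(ntoken(restricted$text) > 10), ]

data_corpus_speeches66 <- corpus(restricted)

pos <- covars_make_all(data_corpus_speeches66, dependency = FALSE)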

> pos$google_min_2000[100]
[1] NA

> pos$brown_mean[1000]
[1] NA
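Because only a few measures fail, a quick way to confirm exactly which columns come back empty is to count missing values per column. The sketch below uses a small hypothetical data frame standing in for pos; only google_min_2000 and brown_mean are column names taken from the output above, and sentence_length is made up for illustration.

```r
# Hypothetical stand-in for the data frame returned by covars_make_all();
# sentence_length is an illustrative column name, not the package's output.
pos_example <- data.frame(
  sentence_length = c(12.5, 20.1, 8.3),
  google_min_2000 = c(NA, NA, NA),
  brown_mean      = c(NA, NA, NA)
)

# count missing values per measure
na_counts <- colSums(is.na(pos_example))

# list the measures that are NA for every document
all_na_measures <- names(na_counts)[na_counts == nrow(pos_example)]
print(all_na_measures)
```

Assuming pos is a data frame (as the $ indexing above suggests), running colSums(is.na(pos)) on the real result would give the same per-measure counts.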

kbenoit commented 6 years ago

@kmunger is this still a concern, or just an issue to fix (eventually) in the software?

kmunger commented 6 years ago

@kbenoit Not an immediate concern; there's an easy workaround. Just something to fix at some point.

Repository kbenoit/sophistication, issue #1