koheiw / newsmap

Semi-supervised algorithm for geographical document classification

predict() returns "Error in x[, feature] : Subscript out of bounds" #39

SCommain closed this issue 4 years ago

SCommain commented 4 years ago

I run into an error with predict() when trying to predict which topic is most strongly associated with each sentence in my data. R throws the following error:

"Error in x[, feature] : Subscript out of bounds"

I don't see where the error could come from. I attach the code I use (first to apply the dictionary, then to fit the newsmap model), together with the dictionary and the tokens object. Thanks in advance for your help! Files.zip

koheiw commented 4 years ago

@SCommain, thanks for sending the file. This is a bug in quanteda that we will fix soon. For the moment, you can avoid the problem by building the document-feature matrix with dfm(toksSent, remove = ""), which drops the empty-string feature.
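A minimal sketch of what this workaround does, on made-up texts rather than the attached data. It assumes the empty-string feature comes from padded tokens, which the thread does not confirm. It also uses dfm_remove() instead of the remove argument, because newer quanteda releases may no longer accept remove in dfm().

```r
library(quanteda)

# Hypothetical toy sentences, not the data attached to this issue
txt <- c(s1 = "Coffee prices rose in Brazil",
         s2 = "The talks were called off in France")
toks <- tokens(txt)

# Removing tokens with padding = TRUE leaves "" placeholders behind
toks <- tokens_remove(toks, pattern = c("in", "the"), padding = TRUE)

# The padding would otherwise become a feature named "" in the dfm
dfmt <- dfm(toks)

# Drop the empty-string feature, as dfm(toksSent, remove = "") did
dfmt_clean <- dfm_remove(dfmt, pattern = "", valuetype = "fixed")
print(featnames(dfmt_clean))
```

If the cleaned matrix has no "" feature, it can be passed to the model-fitting step as before.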

Unrelated to the issue, I also noticed that your regex dictionary is too ambiguous. For example, "of" in "72_5 of rwas" also matches "off", "coffee", etc. unless you anchor the beginning and end of the pattern, as in "^of$".
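The difference can be checked in base R. This is only an illustration with a hypothetical word list; it uses grepl() rather than the dictionary lookup itself, but the anchoring behaviour of the regular expression is the same.

```r
# Hypothetical word list for illustration
words <- c("of", "off", "coffee", "roof")

# Unanchored: matches any word that contains "of"
unanchored <- grepl("of", words)
print(words[unanchored])  # "of" "off" "coffee" "roof"

# Anchored: matches only the exact word "of"
anchored <- grepl("^of$", words)
print(words[anchored])    # "of"
```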

koheiw commented 4 years ago

Please use the development version of quanteda.