I am using tokenize, and it appears to remove none of the things it should. I am using the current GitHub version of quanteda. The resulting table still contains all the numbers, hyphens, etc. I am not sure what I am doing wrong. Thanks for any insights. I didn't include the file or the code for the bad words, but dfm didn't remove them either.
Data: