Closed: shashankg7 closed 4 years ago
I’m not positive, but I seem to remember that the pre-processing regex removes all digits. You can probably edit the regex, though.
Yes, use --token-regex '\d+'. You're probably getting NaN because there are zero recognized tokens, and it's trying to divide by that number.
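To illustrate the failure mode described above, here is a minimal Python sketch. It is not the tool's code: the letters-only pattern is an assumed stand-in for a default that ignores digits, and `tokenize` and `per_token_average` are hypothetical helpers made up for this example.

```python
import math
import re

def tokenize(text, token_regex):
    """Return every non-empty match of token_regex in text."""
    return [m.group(0) for m in re.finditer(token_regex, text) if m.group(0)]

def per_token_average(total, n_tokens):
    """Divide a total by the token count; mimic floating-point 0/0 as NaN."""
    return total / n_tokens if n_tokens else math.nan

# Hypothetical user history: each integer item ID is one "word".
doc = "1042 17 1042 993"

letters_only = r"[A-Za-z]+"  # assumption: a default pattern that skips digits
digits_only = r"\d+"         # the pattern suggested in the reply above

no_tokens = tokenize(doc, letters_only)  # nothing survives
id_tokens = tokenize(doc, digits_only)   # every item ID survives

print(len(no_tokens), per_token_average(0.0, len(no_tokens)))
print(len(id_tokens), per_token_average(-12.5, len(id_tokens)))
```

With the letters-only pattern the document is empty after tokenization, so any per-token statistic has a zero denominator. With `\d+` the item IDs are kept as tokens.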
Thank you for your response.
I added [0-9]* in the regex. That is equivalent to this, right?
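On the equivalence question, a quick check with Python's `re` module suggests the two patterns are close but not identical. This is only a sketch: the tool's regex engine may treat empty matches differently, and the sample string is made up.

```python
import re

sample = "1042 17 993"  # hypothetical document of integer item IDs

plus_matches = re.findall(r"\d+", sample)     # one or more digits
star_matches = re.findall(r"[0-9]*", sample)  # zero or more digits

# The star pattern also matches the empty string between tokens.
non_empty_star = [m for m in star_matches if m]

print(plus_matches)
print(non_empty_star == plus_matches)
print(re.fullmatch(r"[0-9]*", "") is not None)  # star accepts ""
print(re.fullmatch(r"\d+", "") is None)         # plus rejects ""
```

So `[0-9]+` would be the closer counterpart of `\d+`. One further difference in Python specifically: for `str` patterns `\d` also matches non-ASCII Unicode digits, while `[0-9]` matches ASCII digits only.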
Also, I am trying to index around 300k documents using this command, but it has been running for ~24 hours and still has not completed.
Any suggestions on this?
Hi,
I am trying to run topic modelling on a user-modelling task. I have a list of activities for each user, and I am treating each activity (on an item) as a word. The items are encoded as integers.
I am trying to run LDA on this data. I used the same approach as for running topic modelling on a directory of text files, but LDA produces NaN when I run it.
Is the tool limited to textual input? Any suggestions would be appreciated.
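For concreteness, the setup described in this post (one document per user, each integer item ID as a word) might be prepared roughly like this. It is a sketch under assumptions: the `activities` data, the `user_<id>.txt` file naming, and the `write_user_documents` helper are all hypothetical.

```python
from pathlib import Path
import tempfile

def write_user_documents(activities, out_dir):
    """Write one text file per user with space-separated item IDs."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for user_id, item_ids in activities.items():
        path = out_dir / f"user_{user_id}.txt"
        # Each integer item ID becomes one whitespace-delimited token.
        line = " ".join(str(item) for item in item_ids)
        path.write_text(line + "\n", encoding="utf-8")
        paths.append(path)
    return paths

# Hypothetical activity log: user -> ordered list of item IDs.
activities = {"a": [1042, 17, 1042], "b": [993, 17]}

corpus_dir = Path(tempfile.mkdtemp()) / "user_docs"
written = write_user_documents(activities, corpus_dir)
print([p.name for p in written])
```

The resulting directory has the same shape as a directory of ordinary text files, so the only difference from textual input is that every token consists of digits.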