sashafrey / topicmod

This project has been moved to https://github.com/bigartm/bigartm

Investigate why perplexity is NaN in the attached script #108

Closed sashafrey closed 10 years ago

sashafrey commented 10 years ago

The script can be downloaded from here: https://drive.google.com/folderview?id=0BywMvWOrZXR3M3V1aUNseUdnZGc&usp=gmail

sashafrey commented 10 years ago

Fixed by https://github.com/sashafrey/topicmod/commit/2b1cd2062a4c30b4f9a87298d5ac394b576d912f. The cause was that the input docword file contained some tokens with 0 occurrences. BigARTM now handles this robustly: tokens with 0 occurrences are ignored during parsing, and for old batches they are ignored in Processor and in the Perplexity calculation.
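For anyone hitting the same symptom, here is a minimal Python sketch of one plausible way a zero-count token can turn perplexity into NaN, and of the kind of filtering described above. It is not the BigARTM code from the commit. The function names (`parse_docword_lines`, `c_style_log`, `perplexity`) are hypothetical, and the sketch assumes the docword file uses "docID wordID count" triples after single-integer header lines, and that the NaN came from a `0 * log(0)` term in the log-likelihood sum.

```python
import math


def parse_docword_lines(lines):
    """Parse 'doc_id token_id count' triples, skipping non-positive counts.

    Assumption: header lines hold a single integer each and are skipped
    because they do not split into three fields.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # header or blank line
        doc_id, token_id, count = (int(p) for p in parts)
        if count <= 0:
            continue  # ignore tokens with 0 occurrences during parsing
        entries.append((doc_id, token_id, count))
    return entries


def c_style_log(p):
    """Mimic C/C++ log(): return -inf at 0 instead of raising like math.log."""
    return math.log(p) if p > 0 else float("-inf")


def perplexity(entries, prob):
    """exp(-(sum of count * log p(token | doc)) / total count).

    `prob` is a hypothetical callable (doc_id, token_id) -> probability.
    """
    total_count = sum(count for _, _, count in entries)
    log_likelihood = sum(
        count * c_style_log(prob(doc_id, token_id))
        for doc_id, token_id, count in entries
    )
    return math.exp(-log_likelihood / total_count)
```

If an entry with count 0 reaches `perplexity` and the model assigns that token probability 0, the term is `0 * -inf`, which is NaN under IEEE 754 arithmetic, and the NaN propagates through the sum. Dropping such entries at parse time (or skipping them inside the sum, for batches that were already written) avoids the problem.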