sashafrey / topicmod

This project has been moved to https://github.com/bigartm/bigartm

Dynamically add tokens to model when new data arrives #14

Closed sashafrey closed 10 years ago

sashafrey commented 10 years ago

Currently the merger resets the whole model whenever it sees a new generation. There are better ways to do this:

  1. Scan the whole generation, and add only those tokens that are new.
  2. Scan only new partitions (those that are different in old and new generation).
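
The two options above could be sketched roughly as follows. This is only an illustration, not the project's code: it assumes a generation can be represented as a mapping from a partition id to the list of tokens in that partition, and the function names `new_tokens_full_scan` and `new_tokens_changed_partitions` are hypothetical.

```python
def new_tokens_full_scan(model_tokens, generation):
    """Option 1: scan every partition and return tokens missing from the model.

    `generation` is assumed to be a dict: partition id -> list of tokens.
    """
    seen = set(model_tokens)
    added = []
    for partition_id in sorted(generation):
        for token in generation[partition_id]:
            if token not in seen:
                seen.add(token)
                added.append(token)
    return added


def new_tokens_changed_partitions(model_tokens, old_generation, new_generation):
    """Option 2: scan only partitions that are new or differ from the old generation."""
    changed = {
        partition_id: tokens
        for partition_id, tokens in new_generation.items()
        if old_generation.get(partition_id) != tokens
    }
    return new_tokens_full_scan(model_tokens, changed)
```

In both cases the existing rows of the model stay untouched and only the returned tokens need new rows. Option 2 does less work when most partitions are unchanged between generations.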

sashafrey commented 10 years ago

Fixed in https://github.com/sashafrey/topicmod/tree/alfrey_auto_discover_tokens

New behaviour: If a processor observes a token that is not part of the token-topic matrix, it stores this token in a list of newly "discovered" tokens and transfers this list as part of the processor output. The merger picks up all such tokens and initializes a new row in the token-topic matrix for each of them. So, during the first scan over the collection, the dictionary is gathered automatically.
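
A minimal Python sketch of this flow, for illustration only. The class names `Processor` and `Merger`, the `discovered_tokens` key, and the random initialization of new rows are assumptions made here, not the actual implementation in the branch.

```python
import random


class Processor:
    """Processes a batch against a snapshot of the tokens currently in the model."""

    def __init__(self, known_tokens):
        self.known_tokens = set(known_tokens)

    def process(self, batch):
        # Collect each unknown token once, in order of first appearance.
        discovered = []
        seen = set()
        for token in batch:
            if token not in self.known_tokens and token not in seen:
                seen.add(token)
                discovered.append(token)
        # The list of discovered tokens travels with the processor output.
        return {"discovered_tokens": discovered}


class Merger:
    """Owns the token-topic matrix, stored here as token -> list of topic weights."""

    def __init__(self, num_topics, seed=0):
        self.num_topics = num_topics
        self.token_topic = {}
        self._rng = random.Random(seed)

    def merge(self, processor_output):
        for token in processor_output["discovered_tokens"]:
            if token not in self.token_topic:
                # Assumed initialization: one random weight per topic.
                self.token_topic[token] = [
                    self._rng.random() for _ in range(self.num_topics)
                ]


merger = Merger(num_topics=3)
for batch in [["cat", "dog"], ["dog", "fish"]]:
    processor = Processor(merger.token_topic)
    merger.merge(processor.process(batch))
```

After the loop, the matrix holds one row per distinct token seen in the batches, and rows created for earlier batches are left as they were rather than reset.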