jkomoros / flux-bot

Discord bot for GALE-x

Use tfidf of OPs to generate auto thread names #10

Open jkomoros opened 3 years ago

jkomoros commented 3 years ago

Ideally we'd use the actual body of messages in the guild as the corpus. That will be expensive to generate, because regenerating it means fetching every message from every channel: fetch the most recent message, then the 100 messages before it, and keep repeating until there are no messages left, for each channel.
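A rough sketch of that backward pagination loop, to make the cost concrete. Everything here is hypothetical: `fetch_page(before, limit)` stands in for whatever call the bot's Discord library provides, the message shape is assumed to be `{"id", "content"}`, and `make_fake_channel` is an in-memory stand-in so the sketch runs without a network. It starts from "no cursor" rather than fetching the newest message separately.

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical message shape: {"id": int, "content": str}
Message = dict
# Hypothetical fetcher: returns up to `limit` messages older than `before`,
# newest first. `before=None` means "start from the most recent message".
FetchPage = Callable[[Optional[int], int], List[Message]]


def iter_channel_history(fetch_page: FetchPage, page_size: int = 100) -> Iterator[Message]:
    """Walk one channel from newest to oldest, one page at a time."""
    before = None
    while True:
        page = fetch_page(before, page_size)
        if not page:
            return  # no messages left in this channel
        yield from page
        before = page[-1]["id"]  # oldest message seen so far becomes the cursor


def make_fake_channel(n: int):
    """In-memory stand-in for a channel with n messages (ids 1..n)."""
    messages = [{"id": i, "content": f"message {i}"} for i in range(1, n + 1)]
    calls = []  # records the cursor of every request, to count round trips

    def fetch_page(before, limit):
        calls.append(before)
        older = [m for m in messages if before is None or m["id"] < before]
        return sorted(older, key=lambda m: m["id"], reverse=True)[:limit]

    return fetch_page, calls


if __name__ == "__main__":
    fetch_page, calls = make_fake_channel(250)
    history = list(iter_channel_history(fetch_page))
    print(len(history), "messages in", len(calls), "requests")
```

The request count grows linearly with channel size (one request per 100 messages, plus a final empty page), and that is per channel, which is why a full regeneration is expensive.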

We could store the TF-IDF index as a .json file on the VM and regenerate it only when necessary. The bot might miss messages whenever it is offline, so to be safe you'd want to regenerate the index after any downtime. Another option would be a Firestore-style DB that tracks, for each channel, the most recently imported message; when the bot boots, it can see which channels appear to have new messages and fetch only those. You'd still need to regenerate the index occasionally, for example if we changed how we stem words or needed some other metadata, but that should be reasonably rare.
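A minimal sketch of the incremental variant, using a plain JSON file rather than a real database. The index layout (`doc_count`, `doc_freq`, `last_imported`), the function names, and the tokenizer are all assumptions for illustration, and it assumes message ids increase over time so "newer than the last imported id" is a valid test. It does no stemming.

```python
import json
import os
import re


def tokenize(text):
    """Very rough tokenizer (an assumption; no stemming)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def new_index():
    return {
        "doc_count": 0,       # number of messages imported so far
        "doc_freq": {},       # term -> number of messages containing it
        "last_imported": {},  # channel id (string) -> newest imported message id
    }


def import_messages(index, channel_id, messages):
    """Add messages newer than the channel's last imported id; skip the rest."""
    last = index["last_imported"].get(channel_id, 0)
    for msg in sorted(messages, key=lambda m: m["id"]):
        if msg["id"] <= last:
            continue  # already counted in an earlier run
        for term in set(tokenize(msg["content"])):
            index["doc_freq"][term] = index["doc_freq"].get(term, 0) + 1
        index["doc_count"] += 1
        last = msg["id"]
    index["last_imported"][channel_id] = last


def save_index(index, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)


def load_index(path):
    """Load the stored index, or start fresh if none exists yet."""
    if not os.path.exists(path):
        return new_index()
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

On boot, the bot would call `load_index`, fetch only messages newer than each channel's `last_imported` entry, pass them to `import_messages`, and then `save_index`. Changing `tokenize` (for example, adding stemming) invalidates the stored counts, which is the case where a full regeneration is still needed.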

jkomoros commented 3 years ago

Another option is to bootstrap with an IDF calculated from some other corpus, such as the compendium corpus.
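A sketch of what that bootstrapping could look like: compute IDF once from a reference corpus, then score the terms of a thread's OP against it and keep the top few as name candidates. The function names, the tokenizer, and the smoothed IDF formula (`log((1 + N) / (1 + df)) + 1`) are assumptions for illustration, not something the bot currently does; the tiny corpus below is made up and stands in for the real one.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Very rough tokenizer (an assumption; no stemming)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def idf_from_corpus(documents):
    """Return (idf table, default idf for terms the corpus never saw)."""
    n = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    default_idf = math.log(1 + n) + 1  # same formula with df = 0
    return idf, default_idf


def top_terms(op_text, idf, default_idf, k=3):
    """Rank the OP's terms by term frequency times bootstrapped IDF."""
    tf = Counter(tokenize(op_text))
    scored = {term: count * idf.get(term, default_idf) for term, count in tf.items()}
    # Sort by score descending, breaking ties alphabetically for stable output.
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Made-up stand-in for the external corpus.
    corpus = ["the bot is online", "the thread is open", "the index is stale"]
    idf, default_idf = idf_from_corpus(corpus)
    print(top_terms("the tfidf index is the index", idf, default_idf, k=2))
```

Terms missing from the reference corpus get the highest possible IDF, so guild-specific jargon will tend to dominate the suggested names. Whether that is desirable depends on how close the external corpus is to what people actually write in the guild.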