spion / triplie-ng

Chatbot with Markov chains BFS and Hebbian learning
MIT License

assoc cooccurrences does not always work (and results in undefined behavior and really bad replies) #32

Open Kermalis opened 3 years ago

Kermalis commented 3 years ago

The cooccurrences() function runs a query against the database. However, the ids of the input words are not always associated with each other (I'm not sure whether they are always supposed to be).

If the words are not an assoc pair, the method returns { cooccurrences: null, modified: null } (it might be undefined, I forget, but the point still stands). This is a major problem for generating responses. It does look like you were aware it could be unset: https://github.com/spion/triplie-ng/blob/master/lib/pipeline/associate.js#L90

However, this isn't checked everywhere else later, so I'm unsure. For example, right after the cooccurrences are queried, modified is used without being checked: https://github.com/spion/triplie-ng/blob/master/lib/pipeline/associate.js#L113 This results in very bad decay scores (although that's probably what you would want for unrelated pairs; in my opinion they should be removed entirely).
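A minimal sketch of the kind of guard I mean. The helper names toAssociation and validAssociations are hypothetical and do not exist in the repository; row stands for whatever cooccurrences() returns, assumed to have the shape { cooccurrences, modified }:

```javascript
// Hypothetical helper; `row` stands for whatever cooccurrences() returns.
function toAssociation(row) {
  // `== null` matches both null and undefined, so either "unset" form is caught.
  if (!row || row.cooccurrences == null || row.modified == null) {
    return null; // not an assoc pair
  }
  return { cooccurrences: row.cooccurrences, modified: row.modified };
}

// Keep only the pairs that actually exist in the database.
function validAssociations(rows) {
  return rows.map(toAssociation).filter((a) => a !== null);
}
```

With something like this in place, unrelated pairs are dropped before any decay or scoring runs, instead of flowing through as null values.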

Later, when picking replies, cooccurrences() is called again, and this time neither property is checked for validity. Also, the method only ever returns those two properties, but here you are checking for .oid and .id, which never exist on this object: https://github.com/spion/triplie-ng/blob/master/lib/pipeline/associate.js#L113 So oval will always be 1, and the original dictionary is completely unused.

Unrelated to this issue, you are calculating the age in days and not using it: https://github.com/spion/triplie-ng/blob/master/lib/pipeline/associate.js#L160 It is calculated again in decay(), so some CPU is wasted computing it twice when one of the results is never used.

Back on topic, the decay() function is using possibly unset dates from the broken return value: https://github.com/spion/triplie-ng/blob/master/lib/pipeline/associate.js#L160

So each of these replies gets a score of 0 or -0.
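To illustrate how the zeros can arise, here is a sketch that assumes decay() is a plain exponential half-life function. That is my assumption, and the function below is hypothetical rather than copied from associate.js:

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumed half-life decay; hypothetical, not the repository's implementation.
function decay(value, modified, halflifeDays, now = Date.now()) {
  // A null `modified` coerces to 0, so the age becomes "time since 1970".
  const ageDays = (now - modified) / MS_PER_DAY;
  return value * Math.pow(0.5, ageDays / halflifeDays);
}

const now = Date.UTC(2024, 0, 1);

// null coerces to 0 in arithmetic, so the result is exactly 0.
const fromNull = decay(null, null, 30, now);

// undefined coerces to NaN instead, so the score is NaN rather than 0.
const fromUndefined = decay(undefined, undefined, 30, now);

// Negating a zero score gives -0.
const negated = -decay(null, null, 30, now);

console.log(fromNull, fromUndefined, Object.is(negated, -0));
```

Under these assumptions a null pair silently scores 0, and any later negation (or multiplication by a negative weight) turns that into -0, which would match what I am seeing.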

This would be okay if it were not for the situation where EVERY reply gets a score of 0. No reply is given a nonzero score unless the word is extremely recent (according to the associations halflife value), which means these faulty answers are possibly being used.
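One possible way to avoid picking from faulty answers, sketched under the assumption that each candidate reply ends up as an object of the form { text, score }. Both that shape and the pickReply name are hypothetical:

```javascript
// Hypothetical selection step: ignore candidates whose score is missing,
// NaN, zero, or negative zero, and report when nothing usable is left.
function pickReply(candidates) {
  const scored = candidates.filter(
    (c) => typeof c.score === "number" && Number.isFinite(c.score) && c.score > 0
  );
  if (scored.length === 0) {
    return null; // every candidate scored 0: let the caller fall back
  }
  // Otherwise return the highest-scoring candidate.
  return scored.reduce((best, c) => (c.score > best.score ? c : best));
}
```

Returning null when every score is 0 makes the "all replies are equally bad" case explicit, so the caller can choose a fallback instead of using an arbitrary reply.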