Sentence embedding similarity is sufficient for querying relevant information, in the sense that the same information will likely appear among the top-k items. It is not sufficient, however, to check whether a particular item is indeed the same information. As Hugo pointed out, the issue is especially pronounced when a sentence contains asymmetric relations, which sentence embeddings as of now do not properly capture.
Essentially, even a cosine similarity close to 1 is not a reliable indicator that two pieces of information are the same: there will be other pairs with higher scores that are nonetheless completely different. Here are examples to illustrate:
Pairs which convey different information yet have a high score:
emb1 = get_embedding("the cat hates the dog")
emb2 = get_embedding("the dog hates the cat")
print(cosine_similarity(emb1, emb2))
0.98
emb1 = get_embedding("the cat is lying on the table")
emb2 = get_embedding("the cat is lying under the table")
print(cosine_similarity(emb1, emb2))
0.97
Pairs which convey the same information, yet have a lower score than the above:
emb1 = get_embedding("the duck is floating in the pond")
emb2 = get_embedding("the duck is swimming in the pond")
print(cosine_similarity(emb1, emb2))
0.96
emb1 = get_embedding("the person is drinking the beer")
emb2 = get_embedding("the person is taking a sip from the beer")
print(cosine_similarity(emb1, emb2))
0.96
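For reference, the `cosine_similarity` used in the snippets above can be computed as follows. This is a minimal sketch; `get_embedding` itself would come from whatever embedding model or API is used and is not reproduced here.

```python
import math

def cosine_similarity(v1, v2):
    """Cosine of the angle between two embedding vectors:
    dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, which is why scores like 0.96 vs. 0.98 carry so little discriminative power near the top of the range.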
For NARS it is only safe to merge evidence when it is about the same relation, which is addressed by triggering revision only when the symbolic encodings match. In this project this works even with natural-sentence input, since GPT is prompted to extract atomic pieces of information (simple sentences involving a subject, relation, and predicate) that NARS can effectively work with and reason on, and which become part of GPT's prompt for question answering together with the pieces of information NARS has derived.
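The merging condition can be made concrete as follows. The truth-value update is the standard NAL revision rule, carried out by summing evidence counts; the dictionary-based memory and the evidential-horizon constant k=1 are illustrative assumptions for this sketch.

```python
K = 1  # NAL evidential horizon (personality parameter); illustrative choice

def truth_from_evidence(w_plus, w):
    """Frequency and confidence from positive/total evidence counts."""
    return (w_plus / w, w / (w + K))

def evidence_from_truth(f, c):
    """Inverse mapping: recover evidence counts from (frequency, confidence)."""
    w = K * c / (1.0 - c)
    return (f * w, w)

memory = {}  # symbolic encoding -> (frequency, confidence)

def add_belief(encoding, f, c):
    """Revise only when the symbolic encodings match exactly;
    otherwise store the belief as new, never merging across relations."""
    if encoding in memory:
        f0, c0 = memory[encoding]
        wp1, w1 = evidence_from_truth(f0, c0)
        wp2, w2 = evidence_from_truth(f, c)
        memory[encoding] = truth_from_evidence(wp1 + wp2, w1 + w2)
    else:
        memory[encoding] = (f, c)
```

With this keying, "the cat hates the dog" and "the dog hates the cat" produce different symbolic encodings and can never be merged, regardless of how close their sentence embeddings are.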
But in case better embeddings become available which can reliably capture relational semantics, the following can be tried in addition:
gptONA: when Relation/Property claims are made, check the existing embeddings and reuse the existing relational representation if the similarity is above a threshold, potentially penalizing confidence according to the difference (simply by multiplying with the embedding similarity).
NarsGPT: leave the relational encoding as-is, but apply revision & choice to items within a threshold of sentence-embedding similarity, also penalizing the evidence according to the difference (again by multiplying with the similarity score).
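Both variants share the same penalty idea: a belief matched only via embedding similarity contributes evidence discounted by that similarity. A minimal sketch of the candidate-selection step, where `similar_items`, the tuple layout, and the threshold value are all hypothetical assumptions, not existing project code:

```python
THRESHOLD = 0.97  # illustrative cutoff; would need tuning, see the examples above

def penalized_confidence(c, similarity):
    """Multiplicative penalty: discount confidence by the embedding similarity."""
    return c * similarity

def candidates_for_revision(items, similarity_fn, new_embedding):
    """Return stored items whose sentence embedding is within the threshold
    of the new claim, each paired with its similarity-penalized confidence."""
    out = []
    for encoding, f, c, emb in items:
        sim = similarity_fn(new_embedding, emb)
        if sim >= THRESHOLD:
            out.append((encoding, f, penalized_confidence(c, sim)))
    return out
```

Revision & choice would then run over the returned candidates instead of requiring an exact symbolic match. Given the examples above, though, a threshold tight enough to exclude "the cat hates the dog" vs. "the dog hates the cat" (0.98) would also exclude genuine paraphrases (0.96), which is exactly why this is deferred until better embeddings exist.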