There are 10 similarity scores for each term, which adds up to 1M rows for Idun and 1.5M for Ugglan. They can be retrieved with this query:
SELECT
  t1.term_term AS term1,
  t2.term_term AS term2,
  termsim.similarity
FROM termsim
JOIN term AS t1 ON t1.term_id = termsim.term1_id
JOIN term AS t2 ON t2.term_id = termsim.term2_id;
Adding the restriction WHERE t1.term_term = 'aachen' OR t2.term_term = 'aachen', we get the illustrative excerpt below. It highlights that the relation is not symmetric: A is not necessarily one of the 10 most similar terms to B, even if B is one of the 10 most similar terms to A. For instance, fästningsfyrkanten is a very unusual word, so it is relatively unlikely to appear in the top 10 for any other term.
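To make the asymmetry concrete, here is a minimal sketch that runs the restricted query against a toy in-memory SQLite database. The table and column names come from the query above; the column types, the extra term 'köln', the similarity values, and the shortening of each list to a single entry are all assumptions made for illustration, not data from Idun or Ugglan. It also assumes that term1_id identifies the term whose top list a row belongs to.

```python
import sqlite3

# Toy schema: names follow the query in the text, types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (term_id INTEGER PRIMARY KEY, term_term TEXT);
CREATE TABLE termsim (term1_id INTEGER, term2_id INTEGER, similarity REAL);
INSERT INTO term VALUES (1, 'aachen'), (2, 'köln'), (3, 'fästningsfyrkanten');
-- Made-up rows, one "most similar" entry per term instead of ten.
INSERT INTO termsim VALUES (1, 2, 0.9), (2, 1, 0.9), (3, 1, 0.4);
""")

QUERY = """
SELECT t1.term_term AS term1, t2.term_term AS term2, termsim.similarity
FROM termsim
JOIN term AS t1 ON t1.term_id = termsim.term1_id
JOIN term AS t2 ON t2.term_id = termsim.term2_id
WHERE t1.term_term = ? OR t2.term_term = ?
ORDER BY termsim.similarity DESC, term1
"""

def rows_for(term):
    """Return every similarity row in which `term` appears on either side."""
    return conn.execute(QUERY, (term, term)).fetchall()

rows = rows_for("aachen")
for term1, term2, similarity in rows:
    print(term1, term2, similarity)
```

In this toy data the rare word lists aachen as its nearest term, but there is no row in the opposite direction, which is the same pattern the excerpt illustrates.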
In fact, it probably makes more sense to publish the word embeddings, from which the similarities are calculated. Embeddings are created by the word2vec.py script.
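As a rough sketch of how top-k lists could be derived from published embeddings, the example below ranks terms by cosine similarity. This is an assumption: cosine similarity is the usual choice for word2vec vectors, but the text does not say which measure word2vec.py actually uses. The two-dimensional vectors, the term 'köln', and the helper names cosine and top_k are hypothetical, and k is set to 1 instead of 10 to keep the output small.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def top_k(embeddings, term, k):
    """Return the k terms most similar to `term`, best first."""
    scores = [
        (other, cosine(embeddings[term], vector))
        for other, vector in embeddings.items()
        if other != term
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Made-up 2-D vectors; real word2vec embeddings have far more dimensions.
embeddings = {
    "aachen": (1.0, 0.0),
    "köln": (0.98, 0.17),
    "fästningsfyrkanten": (0.77, 0.64),
}

neighbours = {
    term: [other for other, _ in top_k(embeddings, term, 1)]
    for term in embeddings
}
print(neighbours)
```

With these vectors the nearest term to fästningsfyrkanten is köln, while the nearest term to köln is aachen, so the asymmetry in the similarity table follows directly from truncating each ranking to its top entries.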