Open justaddcoffee opened 1 year ago
Also, Luca/Tommy have implemented an efficient cosine similarity function in Ensmallen that we could possibly crib.
Let's be careful to distinguish two things:
1. the similarity/distance metric
2. what the metric is operating over
On Fri, May 26, 2023 at 9:24 AM Justin Reese @.***> wrote:
Also Luca/Tommy have implemented an efficient cosine sim function https://github.com/AnacletoLAB/ensmallen/blob/master/graph/express_measures/src/cosine_similarity.rs that we possibly could crib
I think 2 would be handled entirely on the Python side by filling out the appropriate entries in a TermPairwiseSimilarity, right? This issue is just to implement (hopefully efficiently) the calculation of cosine similarity.
We'd like to have a function measure of cosine similarity between terms, e.g.
I'd suggest we let the caller bring their own embeddings in GRAPE (i.e. Pandas) format, then we can calculate cosine sim efficiently in Rust (possibly using Polars?)
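For reference, a minimal sketch of what the core calculation could look like on the Rust side, assuming each term's embedding has already been loaded into a plain slice of f64. The function name and the choice to return None for mismatched lengths or zero vectors are assumptions for illustration, not the existing semsimian or Ensmallen API:

```rust
/// Cosine similarity between two equal-length embedding vectors.
/// Hypothetical helper: returns None when the inputs are empty, differ in
/// length, or either vector has zero norm (the similarity is undefined then).
pub fn cosine_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let mut dot = 0.0_f64;
    let mut norm_a = 0.0_f64;
    let mut norm_b = 0.0_f64;
    // Single pass: accumulate the dot product and both squared norms.
    for (x, y) in a.iter().zip(b.iter()) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}
```

Reading the embeddings out of a Pandas or Polars frame is left out here, since that depends on the format we settle on.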
Obviously we'd want to build cosine similarity into all_by_all_similarity() too eventually. Per discussion with @iQuxLE.
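A rough sketch of how an all-by-all pass over caller-supplied embeddings might look, assuming they arrive as a map from term ID to vector. The names all_by_all_cosine and cosine are hypothetical and do not reflect how all_by_all_similarity() is currently implemented:

```rust
use std::collections::HashMap;

// Hypothetical helper: cosine similarity, None if undefined for the inputs.
fn cosine(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f64 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f64 = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b: f64 = b.iter().map(|y| y * y).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Hypothetical all-by-all pass: computes each unordered pair once and
/// stores it under both orderings, since cosine similarity is symmetric.
/// Pairs whose similarity is undefined are skipped.
pub fn all_by_all_cosine(
    embeddings: &HashMap<String, Vec<f64>>,
) -> HashMap<(String, String), f64> {
    let mut terms: Vec<&String> = embeddings.keys().collect();
    terms.sort(); // deterministic iteration order
    let mut out = HashMap::new();
    for (i, t1) in terms.iter().enumerate() {
        for t2 in &terms[i..] {
            if let Some(sim) = cosine(&embeddings[*t1], &embeddings[*t2]) {
                out.insert((t1.to_string(), t2.to_string()), sim);
                out.insert((t2.to_string(), t1.to_string()), sim);
            }
        }
    }
    out
}
```

This is quadratic in the number of terms, so for large term sets we would probably want to parallelise or borrow from the Ensmallen implementation rather than use a nested loop like this.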
cc @cmungall @julesjacobsen @matentzn