ropensci-review-tools / pkgmatch

Find R packages matching either descriptions or other R packages
http://docs.ropensci.org/pkgmatch/

Similarity Metric #8

Closed mpadge closed 2 months ago

mpadge commented 2 months ago

Need to document the similarity metric thoroughly, including references to the model tech reports and a link to the Netflix article about issues with cosine similarity.

https://www.anthropic.com/news/contextual-retrieval

https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf

Use embedding cosine similarity plus BM25 (#9), and combine the two rankings with Reciprocal Rank Fusion (RRF), from the reference above, to generate the final score/ranking.
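
For reference, a minimal sketch of how RRF could fuse the two rankings. This is not the pkgmatch implementation: the function name `rrf`, the package names, and the example rankings are hypothetical, and `k = 60` is the constant used in the RRF paper linked above. Each item scores the sum of `1 / (k + rank)` over every ranking it appears in.

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Fuse several best-first rankings with Reciprocal Rank Fusion.

    `rankings` is a list of lists of item names, each ordered best first.
    Returns the item names ordered by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, as in the RRF paper.
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by descending score; break ties by name so output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical rankings of the same three packages from the two metrics.
cosine_ranking = ["pkgA", "pkgB", "pkgC"]  # by embedding cosine similarity
bm25_ranking = ["pkgB", "pkgC", "pkgA"]    # by BM25

fused = rrf([cosine_ranking, bm25_ranking])
print(fused)
```

Because RRF only uses ranks, the cosine and BM25 scores never need to be put on a common scale before combining them.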

mpadge commented 2 months ago

Re-opening because the last commit introduces weights into the re-ranking function, so that scores from data including function definitions are weighted less than scores from data excluding them. That seems to give generally better results. I'll now add code that simply greps for "function" in any text query, and passes a fn_defs flag to the re-ranking function to switch that weighting on or off.
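
A rough sketch of what that weighting could look like, under several assumptions that are not taken from the pkgmatch code: the helper names `query_mentions_function`, `weighted_rrf` and `rerank` are hypothetical, the down-weight of 0.5 is an arbitrary placeholder, the match on "function" is made case-insensitive, and the flag is assumed to mean "the query is about functions, so do not down-weight the data that includes function definitions".

```python
from collections import defaultdict


def query_mentions_function(query):
    """Crude check, in the spirit of a grep for "function" (case-insensitive here)."""
    return "function" in query.lower()


def weighted_rrf(rankings, weights, k=60):
    """RRF in which each ranking's contribution is scaled by its own weight."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, item in enumerate(ranking, start=1):
            scores[item] += weight / (k + rank)
    # Descending score, ties broken by name for deterministic output.
    return sorted(scores, key=lambda item: (-scores[item], item))


def rerank(with_fn_defs, without_fn_defs, fn_defs, down_weight=0.5):
    """Combine rankings from data with and without function definitions.

    ASSUMPTION: when `fn_defs` is False, the ranking built from data that
    includes function definitions counts for less; when True, both rankings
    are weighted equally. The value 0.5 is a placeholder, not a tuned weight.
    """
    w_with = 1.0 if fn_defs else down_weight
    return weighted_rrf([with_fn_defs, without_fn_defs], [w_with, 1.0])


# Hypothetical rankings that disagree about the best match.
with_defs = ["pkgA", "pkgB", "pkgC"]
without_defs = ["pkgC", "pkgB", "pkgA"]

query = "packages for spatial analysis"
result = rerank(with_defs, without_defs, fn_defs=query_mentions_function(query))
print(result)
```

In this example the query does not mention "function", so the ranking from data without function definitions dominates and its top item comes out first.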