mpadge closed this 2 months ago
Re-opening because the last commit introduces weights into the re-ranking function, so that scores from data that include function definitions are weighted less than scores from data that exclude them. That seems to give generally better results. I'll now add code that simply greps any text query for "function" and passes an `fn_defs` flag to the re-ranking function to switch that weighting on or off.
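A minimal sketch of the grep-plus-flag idea, assuming each package has two scores (one from data with function definitions, one without) that are combined as a weighted sum. `FN_DEFS_WEIGHT`, `query_mentions_functions`, and `rerank` are hypothetical names, the 0.5 weight is an illustrative guess, and the mapping "query mentions 'function' means no down-weighting" is an assumption, not necessarily how the commit behaves:

```python
import re

# Hypothetical down-weighting factor; the value used in the actual commit is not stated.
FN_DEFS_WEIGHT = 0.5


def query_mentions_functions(query: str) -> bool:
    """Simple grep: does the text query contain "function" (case-insensitive)?"""
    return re.search(r"function", query, flags=re.IGNORECASE) is not None


def rerank(scores_with_fns: dict[str, float],
           scores_without_fns: dict[str, float],
           fn_defs: bool) -> list[tuple[str, float]]:
    """Weighted sum of two score sets per item, highest combined score first.

    Assumption: when fn_defs is False, scores from data that include function
    definitions are multiplied by FN_DEFS_WEIGHT; when True, both count equally.
    """
    w_fns = 1.0 if fn_defs else FN_DEFS_WEIGHT
    names = set(scores_with_fns) | set(scores_without_fns)
    combined = {
        name: w_fns * scores_with_fns.get(name, 0.0) + scores_without_fns.get(name, 0.0)
        for name in names
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for two packages.
with_fns = {"pkgA": 0.9, "pkgB": 0.2}
without_fns = {"pkgA": 0.1, "pkgB": 0.6}

query = "read a csv file"
fn_defs = query_mentions_functions(query)  # False: the query never says "function"
ranked = rerank(with_fns, without_fns, fn_defs)  # pkgB ranks first here
```

With the weighting active, `pkgB` (0.1 + 0.6) overtakes `pkgA` (0.45 + 0.1); with `fn_defs=True` the order flips back to `pkgA` first.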
Need to document this thoroughly, including references to the model tech reports and a link to the Netflix article on issues with cosine similarity.
https://www.anthropic.com/news/contextual-retrieval
https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf
Use embedding cosine similarity plus BM25 (#9), and combine the two with reciprocal rank fusion (RRF, from the Cormack et al. paper above) to generate the final score/ranking.
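A rough sketch of that pipeline under simplifying assumptions: whitespace tokenisation for BM25, the common Okapi defaults `k1 = 1.2` and `b = 0.75`, and `k = 60` for RRF (the constant used in the Cormack et al. paper). The documents and two-dimensional "embeddings" below are made up for illustration; a real setup would use model embeddings and the BM25 implementation from #9:

```python
import math
from collections import Counter


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 with naive lower-case whitespace tokenisation."""
    tokenised = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenised)
    avgdl = sum(len(toks) for toks in tokenised.values()) / n_docs
    # Number of documents containing each term.
    df = Counter(term for toks in tokenised.values() for term in set(toks))
    scores = {}
    for name, toks in tokenised.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores[name] = score
    return scores


def to_ranks(scores: dict[str, float]) -> dict[str, int]:
    """Convert scores to 1-based ranks, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def rrf(rankings: list[dict[str, int]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: each item scores sum(1 / (k + rank)) over all rankings."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for name, rank in ranking.items():
            fused[name] = fused.get(name, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical corpus, embeddings, and query.
docs = {
    "pkgA": "read and parse csv files",
    "pkgB": "plot spatial data on maps",
    "pkgC": "parse json",
}
doc_embeddings = {"pkgA": [0.9, 0.1], "pkgB": [0.1, 0.9], "pkgC": [0.7, 0.3]}
query = "parse csv"
query_embedding = [1.0, 0.0]

cosine = {name: cosine_similarity(query_embedding, emb) for name, emb in doc_embeddings.items()}
bm25 = bm25_scores(query, docs)
final = rrf([to_ranks(cosine), to_ranks(bm25)])
```

RRF only uses ranks, so the raw cosine and BM25 scores never need to be put on a common scale; that is the main reason to prefer it over a weighted sum of the two scores.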