Closed ivanleomk closed 6 months ago
Seems like the Python lib also supports FTS using an index through tantivy (https://lancedb.github.io/lancedb/fts/#index-multiple-columns).
Will look into this
Current FTS results
MRR@3 MRR@5 MRR@10 MRR@20 NDCG@3 NDCG@5 NDCG@10 NDCG@20 retrieved_size
chunk_id
09369eb77f4c743034d01d13744faf6d 1.0 1.0 1.0 1.00 1.0 1.0 1.0 1.0 7
b6bb3ecf14a22bc3144b3c8bb101a1e8 1.0 1.0 1.0 1.00 1.0 1.0 1.0 1.0 25
65d0dc53f68d4e5a9b95e6268673cc09 1.0 1.0 1.0 1.00 1.0 1.0 1.0 1.0 25
f8270d09dace076cb5dbd79309d08fa4 0.0 0.0 0.0 0.05 0.0 0.0 0.0 0.23 25
de494946e19340c348b991e5845cd8c4 1.0 1.0 1.0 1.00 1.0 1.0 1.0 1.0 4
afc4b95c2537f788fe5711c9835b58bf 0.0 0.0 0.0 0.00 0 0 0 0 1
66288d8fe7e8f36b7f4c2bf4d5af7b18 1.0 1.0 1.0 1.00 1.0 1.0 1.0 1.0 9
3bb3e61a043cd396acc7669d021ab532 0.0 0.0 0.0 0.00 N/A N/A N/A N/A 0
ddd2c319f6b2bae1f9583b497bc615e4 0.0 0.0 0.0 0.00 0 0 0 0 1
965e5abc2b1409f099518c51d13d7a5a 0.0 0.0 0.0 0.00 0 0 0 0 1
Mean Values
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Metric ┃ Value ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ MRR@3 │ 0.5 │
│ MRR@5 │ 0.5 │
│ MRR@10 │ 0.5 │
│ MRR@20 │ 0.505 │
│ NDCG@3 │ 0.5555555555555556 │
│ NDCG@5 │ 0.5555555555555556 │
│ NDCG@10 │ 0.5555555555555556 │
│ NDCG@20 │ 0.5811111111111111 │
│ retrieved_size │ 9.8 │
This is the corresponding semantic search result
MRR@3 MRR@5 MRR@10 MRR@20 NDCG@3 NDCG@5 NDCG@10 NDCG@20 retrieved_size
chunk_id
09369eb77f4c743034d01d13744faf6d 1.0 1.0 1.0 1.0 1.00 1.00 1.00 1.00 25
b6bb3ecf14a22bc3144b3c8bb101a1e8 1.0 1.0 1.0 1.0 1.00 1.00 1.00 1.00 25
65d0dc53f68d4e5a9b95e6268673cc09 1.0 1.0 1.0 1.0 1.00 1.00 1.00 1.00 25
f8270d09dace076cb5dbd79309d08fa4 1.0 1.0 1.0 1.0 1.00 1.00 1.00 1.00 25
de494946e19340c348b991e5845cd8c4 1.0 1.0 1.0 1.0 1.00 1.00 1.00 1.00 25
afc4b95c2537f788fe5711c9835b58bf 0.5 0.5 0.5 0.5 0.63 0.63 0.63 0.63 25
66288d8fe7e8f36b7f4c2bf4d5af7b18 1.0 1.0 1.0 1.0 1.00 1.00 1.00 1.00 25
3bb3e61a043cd396acc7669d021ab532 1.0 1.0 1.0 1.0 1.00 1.00 1.00 1.00 25
ddd2c319f6b2bae1f9583b497bc615e4 0.0 0.2 0.2 0.2 0.00 0.39 0.39 0.39 25
965e5abc2b1409f099518c51d13d7a5a 0.5 0.5 0.5 0.5 0.63 0.63 0.63 0.63 25
Mean Values
┏━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Metric ┃ Value ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ MRR@3 │ 0.8 │
│ MRR@5 │ 0.82 │
│ MRR@10 │ 0.82 │
│ MRR@20 │ 0.82 │
│ NDCG@3 │ 0.826 │
│ NDCG@5 │ 0.865 │
│ NDCG@10 │ 0.865 │
│ NDCG@20 │ 0.865 │
│ retrieved_size │ 25.0 │
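For reading the per-chunk rows above, here is a minimal sketch of how MRR@k and NDCG@k behave when each question has exactly one relevant chunk. That single-relevant-chunk setup is an assumption, though the table values are consistent with it (0.63 ≈ 1/log2(3) for a hit at rank 2, 0.39 ≈ 1/log2(6) for a hit at rank 5). The function names are hypothetical and are not taken from `rag_app`.

```python
import math
from typing import Optional


def mrr_at_k(rank: Optional[int], k: int) -> float:
    """Reciprocal rank of the single relevant chunk, cut off at k.

    `rank` is 1-based; None means the chunk was not retrieved at all.
    """
    if rank is None or rank > k:
        return 0.0
    return 1.0 / rank


def ndcg_at_k(rank: Optional[int], k: int) -> float:
    """NDCG@k with binary relevance and exactly one relevant chunk.

    The ideal DCG is 1 / log2(2) = 1, so NDCG reduces to the
    discounted gain at the position of the hit.
    """
    if rank is None or rank > k:
        return 0.0
    return 1.0 / math.log2(rank + 1)


# A hit at rank 5 is invisible at k=3 but counts at k=5,
# which matches the ddd2c319... row in the semantic search table.
print(mrr_at_k(5, 3), mrr_at_k(5, 5))
print(round(ndcg_at_k(5, 3), 2), round(ndcg_at_k(5, 5), 2))
```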
Implemented the BM25 search! It's a bit slower than embedding search but performs quite well tbh
Mean Values
┏━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Metric ┃ Value ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ MRR@3 │ 0.8 │
│ MRR@5 │ 0.81 │
│ MRR@10 │ 0.81 │
│ MRR@20 │ 0.82 │
│ NDCG@3 │ 0.83 │
│ NDCG@5 │ 0.85 │
│ NDCG@10 │ 0.85 │
│ NDCG@20 │ 0.86 │
│ retrieved_size │ 25.0 │
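The BM25 implementation itself isn't shown in this thread, so the following is only a generic Okapi BM25 sketch for reference. The whitespace tokenizer, the `k1=1.5` / `b=0.75` defaults, and the function name `bm25_scores` are all assumptions, not what the PR uses.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    if not docs:
        return []

    # Naive lowercase + whitespace tokenization (an assumption for this sketch).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq: Counter = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: long documents are penalized via b.
            denom = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "a treatise on vector search",
]
scores = bm25_scores("vector search", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Because every document gets a score (possibly 0), a ranker like this can always return the top 25, unlike a filter that only returns matching rows.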
This PR introduces a new flag in the `evaluate` command called `eval-mode`. This helps to support keyword matching, where we try to match rows that have a text content which has at least 3 rows. You can run the text search by doing

When we generate bad keywords using an LLM, we get far fewer rows (fewer than 25), since our query only returns rows that have at least one match. This results in some queries returning 0 or 1 rows, which gives an 'n/a' or a 0/1 value for the `ndcg`, respectively.

I'm not sure if this is the behaviour that we want, so I will explore other queries.
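To make the 0-row and 1-row cases concrete, here is a rough sketch of an NDCG helper that reports "N/A" (as `None`) for empty result sets and 0 or 1 for single-row result sets. It is not the actual `calculate_ndcg` from this PR, whose signature isn't shown here; the names below are hypothetical, and it assumes one relevant chunk per question. Skipping the N/A rows when averaging is also an inference: it matches the FTS mean NDCG@3 above (5/9 ≈ 0.5556).

```python
import math
from typing import List, Optional


def ndcg_or_none(predictions: List[str], relevant_id: str, k: int) -> Optional[float]:
    """NDCG@k for a single relevant chunk; None when nothing was retrieved."""
    if not predictions:
        return None  # shown as N/A in the results table

    # With one prediction this yields exactly 1.0 (hit at rank 1) or 0.0 (miss).
    for position, chunk_id in enumerate(predictions[:k], start=1):
        if chunk_id == relevant_id:
            return 1.0 / math.log2(position + 1)
    return 0.0


def mean_ignoring_na(values: List[Optional[float]]) -> Optional[float]:
    """Average the scores, skipping queries that returned no rows."""
    scored = [v for v in values if v is not None]
    return sum(scored) / len(scored) if scored else None


# Per-query NDCG@3 values from the FTS run above, with N/A as None.
fts_ndcg_at_3 = [1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, None, 0.0, 0.0]
print(mean_ignoring_na(fts_ndcg_at_3))
```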
Summary:
This PR introduces a new `fts` mode to the `evaluate` command in `rag_app` for keyword matching, updates the `output.jsonl` file, and modifies the `calculate_ndcg` function in `rag_app/src/metrics.py` to handle cases with 0 or 1 predictions.

Key points:
- Added the `eval-mode` flag in the `evaluate` command of `rag_app` to support keyword matching.
- Introduced the `fts` (Full Text Search) mode that generates keywords for questions and matches chunks with these keywords.
- Updated the `output.jsonl` file.
- Modified the `calculate_ndcg` function in `rag_app/src/metrics.py` to handle cases with 0 or 1 predictions.

Generated with :heart: by ellipsis.dev