jxnl / n-levels-of-rag

Keyword Matching #12

Closed ivanleomk closed 6 months ago

ivanleomk commented 6 months ago

This PR introduces a new flag for the evaluate command called eval-mode. It adds support for keyword matching, where we try to match rows that have a text content which has at least 3 rows.

You can run the text search with:

rag-app evaluate from-jsonl --input-file-path ./output.jsonl --db-path ./db --table-name pg --eval-mode fts

When the LLM generates bad keywords, we get far fewer rows (fewer than 25), since our query only returns rows that have at least one match. As a result, some queries return 0 or 1 rows, which gives an NDCG of 'N/A' for 0 rows and a value of either 0 or 1 for a single row.

I'm not sure if this is the behaviour we want, so I will explore other queries.
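
To make the "at least one match" behaviour concrete, here is a minimal pure-Python sketch of that kind of filter. It is not the query used in this PR: the function name `keyword_match`, the `text` field, and the ordering by number of matched keywords are all assumptions for illustration. The limit of 25 mirrors the retrieved_size seen in the results.

```python
from typing import Dict, List


def keyword_match(
    rows: List[Dict[str, str]], keywords: List[str], limit: int = 25
) -> List[Dict[str, str]]:
    """Return up to `limit` rows whose text contains at least one keyword.

    Hypothetical helper: rows with more matched keywords come first.
    """
    lowered = [kw.lower() for kw in keywords if kw.strip()]
    scored = []
    for row in rows:
        text = row["text"].lower()  # "text" is an assumed column name
        hits = sum(1 for kw in lowered if kw in text)
        if hits > 0:  # rows with no matching keyword are dropped entirely
            scored.append((hits, row))
    # Python's sort is stable, so rows with equal hit counts keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:limit]]
```

With a filter like this, poor keywords can shrink the result set to 0 or 1 rows, which is the situation described above.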


Ellipsis :rocket: This PR description was created by Ellipsis for commit 184dcaf9c9f424c195672d97aa51ca0cbd156029.

Summary:

This PR introduces a new fts mode to the evaluate command in rag_app for keyword matching, updates the output.jsonl file, and modifies the calculate_ndcg function in rag_app/src/metrics.py to handle cases with 0 or 1 predictions.
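
As a rough illustration of the 0 and 1 prediction cases, here is a sketch of an NDCG@k that assumes binary relevance with exactly one relevant chunk per query. This is not the actual `calculate_ndcg` from rag_app/src/metrics.py (its body is not shown in this thread); the name `ndcg_at_k` and its signature are hypothetical.

```python
import math
from typing import Optional, Sequence


def ndcg_at_k(
    retrieved_ids: Sequence[str], relevant_id: str, k: int
) -> Optional[float]:
    """NDCG@k assuming a single relevant chunk with relevance 1.

    Returns None (shown as N/A) when nothing was retrieved.
    """
    if not retrieved_ids:
        return None  # 0 predictions: the score is undefined
    for rank, chunk_id in enumerate(retrieved_ids[:k], start=1):
        if chunk_id == relevant_id:
            # With one relevant item the ideal DCG is 1, so NDCG == DCG.
            return 1.0 / math.log2(rank + 1)
    return 0.0  # relevant chunk not in the top k
```

Under these assumptions a single retrieved row scores either 1.0 (it is the relevant chunk) or 0.0 (it is not), and an empty result yields no value at all.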


ivanleomk commented 6 months ago

Seems like the Python lib also supports FTS using an index through tantivy (https://lancedb.github.io/lancedb/fts/#index-multiple-columns)

Will look into this

ivanleomk commented 6 months ago

Current FTS results

                                  MRR@3  MRR@5  MRR@10  MRR@20 NDCG@3 NDCG@5 NDCG@10 NDCG@20  retrieved_size
chunk_id                                                                                                    
09369eb77f4c743034d01d13744faf6d    1.0    1.0     1.0    1.00    1.0    1.0     1.0     1.0               7
b6bb3ecf14a22bc3144b3c8bb101a1e8    1.0    1.0     1.0    1.00    1.0    1.0     1.0     1.0              25
65d0dc53f68d4e5a9b95e6268673cc09    1.0    1.0     1.0    1.00    1.0    1.0     1.0     1.0              25
f8270d09dace076cb5dbd79309d08fa4    0.0    0.0     0.0    0.05    0.0    0.0     0.0    0.23              25
de494946e19340c348b991e5845cd8c4    1.0    1.0     1.0    1.00    1.0    1.0     1.0     1.0               4
afc4b95c2537f788fe5711c9835b58bf    0.0    0.0     0.0    0.00      0      0       0       0               1
66288d8fe7e8f36b7f4c2bf4d5af7b18    1.0    1.0     1.0    1.00    1.0    1.0     1.0     1.0               9
3bb3e61a043cd396acc7669d021ab532    0.0    0.0     0.0    0.00    N/A    N/A     N/A     N/A               0
ddd2c319f6b2bae1f9583b497bc615e4    0.0    0.0     0.0    0.00      0      0       0       0               1
965e5abc2b1409f099518c51d13d7a5a    0.0    0.0     0.0    0.00      0      0       0       0               1

              Mean Values              
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Metric         ┃ Value              ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ MRR@3          │ 0.5                │
│ MRR@5          │ 0.5                │
│ MRR@10         │ 0.5                │
│ MRR@20         │ 0.505              │
│ NDCG@3         │ 0.5555555555555556 │
│ NDCG@5         │ 0.5555555555555556 │
│ NDCG@10        │ 0.5555555555555556 │
│ NDCG@20        │ 0.5811111111111111 │
│ retrieved_size │ 9.8                │
└────────────────┴────────────────────┘
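
For reference, here is a small sketch of how these numbers can be reproduced. The reported NDCG mean of 0.5556 is consistent with averaging over only the nine rows that have a value, i.e. skipping the N/A row, while MRR averages over all ten. The helper names `mrr_at_k` and `mean_ignoring_na` are hypothetical and not taken from the repo; the per-query values are copied from the NDCG@3 and MRR@3 columns of the FTS table.

```python
from typing import Optional, Sequence


def mrr_at_k(retrieved_ids: Sequence[str], relevant_id: str, k: int) -> float:
    """Reciprocal rank of the relevant chunk within the top k, else 0."""
    for rank, chunk_id in enumerate(retrieved_ids[:k], start=1):
        if chunk_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean_ignoring_na(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average the values that are present; None stands in for N/A."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Per-query values from the FTS table, in row order.
ndcg_at_3 = [1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, None, 0.0, 0.0]
mrr_at_3 = [1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]

print(mean_ignoring_na(ndcg_at_3))  # 5 / 9, the N/A row is skipped
print(mean_ignoring_na(mrr_at_3))   # 5 / 10
```

A hit at rank 20 gives a reciprocal rank of 1/20 = 0.05, which matches the MRR@20 value of the one query that only finds its chunk at the end of the list.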

This is the respective semantic search result

                                  MRR@3  MRR@5  MRR@10  MRR@20  NDCG@3  NDCG@5  NDCG@10  NDCG@20  retrieved_size
chunk_id                                                                                                        
09369eb77f4c743034d01d13744faf6d    1.0    1.0     1.0     1.0    1.00    1.00     1.00     1.00              25
b6bb3ecf14a22bc3144b3c8bb101a1e8    1.0    1.0     1.0     1.0    1.00    1.00     1.00     1.00              25
65d0dc53f68d4e5a9b95e6268673cc09    1.0    1.0     1.0     1.0    1.00    1.00     1.00     1.00              25
f8270d09dace076cb5dbd79309d08fa4    1.0    1.0     1.0     1.0    1.00    1.00     1.00     1.00              25
de494946e19340c348b991e5845cd8c4    1.0    1.0     1.0     1.0    1.00    1.00     1.00     1.00              25
afc4b95c2537f788fe5711c9835b58bf    0.5    0.5     0.5     0.5    0.63    0.63     0.63     0.63              25
66288d8fe7e8f36b7f4c2bf4d5af7b18    1.0    1.0     1.0     1.0    1.00    1.00     1.00     1.00              25
3bb3e61a043cd396acc7669d021ab532    1.0    1.0     1.0     1.0    1.00    1.00     1.00     1.00              25
ddd2c319f6b2bae1f9583b497bc615e4    0.0    0.2     0.2     0.2    0.00    0.39     0.39     0.39              25
965e5abc2b1409f099518c51d13d7a5a    0.5    0.5     0.5     0.5    0.63    0.63     0.63     0.63              25

       Mean Values        
┏━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Metric         ┃ Value ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ MRR@3          │ 0.8   │
│ MRR@5          │ 0.82  │
│ MRR@10         │ 0.82  │
│ MRR@20         │ 0.82  │
│ NDCG@3         │ 0.826 │
│ NDCG@5         │ 0.865 │
│ NDCG@10        │ 0.865 │
│ NDCG@20        │ 0.865 │
│ retrieved_size │ 25.0  │
└────────────────┴───────┘

ivanleomk commented 6 months ago

Implemented the BM25 search! It's a bit slower than embedding search but performs quite well tbh

       Mean Values        
┏━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ Metric         ┃ Value ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ MRR@3          │ 0.8   │
│ MRR@5          │ 0.81  │
│ MRR@10         │ 0.81  │
│ MRR@20         │ 0.82  │
│ NDCG@3         │ 0.83  │
│ NDCG@5         │ 0.85  │
│ NDCG@10        │ 0.85  │
│ NDCG@20        │ 0.86  │
│ retrieved_size │ 25.0  │
└────────────────┴───────┘
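
For readers unfamiliar with BM25, here is a minimal pure-Python sketch of Okapi BM25 ranking. It is not the implementation from this PR: the whitespace tokenizer, the `k1` and `b` defaults, and the function names are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def bm25_rank(
    query: str, docs: List[str], k1: float = 1.5, b: float = 0.75, limit: int = 25
) -> List[Tuple[int, float]]:
    """Score every document against the query; return (index, score) pairs."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = tokenize(query)

    scores = []
    for idx, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Length normalisation: longer documents are penalised via b.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append((idx, score))

    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:limit]
```

Because this sketch ranks every document rather than filtering on matches, it always returns up to `limit` results, which would be in line with the retrieved_size of 25.0 above.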