openml-labs / ai_search

RAG pipeline and summary for openml
https://openml-labs.github.io/ai_search/

Get data that we can use to compute our evaluation metrics #6

Closed by PGijsbers 1 day ago

PGijsbers commented 2 weeks ago

We need data that we can use to evaluate our models according to some evaluation metric (#5) during initial development.

This will most likely be some form of (query, relevant results) pairs. The relevant results for each query should be fairly exhaustive, so we might consider working with only a subset of all datasets to keep labeling feasible. This has the added benefit of (hopefully) making our evaluations faster, too.

Another idea is to use LLMs to judge the relevance of query results. However, this risks ignoring recall: a judge that only sees the retrieved results cannot tell that important documents were not retrieved.
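To make the recall concern concrete, here is a minimal sketch of how (query, relevant results) pairs could be scored. The function name `precision_recall_at_k` and the dataset IDs are made up for illustration and are not taken from the repository. Precision only needs a judgment on what was retrieved, while recall needs the full set of relevant datasets, which is why the labels should be exhaustive.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(
    retrieved: List[int], relevant: Set[int], k: int
) -> Tuple[float, float]:
    """Score one query given a ranked result list and its labeled relevant set."""
    top_k = retrieved[:k]
    hits = sum(1 for dataset_id in top_k if dataset_id in relevant)
    # Precision: fraction of the returned results that are relevant.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: fraction of all relevant datasets that were returned.
    # This needs the exhaustive label set, not just the retrieved results.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical labels and results; the IDs are placeholders, not real OpenML datasets.
relevant_ids = {1, 2, 3, 4}
ranked_results = [1, 2, 9]

precision, recall = precision_recall_at_k(ranked_results, relevant_ids, k=3)
print(f"precision@3={precision:.2f} recall@3={recall:.2f}")
```

In this toy case two of the three results are relevant, but only half of the relevant datasets were found, which a judge looking only at the three results would not notice.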

PGijsbers commented 2 weeks ago

@LiinXemmon will make a script/app that will help us add labels to (query, dataset) pairs. For each query we need to be able to quickly label which datasets are relevant. The user could for example be prompted with a query, and then cycle through datasets and their descriptions, indicating for each whether it is relevant. Alternatively, you could take the other approach where a single dataset is presented and you cycle through each query to label the pairs. It's also fine if you think of an even better way to do this.
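A rough sketch of the query-first variant described above, not the actual tool: `label_pairs` and `ask_on_console` are hypothetical names, and the input is assumed to be a list of query strings plus a mapping from dataset ID to description. Passing the prompt in as a function keeps the loop independent of whether the labels come from a console, a web app, or a test.

```python
from typing import Callable, Dict, Iterable, List


def label_pairs(
    queries: Iterable[str],
    datasets: Dict[int, str],
    ask: Callable[[str, int, str], bool],
) -> Dict[str, List[int]]:
    """For each query, cycle through all datasets and keep the relevant IDs."""
    labels: Dict[str, List[int]] = {}
    for query in queries:
        labels[query] = [
            dataset_id
            for dataset_id, description in datasets.items()
            if ask(query, dataset_id, description)
        ]
    return labels


def ask_on_console(query: str, dataset_id: int, description: str) -> bool:
    """Prompt the annotator on the console; anything other than 'y' counts as no."""
    print(f"\nQuery: {query}\nDataset {dataset_id}: {description}")
    return input("Relevant? [y/N] ").strip().lower() == "y"
```

Calling `label_pairs(queries, datasets, ask_on_console)` would run an interactive session. The dataset-first variant only swaps the two loops.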

PGijsbers commented 1 week ago

People have been labeling the data with the tool (see tools). We are just waiting for the files to be shared.

PGijsbers commented 1 day ago

The merged and processed data is here: https://github.com/openml-labs/ai_search/blob/main/data/evaluation/query_key_dict.json

We should also add the individual files, in case we decide to do something with them later.
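For reference, one way the individual files could be merged, as a sketch only. It assumes each annotator file is a JSON object mapping a query string to a list of relevant dataset IDs, and that merging takes the union of all annotators' labels. Neither the schema nor the merge rule of the actual `query_key_dict.json` is confirmed here, and `merge_labels` and `load_label_files` are hypothetical names.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Set


def merge_labels(per_annotator: List[Dict[str, List[int]]]) -> Dict[str, List[int]]:
    """Union the relevant dataset IDs of all annotators, per query."""
    merged: Dict[str, Set[int]] = defaultdict(set)
    for labels in per_annotator:
        for query, dataset_ids in labels.items():
            merged[query].update(dataset_ids)
    # Sort the IDs so the output is stable across runs.
    return {query: sorted(ids) for query, ids in merged.items()}


def load_label_files(directory: str) -> List[Dict[str, List[int]]]:
    """Load every *.json file in a directory, skipping files that fail to parse."""
    loaded = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            loaded.append(json.loads(path.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            print(f"Skipping unreadable file: {path}")
    return loaded
```

A union favors recall in the labels. A majority vote would be the stricter alternative if annotators disagree often.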

PGijsbers commented 1 day ago

Added individual files here https://github.com/openml-labs/ai_search/tree/main/tools/data

The exception is Subhaditya's file, since that one seemed broken. @SubhadityaMukherjee can you upload your file to that directory? Thanks. I'll close this issue.