openml-labs / ai_search

RAG pipeline and summary for openml
https://openml-labs.github.io/ai_search/

Experiment with different ways to represent datasets #30

Open PGijsbers opened 2 months ago

PGijsbers commented 2 months ago

To embed datasets and store them in our vector database, we first convert them to documents (read: strings). This can be done in many different ways, and we have a lot of metadata about each dataset (title, description, the data itself, qualities, features, ...). How best to "textify" a dataset and its metadata into a string, so that it is easier to discover during semantic search, is an open question.
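As a starting point for experimenting, the candidate representations could be written as interchangeable functions that each map the same metadata to a string. Each string can then be embedded and compared on retrieval quality. The sketch below is hypothetical: `DatasetMetadata`, the three `textify_*` strategies and the `STRATEGIES` registry are illustrative names rather than existing code in this repository, and the example dataset is a toy stand-in, not real OpenML metadata.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical container for the metadata we have about a dataset."""
    title: str
    description: str
    features: list[str] = field(default_factory=list)
    qualities: dict[str, float] = field(default_factory=dict)


def textify_title_only(ds: DatasetMetadata) -> str:
    # Minimal representation: only the title.
    return ds.title


def textify_title_description(ds: DatasetMetadata) -> str:
    # Title followed by the free-text description.
    return f"{ds.title}\n\n{ds.description}"


def textify_full(ds: DatasetMetadata, max_features: int = 20) -> str:
    # Labelled sections for title, description, feature names and qualities.
    lines = [f"Title: {ds.title}", f"Description: {ds.description}"]
    if ds.features:
        # Truncate long feature lists so they do not dominate the document.
        shown = ds.features[:max_features]
        extra = len(ds.features) - len(shown)
        suffix = f" (+{extra} more)" if extra > 0 else ""
        lines.append("Features: " + ", ".join(shown) + suffix)
    if ds.qualities:
        # Sort by name so the same dataset always yields the same string.
        qualities = ", ".join(f"{k}={v:g}" for k, v in sorted(ds.qualities.items()))
        lines.append(f"Qualities: {qualities}")
    return "\n".join(lines)


# Registry of strategies to compare in an experiment.
STRATEGIES = {
    "title_only": textify_title_only,
    "title_description": textify_title_description,
    "full": textify_full,
}

if __name__ == "__main__":
    # Toy metadata, made up for illustration.
    example = DatasetMetadata(
        title="iris",
        description="Measurements of iris flowers.",
        features=["sepallength", "sepalwidth", "petallength", "petalwidth", "class"],
        qualities={"NumberOfInstances": 150, "NumberOfFeatures": 5},
    )
    for name, strategy in STRATEGIES.items():
        print(f"--- {name} ---")
        print(strategy(example))
```

Keeping the strategies behind a common signature means the embedding and evaluation code never needs to know which representation it received. The `max_features` cut-off is an assumption. It reflects that embedding models typically have a limited input length, and that datasets with very many features could otherwise crowd out the title and description.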