Experiment with different ways to represent datasets

To embed datasets and store them in our vector database, we first convert them to documents (read: strings). There are many ways to do this, because each dataset comes with a lot of metadata (title, description, the data itself, qualities, features, ...). How best to "textify" a dataset and its metadata into a string so that it is easier to discover during semantic search is an open question.
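As a starting point for the experiment, here is a minimal sketch of three candidate representations built from the same metadata record. Everything in it is an assumption for illustration: the `dataset` dictionary, its field names, and the `textify_*` helpers are hypothetical and do not reflect the current ai_search code or the exact OpenML metadata schema.

```python
import json

# Hypothetical metadata record; field names are illustrative only.
dataset = {
    "title": "iris",
    "description": "Measurements of iris flowers from three species.",
    "features": ["sepallength", "sepalwidth", "petallength", "petalwidth", "class"],
    "qualities": {"NumberOfInstances": 150, "NumberOfFeatures": 5},
}


def textify_key_value(meta):
    """One 'key: value' line per field, with nested values flattened."""
    lines = []
    for key, value in meta.items():
        if isinstance(value, dict):
            value = ", ".join(f"{k}={v}" for k, v in value.items())
        elif isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


def textify_sentence(meta):
    """Short natural-language text built from title, description, and features."""
    parts = [f"{meta['title']}.", meta["description"]]
    features = meta.get("features", [])
    if features:
        parts.append(f"Features: {', '.join(features)}.")
    return " ".join(parts)


def textify_json(meta):
    """Raw JSON dump, kept as a baseline representation."""
    return json.dumps(meta, sort_keys=True)


# Registry of candidate representations to compare.
REPRESENTATIONS = {
    "key_value": textify_key_value,
    "sentence": textify_sentence,
    "json": textify_json,
}

# Build one document per representation for the same dataset.
documents = {name: fn(dataset) for name, fn in REPRESENTATIONS.items()}

for name, doc in documents.items():
    print(f"--- {name} ---")
    print(doc)
```

Each resulting string could then be embedded and indexed separately, so that retrieval quality can be compared on the same set of queries. Keeping the representations in a registry makes it cheap to add further variants, for example one that omits qualities or truncates long descriptions.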

openml-labs/ai_search#30