To embed datasets and store them in our vector database, we first convert them to documents (read: strings). There are many ways to do this, since each dataset comes with a lot of metadata (title, description, the data itself, qualities, features, ...). How best to "textify" a dataset and its metadata into a single string, so that it is easier to discover during semantic search, is an open question.
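As an illustration only, here is a minimal sketch of one possible "textify" step. It is not the project's actual implementation: the function name `textify_dataset`, the metadata keys (`title`, `description`, `features`, `qualities`), and the `max_features` cutoff are all assumptions made for this example.

```python
from typing import Any, Mapping


def textify_dataset(meta: Mapping[str, Any], max_features: int = 10) -> str:
    """Flatten dataset metadata into one string suitable for embedding.

    The keys read from `meta` are hypothetical; adapt them to the real schema.
    """
    parts = []
    if meta.get("title"):
        parts.append(f"Title: {meta['title']}")
    if meta.get("description"):
        parts.append(f"Description: {meta['description']}")

    # Cap the number of feature names so wide datasets do not dominate the text.
    features = [str(f) for f in meta.get("features", [])][:max_features]
    if features:
        parts.append("Features: " + ", ".join(features))

    qualities = meta.get("qualities", {})
    if qualities:
        pairs = ", ".join(f"{key}={value}" for key, value in qualities.items())
        parts.append("Qualities: " + pairs)

    # One labelled field per line; missing fields are simply skipped.
    return "\n".join(parts)


example = {
    "title": "Iris",
    "description": "Measurements of iris flowers from three species.",
    "features": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
    "qualities": {"n_instances": 150, "n_classes": 3},
}
document = textify_dataset(example)
print(document)
```

Labelling each field ("Title:", "Features:", ...) is just one design choice. Which fields to include, in what order, and whether to add samples of the data itself are exactly the open questions mentioned above.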