vmware / versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.
Apache License 2.0

vdk-postgres: support writing vectors to a postgres instance with pgvector installed using the VDK #2994

Open murphp15 opened 9 months ago

murphp15 commented 9 months ago

What is the feature request? What problem does it solve?

The VDK SDK contains functions like send_tabular_data_for_ingestion. We need to support sending vectorized data for ingestion. This most likely won't require changing any function declarations, but it will require changes to the Postgres plugin so that it can write vectors.

Definition of done

  1. Postgres plugin supports writing vectors using pgvector
  2. Functional tests using an example data job

antoniivanov commented 8 months ago

If we represent the vector column as a string, as in "[1,2,3]", pgvector would handle it automatically, so maybe no change is needed. The story might remain open so that a test can be added.

Currently you can do something like this:

data = dict(id="1", chunk="text", embedding="[1,2,3,4]")
job_input.send_object_for_ingestion(payload=data, method="postgres")

As long as the embedding column is of type vector and pgvector is installed, it should work.
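
If the embedding starts out as a Python list rather than a ready-made string, it has to be turned into the "[1,2,3]" text form first. The following is only a sketch of that conversion: the helper name to_pgvector_literal is hypothetical and not part of VDK or pgvector, and the checks for empty and non-finite input assume that pgvector would reject such values anyway.

```python
import math
from typing import Iterable


def to_pgvector_literal(values: Iterable[float]) -> str:
    """Format numbers as a bracketed, comma-separated string such as "[1.0,2.0]".

    Hypothetical helper for illustration; not part of VDK or pgvector.
    """
    items = []
    for value in values:
        number = float(value)
        # Assumption: NaN and infinity are not valid vector elements,
        # so fail early instead of sending them to the database.
        if not math.isfinite(number):
            raise ValueError(f"non-finite vector element: {value!r}")
        items.append(repr(number))
    # Assumption: a vector needs at least one dimension.
    if not items:
        raise ValueError("vector must have at least one element")
    return "[" + ",".join(items) + "]"


# Build the same kind of payload as in the comment above,
# starting from a list instead of a hand-written string.
payload = dict(id="1", chunk="text", embedding=to_pgvector_literal([1, 2, 3, 4]))
print(payload["embedding"])  # prints [1.0,2.0,3.0,4.0]
```

The resulting dictionary can then be passed to send_object_for_ingestion in the same way as the hand-written example.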