weaviate / weaviate-python-client

A python native client for easy interaction with a Weaviate instance.
https://weaviate.io/developers/weaviate/current/client-libraries/python.html
BSD 3-Clause "New" or "Revised" License

[DX] Create objects from other objects #1177

Open CShorten opened 1 month ago

CShorten commented 1 month ago

What

Say we have a List[str] property (such as chunks) or a JSON property in one Weaviate collection. We then want an API that populates another collection with one object per string value, optionally inheriting other properties from the source collection as well.

Why

We believe one of the killer use cases of Generative Feedback Loops (GFLs) is having an LLM split long documents such as PDFs into chunks with metadata descriptions. That leaves us with a JSON property that stores the list of chunk and metadata strings per entry. It would be great to have an API that flows this from, say, "WeaviateBlogPosts" --> "WeaviateBlogChunks".

How

weaviate_blog_posts.data.transfer(
  to_collection="WeaviateBlogChunks",
  split_properties="ChunkAndMetadataJSON",
  inherit_properties=["title", "author", "date_published"],
  add_cref=True,
  uuids=uuids
)

Assuming ChunkAndMetadataJSON is populated with a GFL such as:

weaviate_blog_posts.data.gfl.update(
  instruction="Please break up this markdown file into semantic chunks, with metadata further describing their context in the original document",
  view_properties=["content"],
  on_property=["ChunkAndMetadataJSON"],
  uuids=uuids
)

^ We still need to figure out how to interface composite types like this with the GFL. Alternatively, this could be a List[ChunkWithMetadata] type.
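To make the intended behavior concrete, here is a minimal client-side sketch of what the proposed `transfer` might do for a single source object. It is plain Python with no Weaviate calls, and everything in it is an assumption for illustration: the `ChunkWithMetadata` fields, the JSON shape (a list of `{"chunk": ..., "metadata": ...}` items), the `split_object` helper, and the `chunk_index` and `source_uuid` keys, where `source_uuid` stands in for the cross-reference that `add_cref=True` would create.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List


@dataclass
class ChunkWithMetadata:
    """Hypothetical composite type: one chunk plus a description of its context."""
    chunk: str
    metadata: str


def parse_chunks(raw: str) -> List[ChunkWithMetadata]:
    """Parse the JSON string stored in the split property (assumed shape)."""
    return [
        ChunkWithMetadata(chunk=item["chunk"], metadata=item["metadata"])
        for item in json.loads(raw)
    ]


def split_object(
    source_uuid: str,
    properties: Dict[str, Any],
    split_property: str,
    inherit_properties: List[str],
) -> Iterator[Dict[str, Any]]:
    """Yield one target-object payload per chunk found in `split_property`."""
    inherited = {name: properties[name] for name in inherit_properties}
    for index, item in enumerate(parse_chunks(properties[split_property])):
        yield {
            **inherited,
            "chunk": item.chunk,
            "metadata": item.metadata,
            "chunk_index": index,
            # Stand-in for add_cref=True: record which source object this came from.
            "source_uuid": source_uuid,
        }


# Example source object, shaped like an entry in "WeaviateBlogPosts" (made-up data).
post = {
    "title": "Example post",
    "author": "Jane Doe",
    "content": "...",
    "ChunkAndMetadataJSON": json.dumps([
        {"chunk": "First section text.", "metadata": "Introduction"},
        {"chunk": "Second section text.", "metadata": "Main argument"},
    ]),
}

rows = list(split_object("post-1", post, "ChunkAndMetadataJSON", ["title", "author"]))
```

Each entry in `rows` could then be inserted into "WeaviateBlogChunks". A server-side or client-level `transfer` would presumably batch this over `uuids` and create real cross-references instead of carrying a plain `source_uuid` string.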

tsmith023 commented 1 month ago

@jfrancoa also brought this use case to me, since migrating collections within or between instances is a frequent user journey. Developing this during the next sprint would be a good idea, I think!

CShorten commented 1 month ago

Awesome!! Super happy to hear it, thanks @tsmith023!