We use a Chroma collection (currently named `test`; it should be named `schema`) to embed the ontology/schema of the knowledge graph. In many cases there are multiple layers of schema, or taxonomies/picklists that can be very large.
Storing all of those layers in the same collection is a problem: a similarity search returns only the top k matches, so a large picklist or schema can fill every slot and crowd out the terms from the smaller layers, making them practically impossible to retrieve.
LangChain provides an EnsembleRetriever and a MergerRetriever that address this: we can create one collection per layer and, for a single query, fetch a predefined number of items from each collection before combining the results.
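To make the over-representation problem and the per-collection fix concrete, here is a minimal pure-Python sketch. It does not call LangChain or Chroma: each "collection" is faked as a list of (term, score) pairs already sorted by similarity, and the collection names, terms, and scores are hypothetical. `merge_round_robin` mirrors, to our understanding, the interleaving MergerRetriever performs (first result from each retriever, then the second from each, and so on).

```python
from typing import Dict, List, Tuple

# Hypothetical similarity results: each "collection" is a list of
# (term, score) pairs already sorted by descending similarity to the query.
Hits = List[Tuple[str, float]]


def top_k_single_collection(collections: Dict[str, Hits], k: int) -> List[str]:
    """Simulate storing every layer in ONE collection: pool all hits, keep the global top k."""
    pooled = [hit for hits in collections.values() for hit in hits]
    pooled.sort(key=lambda hit: hit[1], reverse=True)
    return [term for term, _ in pooled[:k]]


def merge_round_robin(collections: Dict[str, Hits], k_per_collection: Dict[str, int]) -> List[str]:
    """Take the top k_i hits from each collection, then interleave them rank by rank."""
    per_collection = [
        [term for term, _ in hits[: k_per_collection.get(name, 0)]]
        for name, hits in collections.items()
    ]
    merged: List[str] = []
    longest = max((len(terms) for terms in per_collection), default=0)
    for rank in range(longest):
        for terms in per_collection:
            if rank < len(terms):
                merged.append(terms[rank])
    return merged


if __name__ == "__main__":
    demo = {
        "schema": [("Person", 0.71), ("Organization", 0.64)],
        "picklist": [("Acme Corp", 0.83), ("Acme Ltd", 0.82), ("Acme Inc", 0.80), ("Acme GmbH", 0.79)],
    }
    # Single collection: the large picklist fills every slot.
    print(top_k_single_collection(demo, k=3))
    # ['Acme Corp', 'Acme Ltd', 'Acme Inc']
    # One collection per layer with a fixed k each: schema terms survive.
    print(merge_round_robin(demo, {"schema": 2, "picklist": 2}))
    # ['Person', 'Acme Corp', 'Organization', 'Acme Ltd']
```

With a single pooled collection, no schema term makes the top 3; with a fixed budget per collection, every layer is guaranteed representation regardless of its size.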
Objective: support multiple Chroma collections via the ensemble or merger retriever.
Requirements:
[ ] Update the `chroma_build` flow to take multiple input files (?) and create one Chroma collection per input file
[ ] Update the Chroma config to take a list of collection names instead of a single one, optionally with a weight / top k for each collection
[ ] Update the generation functions to use LangChain's EnsembleRetriever or MergerRetriever
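One way the list-valued config and the ensemble step could fit together is sketched below, assuming a hypothetical config shape where each entry is either a bare collection name or a dict with `name` plus optional `weight` and `top_k` (the defaults `weight=1.0` and `top_k=5` are illustrative assumptions). `weighted_rrf` implements weighted Reciprocal Rank Fusion, the scoring EnsembleRetriever uses: each item scores the sum of weight / (rank + c) over the retrievers that returned it, with c = 60 by default. Retrieval itself is stubbed out as pre-ranked lists; in LangChain, `top_k` would roughly correspond to each Chroma retriever's `search_kwargs={"k": ...}` and `weight` to the `weights` list passed to EnsembleRetriever.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union


@dataclass(frozen=True)
class CollectionConfig:
    """One entry of a hypothetical multi-collection Chroma config."""
    name: str
    weight: float = 1.0  # assumed default: equal weighting across collections
    top_k: int = 5       # assumed default number of hits per collection


def parse_collections(raw: Sequence[Union[str, Dict[str, object]]]) -> List[CollectionConfig]:
    """Accept bare names or dicts with optional weight/top_k (assumed config shape)."""
    configs = [
        CollectionConfig(name=entry) if isinstance(entry, str) else CollectionConfig(**entry)
        for entry in raw
    ]
    names = [cfg.name for cfg in configs]
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate collection names: {names}")
    return configs


def weighted_rrf(
    ranked_lists: Dict[str, List[str]],
    configs: List[CollectionConfig],
    c: int = 60,
) -> List[str]:
    """Fuse per-collection rankings with weighted Reciprocal Rank Fusion.

    Each collection contributes at most top_k items; an item's score is the
    sum of weight / (rank + c) over the collections that returned it, with
    rank starting at 1. Items are returned by descending fused score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for cfg in configs:
        hits = ranked_lists.get(cfg.name, [])[: cfg.top_k]
        for rank, item in enumerate(hits, start=1):
            scores[item] += cfg.weight / (rank + c)
    return sorted(scores, key=scores.__getitem__, reverse=True)


if __name__ == "__main__":
    cfgs = parse_collections(["schema", {"name": "picklist", "weight": 0.5, "top_k": 2}])
    ranked = {
        "schema": ["Person", "Organization"],
        "picklist": ["Acme Corp", "Acme Ltd", "Acme Inc"],
    }
    print(weighted_rrf(ranked, cfgs))
    # ['Person', 'Organization', 'Acme Corp', 'Acme Ltd']
```

Unlike plain interleaving, RRF lets a term returned by several collections accumulate score, and the weights let a smaller, more authoritative layer (here the schema) outrank a large picklist.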