allow multiple chroma collections

We use a Chroma collection (currently named test; it should be named schema) to embed the ontology/schema of the knowledge graph. In many cases there are multiple layers of schema, or taxonomies / picklists, some of which can be very large.

Storing all those layers in the same collection poses a problem: large picklists / schemas will be over-represented in the query results, crowding out terms from the smaller layers so that they are rarely, if ever, retrieved.
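
To make the problem concrete, here is a minimal sketch using toy data. The `score` function is a token-overlap stand-in for embedding similarity, and the picklist and schema terms are invented for illustration; none of this is taken from the repository.

```python
def score(query: str, text: str) -> float:
    """Jaccard overlap of lower-cased tokens (stand-in for embedding similarity)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def top_k(query: str, docs: list[str], k: int) -> list[str]:
    """Return the k documents with the highest toy similarity to the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]


# Hypothetical data: one very large picklist and one small schema layer.
picklist = [f"protein kinase variant {i}" for i in range(1000)]
schema = ["Protein class", "Gene class", "encodes property"]

# Everything stored in a single collection: the picklist fills all k slots.
single_collection = top_k("protein kinase", picklist + schema, k=5)
print(single_collection)
```

In this toy setup every one of the five results comes from the picklist, so the relevant schema term "Protein class" never reaches the prompt.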

LangChain provides an ensemble retriever and a merger retriever that address this issue: they let us create multiple collections and fetch a predefined number of items from each collection for a single query.
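
The idea can be sketched without any dependencies. The example below retrieves a fixed top-k from each collection separately and merges the ranked lists with weighted Reciprocal Rank Fusion, which to our understanding is how LangChain's ensemble retriever reranks results. The collection contents, the `score` stand-in for embedding similarity, and the values for `top_k`, `weights` and `c` are all assumptions for illustration, not the library's code.

```python
from collections import defaultdict


def score(query: str, text: str) -> float:
    """Jaccard overlap of lower-cased tokens (stand-in for embedding similarity)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(query: str, docs: list[str], k: int) -> list[str]:
    """Top-k retrieval restricted to a single collection."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Merge ranked lists: each document gains weight / (rank + c) per list."""
    fused: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += weight / (rank + c)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical collections: a small schema layer and a large picklist.
collections = {
    "schema": ["Protein class", "Gene class", "encodes property"],
    "picklist": [f"protein kinase variant {i}" for i in range(1000)],
}
top_k = {"schema": 2, "picklist": 3}        # items fetched per collection
weights = {"schema": 0.5, "picklist": 0.5}  # relative importance when merging

query = "protein kinase"
rankings = [retrieve(query, docs, top_k[name]) for name, docs in collections.items()]
merged = weighted_rrf(rankings, [weights[name] for name in collections])
print(merged)
```

Because each collection contributes its own quota, the schema terms are guaranteed a place in the merged result regardless of how large the picklist is.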

Objective: support multi-collection Chroma via the ensemble or merger retriever.

Requirements:

[ ] Update the chroma_build flow to take multiple input files (?) and create one Chroma collection per input file
[ ] Update the Chroma config to take a list of collection names instead of a single one
  - optionally, a weight / top-k associated with each collection
[ ] Update the generation functions to use LangChain's ensemble/merger retriever
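
One possible shape for the per-collection config is sketched below. `CollectionConfig`, `collections_from_files`, `normalized_weights`, the default values and the file paths are all hypothetical; naming each collection after its input file's stem is just one option for the "one collection per input file" requirement.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CollectionConfig:
    """Settings for one Chroma collection (hypothetical structure)."""
    name: str
    weight: float = 1.0  # relative weight when merging results
    top_k: int = 5       # number of items to fetch from this collection


def collections_from_files(input_files, overrides=None):
    """Create one config per input file, named after the file stem."""
    overrides = overrides or {}
    configs = []
    for path in input_files:
        name = Path(path).stem
        configs.append(CollectionConfig(name=name, **overrides.get(name, {})))
    return configs


def normalized_weights(configs):
    """Scale weights so that they sum to 1."""
    total = sum(c.weight for c in configs)
    if total <= 0:
        raise ValueError("collection weights must sum to a positive value")
    return [c.weight / total for c in configs]


# Hypothetical input files; the taxonomy gets a lower weight and a smaller top-k.
configs = collections_from_files(
    ["data/schema.ttl", "data/taxonomy.ttl"],
    overrides={"taxonomy": {"weight": 0.5, "top_k": 3}},
)
print(configs, normalized_weights(configs))
```

The resulting list could then drive both the build flow (one collection per entry) and the retriever setup (one retriever per entry, with its `top_k` and weight).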

sdsc-ordes / kg-llm-interface

allow multiple chroma collections #20