The LangChain implementation functions correctly when run on a single file at a time.
However, when it is set up to loop over multiple files in a single script, the Chroma vector database that stores the transcript values does not appear to reset itself between files. Instead of creating a new database for each transcript, it appends to the existing one on every iteration. As a result, the generated questions cite incorrect sources and can belong to the wrong transcript.
The fix is straightforward: assign a unique database name to each transcript so that every transcript is stored separately.
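One way to derive such a name is sketched below. This is only an illustration under stated assumptions: the helper `collection_name_for` and the example file paths are hypothetical, and it assumes the "database name" corresponds to the `collection_name` argument of LangChain's Chroma wrapper. The helper builds a name from the file stem plus a short hash of the full path, so two transcripts with the same file name in different folders still get different names.

```python
import hashlib
import re
from pathlib import Path


def collection_name_for(transcript_path: str) -> str:
    """Return a stable, unique collection name for one transcript file.

    Hypothetical helper, not part of LangChain or Chroma. The name is kept
    to letters, digits, hyphens and underscores, and well under 63
    characters, which should satisfy Chroma's collection naming rules.
    """
    stem = Path(transcript_path).stem
    # Replace anything outside the safe character set with a hyphen.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", stem).strip("-_") or "transcript"
    # A short hash of the full path keeps same-named files apart.
    digest = hashlib.sha1(transcript_path.encode("utf-8")).hexdigest()[:8]
    return f"t-{slug[:40]}-{digest}"


# Hypothetical transcript paths, for illustration only.
transcripts = ["talks/intro.txt", "talks/q&a session.txt", "archive/intro.txt"]

names = {path: collection_name_for(path) for path in transcripts}
for path, name in names.items():
    # Assumption: inside the real loop the name would be passed on, e.g.
    #   Chroma.from_documents(docs, embeddings, collection_name=name)
    print(f"{path} -> {name}")
```

Because the name depends only on the path, rerunning the script on the same file reuses the same name. If a fresh store is wanted on every run, the old collection would need to be deleted first, or a run identifier added to the name.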