samiulhq / LLM-RAG-SAS

Upstream Repo: Framework for implementing Retrieval Augmented Generation (RAG) from SAS platform
Apache License 2.0

Provide CAS table as input dataset #40

Open SundareshSankaran opened 2 months ago

SundareshSankaran commented 2 months ago

Now that a pandas DataFrame is a supported input data source, we will extend the functionality further to include an option to specify either a SAS dataset or a CAS table as the data source.

Why? The main objective is to provide tools that make it easier to integrate LLMs into the Viya ecosystem, so this is a natural direction to take. Data in the Viya ecosystem resides in SAS datasets or CAS tables, or is sourced from databases and external data sources and then surfaced as either a SAS dataset or a CAS table.

How?

1. An input data source will be added to the custom step.
2. The engine of the input dataset (SAS or CAS) will be identified to determine the best processing mechanism.
3. Code will convert the input dataset to a pandas DataFrame.
4. The pandas DataFrame will be stored in the Chroma vector store, taking advantage of the existing langchain loader.
5. The rest of the process will run as normal.
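
A minimal sketch of how the steps above might look in Python; it is not the implementation in this repo. It assumes the custom step runs under PROC PYTHON, where the `SAS` callback object offers `sd2df`, and that a `swat` CAS connection is available for CAS tables. The function names, parameters, and the way the engine name reaches Python (for example `CAS` for a CAS libname and something like `V9` otherwise) are hypothetical. Third-party imports are deferred so the dispatch logic can be read and exercised on its own.

```python
def normalise_engine(engine):
    """Reduce a libref engine name to 'CAS' or 'SAS' (assumption: anything
    that is not the CAS engine is treated as a regular SAS dataset)."""
    return "CAS" if engine.strip().upper() == "CAS" else "SAS"


def input_to_dataframe(table, engine, sas=None, cas_conn=None, caslib=None):
    """Convert the input table to a pandas DataFrame based on its engine.

    table    : 'libref.name' for a SAS dataset, or the table name for CAS
    sas      : the PROC PYTHON callback object (exposes sd2df)
    cas_conn : an open swat.CAS connection
    caslib   : caslib holding the CAS table (hypothetical parameter)
    """
    if normalise_engine(engine) == "CAS":
        if cas_conn is None:
            raise ValueError("A CAS connection is required for a CAS table")
        # swat: CASTable.to_frame() downloads the table to the client
        return cas_conn.CASTable(table, caslib=caslib).to_frame()
    if sas is None:
        raise ValueError("The SAS callback object is required for a SAS dataset")
    # PROC PYTHON: sd2df returns the SAS dataset as a pandas DataFrame
    return sas.sd2df(table)


def store_in_chroma(df, text_column, embedding, persist_directory):
    """Load the DataFrame through the langchain loader and persist to Chroma.

    Assumes the langchain_community package layout; adjust the imports to
    whichever langchain version the step already uses.
    """
    from langchain_community.document_loaders import DataFrameLoader
    from langchain_community.vectorstores import Chroma

    documents = DataFrameLoader(df, page_content_column=text_column).load()
    return Chroma.from_documents(
        documents=documents,
        embedding=embedding,
        persist_directory=persist_directory,
    )
```

Inside the step, a call might then look like `df = input_to_dataframe("WORK.DOCS", "V9", sas=SAS)` followed by `store_in_chroma(df, "text", embedding, "/tmp/chroma")`, where the dataset, column name, embedding object, and path are placeholders. Note that `to_frame()` pulls the whole CAS table to the client, so a very large table may need to be filtered or sampled first.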