yizhenpong / Sustainable-Finance-Company-Wiki-page-with-Langchain


Sustainable-Finance-Company-Wiki-page-with-Langchain

Abstract

Large language models (LLMs) play a pivotal role in many Natural Language Processing (NLP) tasks. This study applies LLMs to sustainable finance, framing the task as closed-domain question answering. Building on Retrieval-Augmented Generation (RAG) with top-k retrieval, this paper introduces an approach that combines RAG with information from the Table of Contents (ToC), denoted RAG + ToC. The method addresses the problem of long inputs under the assumption that the inputs are structured data, where the ToC acts as an optimised filter when handling a query. To handle long outputs, we use the conventional method of first generating an outline, tailored here by using RAG + ToC with chain-of-thought prompting. Our analysis of the outputs also introduces a phenomenon we call the structure bias of LLMs.

Approach

LLM: Mistral 7B, hosted on Ollama. LLM framework: LangChain, with OllamaEmbeddings and a Chroma vector store.
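
As a rough illustration of how these components could be wired together, here is a minimal sketch. It is not the repository's code: the `StackConfig` class, the `build_components` function, the `mistral` model tag, the embedding model choice, the `chroma_db` path and the `top_k` value are all assumptions made for the example. It also assumes the `langchain-community` and `chromadb` packages are installed and that an Ollama server is running locally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackConfig:
    # Every value here is an illustrative assumption, not the repository's setting.
    llm_model: str = "mistral"            # Ollama tag assumed to serve Mistral 7B
    embedding_model: str = "mistral"      # assumed; the README does not name one
    persist_directory: str = "chroma_db"  # hypothetical on-disk location
    top_k: int = 4                        # hypothetical retrieval depth


def build_components(cfg: StackConfig):
    """Return (llm, retriever) for the stack described above.

    Requires langchain-community, chromadb and a reachable Ollama server.
    """
    # Imported lazily so this module loads without the optional dependencies.
    from langchain_community.embeddings import OllamaEmbeddings
    from langchain_community.llms import Ollama
    from langchain_community.vectorstores import Chroma

    llm = Ollama(model=cfg.llm_model)
    embeddings = OllamaEmbeddings(model=cfg.embedding_model)
    store = Chroma(
        persist_directory=cfg.persist_directory,
        embedding_function=embeddings,
    )
    # Plain top-k similarity retrieval over the stored chunks.
    retriever = store.as_retriever(search_kwargs={"k": cfg.top_k})
    return llm, retriever
```

Calling `build_components(StackConfig())` would return an LLM handle and a top-k retriever, provided the vector store has already been populated with document chunks.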

Part 1 - Data collection

Collect data relevant to sustainable finance

Part 2 - Content generation

Structure of the output (Sustainable Finance Wikipedia pages):

Pipeline for content generation: see wiki_gen_base.py and wiki_gen_ToC.py
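
The RAG + ToC idea from the abstract can be sketched in a few lines. The example below is a toy, standard-library-only illustration and does not reproduce wiki_gen_base.py or wiki_gen_ToC.py: the function names are hypothetical, and a bag-of-words cosine similarity stands in for the embedding-based similarity a real vector store would use. It first ranks ToC headings against the query, then runs top-k retrieval only over the chunks under the headings it kept.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rag_toc_retrieve(query, sections, n_sections=1, k=2):
    """Retrieve up to k (heading, chunk) pairs for a query.

    sections maps each ToC heading to the list of text chunks under it.
    Step 1: keep the n_sections headings most similar to the query.
    Step 2: rank only the chunks under those headings and return the top k.
    """
    q = Counter(tokens(query))
    ranked = sorted(
        sections,
        key=lambda heading: cosine(q, Counter(tokens(heading))),
        reverse=True,
    )
    kept = ranked[:n_sections]
    candidates = [(h, chunk) for h in kept for chunk in sections[h]]
    candidates.sort(
        key=lambda pair: cosine(q, Counter(tokens(pair[1]))),
        reverse=True,
    )
    return candidates[:k]
```

With this filter, a chunk that mentions the query terms but sits under an unrelated heading is never scored, which is how the ToC keeps long inputs manageable. The retrieved chunks would then be passed to the LLM as context for the section being written.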

Part 3 - Evaluation