Large language models (LLMs) play a pivotal role in a wide range of Natural Language Processing (NLP) tasks. This study applies LLMs to sustainable finance, framing the task as closed-domain question answering. Building on Retrieval-Augmented Generation (RAG) with top-k retrieval, this paper introduces an approach that combines RAG with insights from a Table of Contents (ToC), denoted RAG + ToC. The method addresses the problem of long inputs under the assumption of structured input data, where the ToC acts as an optimised filter that narrows retrieval to the parts of the document relevant to the query. For handling extensive outputs, we employ the conventional method of first generating an outline, tailored here using RAG + ToC with chain-of-thought prompting. Our analysis of the outputs also identifies a new phenomenon we call structure bias of LLMs.
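The RAG + ToC idea can be sketched as follows: the ToC entries act as a coarse filter that restricts retrieval to the most relevant sections before top-k chunk retrieval runs. A minimal illustration, with word overlap standing in for embedding similarity (the function names and sample data are ours, not from the codebase):

```python
def toc_filter(toc, query, max_sections=2):
    """Rank ToC entries by word overlap with the query, keep the best few.

    `toc` maps section titles to section text. A real system would score
    with embedding similarity; raw word overlap is just an illustration.
    """
    q_words = set(query.lower().split())

    def overlap(title):
        return len(q_words & set(title.lower().split()))

    ranked = sorted(toc, key=overlap, reverse=True)
    return {title: toc[title] for title in ranked[:max_sections]}


toc = {
    "Climate Strategy": "We target net-zero emissions by 2040 ...",
    "Board Governance": "The board oversees audit and risk ...",
    "Employee Wellbeing": "Annual engagement surveys ...",
}

# Only the matching section survives the filter; top-k retrieval would
# then run on chunks drawn from it alone.
filtered = toc_filter(toc, "What is the climate strategy", max_sections=1)
```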
Stack: Mistral 7B (LLM, hosted on Ollama), LangChain (framework), OllamaEmbeddings, Chroma vector store.
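Top-k retrieval itself can be illustrated without the Ollama/Chroma stack. Below, a bag-of-words cosine similarity stands in for OllamaEmbeddings and a plain list stands in for the Chroma store; this is a sketch of the retrieval step, not the project's actual wiring:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; OllamaEmbeddings would go here."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(chunks, query, k=2):
    """Return the k chunks most similar to the query (vector-store stand-in)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Scope 1 and 2 emissions fell 12% year on year.",
    "The CEO joined the company in 2015.",
    "Emissions targets are validated by SBTi.",
]
hits = top_k(chunks, "emissions targets", k=2)
```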
Collect data relevant for sustainable finance
nasdaq_screener.csv
company_info.csv - contains all selected NASDAQ stock codes; fill in the column with the sustainability report (SR) link for each
data_helper.py - downloads every SR listed in company_info.csv, then creates output folders and corresponding empty files for each stock code

Structure of the output (Sustainable Finance Wikipedia pages):
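Stepping back to data_helper.py: its scaffolding step (one output folder plus an empty page file per stock code) can be sketched roughly as below. The directory layout and file names here are assumptions, not necessarily the repo's:

```python
import os
import tempfile

def scaffold_outputs(stock_codes, out_dir):
    """Create an output folder and an empty page file per stock code.

    Loosely mirrors the scaffolding described for data_helper.py; the
    layout and naming are illustrative assumptions.
    """
    for code in stock_codes:
        folder = os.path.join(out_dir, code)
        os.makedirs(folder, exist_ok=True)
        # Empty placeholder that the generation pipeline fills in later.
        open(os.path.join(folder, code + ".txt"), "a").close()

base = tempfile.mkdtemp()
scaffold_outputs(["AAPL", "MSFT"], out_dir=base)
```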
Pipeline for content generation - see wiki_gen_base.py and wiki_gen_ToC.py
Part A
Part Ci - outline of ESG approaches
Part Cii - for each point in outline (Ci), generate 3-5 paragraphs
Part B - generate 1-2 paragraphs
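Part Ci's outline step combines RAG + ToC context with chain-of-thought prompting. A rough sketch of how such a prompt might be assembled; the wording is illustrative, not the repo's actual system prompt:

```python
def build_outline_prompt(company, retrieved_chunks):
    """Assemble a chain-of-thought outline prompt from retrieved context.

    `retrieved_chunks` would come from RAG + ToC; the prompt text below
    is a stand-in for the project's real system prompts.
    """
    context = "\n\n".join(retrieved_chunks)
    return (
        f"Context from {company}'s sustainability report:\n{context}\n\n"
        "Let's think step by step. First identify the material ESG topics, "
        f"then produce a numbered outline of {company}'s ESG approaches. "
        "Return only the outline."
    )

prompt = build_outline_prompt(
    "ACME Corp",
    ["Our climate transition plan targets net zero by 2040.",
     "A board-level ESG committee meets quarterly."],
)
```

Each outline point would then seed a Part Cii call that expands it into 3-5 paragraphs.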
^ At different points of the pipeline, LLMs with different temperatures are utilised; please refer to the system prompts and llm definitions in the code.
^ An understanding of the concept of materiality (material topics / material information) may be required for a holistic understanding.
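As a concrete but hypothetical illustration of per-stage temperatures (the real values live in the repo's system prompts and llm setup):

```python
# Hypothetical per-stage temperatures -- NOT the repo's actual values.
# Lower temperature favours factual determinism; higher favours fluent prose.
STAGE_TEMPERATURE = {
    "part_A_facts": 0.0,         # factual fields: keep deterministic
    "part_Ci_outline": 0.2,      # outline: mostly grounded
    "part_Cii_paragraphs": 0.7,  # long-form prose: allow some variety
    "part_B_summary": 0.5,
}
```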
eval/section_A_ground_truth.json - human generated
eval/section_A_eval.json:
for each field in section A:
eval: 0 or 1 (binary; 1 means the field is factually correct)
type: absolute or relative (absolute: the information is usually static, and any two people would arrive at the same answer; relative: the information is debatable - e.g. for "important people", who is to determine the extent of importance?)
reason: NA or any text (NA for eval = 1/type = absolute)
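The eval record schema above can be checked mechanically. A small validator (field names follow the description; the helper itself is ours, and it encodes one reading of the "NA for eval = 1/type = absolute" rule):

```python
def validate_eval_record(rec):
    """Check one section-A eval record against the schema described above."""
    if rec.get("eval") not in (0, 1):
        return False
    if rec.get("type") not in ("absolute", "relative"):
        return False
    reason = rec.get("reason")
    # One reading of the rule: NA is expected when eval = 1 or the type is
    # absolute; otherwise a free-text justification is required.
    if rec["eval"] == 1 or rec["type"] == "absolute":
        return reason == "NA"
    return isinstance(reason, str) and reason != "NA"

ok = validate_eval_record({"eval": 1, "type": "absolute", "reason": "NA"})
bad = validate_eval_record({"eval": 0, "type": "relative", "reason": "NA"})
```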
eval_section_A.py - computes the scores for the Part A output
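eval_section_A.py aggregates the binary judgments; the scoring can be sketched as accuracy overall and split by absolute vs relative fields (a sketch of the idea, not the script itself):

```python
def score_section_a(records):
    """Accuracy over binary evals, overall and per type (absolute/relative)."""
    def acc(recs):
        return sum(r["eval"] for r in recs) / len(recs) if recs else 0.0
    return {
        "overall": acc(records),
        "absolute": acc([r for r in records if r["type"] == "absolute"]),
        "relative": acc([r for r in records if r["type"] == "relative"]),
    }

scores = score_section_a([
    {"eval": 1, "type": "absolute"},
    {"eval": 0, "type": "absolute"},
    {"eval": 1, "type": "relative"},
])
```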
eval/section_A_eval.json - the human evaluation conducted based on eval/section_A_ground_truth.json, which was retrieved via Google search
eval_section_B_C.py - evaluation of Parts B and C; outputs are placed in eval/all_companies.txt