sabszh / EER-chatbot-UI

Repo for chatbot made for the EER group
https://EERChat.ploomberapp.io

How do RAGs / LLMs understand time? #7

Open lightnin opened 1 month ago

lightnin commented 1 month ago

It seems like time is an important aspect of what we're interested in here.

sabszh commented 1 month ago

Idea to test out:

lightnin commented 1 month ago

Here's a relevant paper mentioning a benchmark for understanding time:

TRAM: Benchmarking Temporal Reasoning for Large Language Models
Yuqing Wang (Stanford University, ywang216@stanford.edu), Yun Zhao (Meta Platforms, Inc., yunzhao20@meta.com)

Abstract: Reasoning about time is essential for understanding the nuances of events described in natural language. Previous research on this topic has been limited in scope, characterized by a lack of standardized benchmarks that would allow for consistent evaluations across different studies. In this paper, we introduce TRAM, a temporal reasoning benchmark composed of ten datasets, encompassing various temporal aspects of events such as order, arithmetic, frequency, and duration, designed to facilitate a comprehensive evaluation of the temporal reasoning capabilities of large language models (LLMs). We conduct an extensive evaluation using popular LLMs, such as GPT-4 and Llama2, in both zero-shot and few-shot learning scenarios. Additionally, we employ BERT-based models to establish the baseline evaluations. Our findings indicate that these models still trail human performance in temporal reasoning tasks. It is our aspiration that TRAM will spur further progress in enhancing the temporal reasoning abilities of LLMs. Our data is available at https://github.com/EternityYW/TRAM-Benchmar

https://arxiv.org/pdf/2310.00835
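
For concreteness, here's a rough sketch of what a zero-shot probe in the spirit of TRAM's ordering questions might look like. The example question, the multiple-choice format, and the `build_zero_shot_prompt` helper are all made up for illustration; they are not taken from the actual TRAM datasets or code.

```python
# Hypothetical zero-shot temporal-ordering probe, TRAM-style.
# The question and options below are invented for illustration.

example = {
    "question": "Alice boiled water, then poured it over the tea leaves, "
                "and finally drank the tea. What did she do second?",
    "options": ["Boiled water", "Poured water over the tea leaves", "Drank the tea"],
    "answer": "B",
}

def build_zero_shot_prompt(item: dict) -> str:
    """Format a multiple-choice temporal question as a plain zero-shot prompt."""
    letters = ["A", "B", "C", "D"]
    lines = [item["question"]]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(item["options"])]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)

if __name__ == "__main__":
    prompt = build_zero_shot_prompt(example)
    print(prompt)
    # The prompt would then be sent to the model under test (GPT-4, Llama 2, ...)
    # and the returned letter compared against example["answer"].
```

The benchmark's point is that even on questions this simple-looking (ordering, duration, date arithmetic), current models still trail humans.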

lightnin commented 1 month ago

This paper seems to suggest that LLMs are still pretty bad at temporal reasoning right now.

Large Language Models Can Learn Temporal Reasoning

While large language models (LLMs) have demonstrated remarkable reasoning capabilities, they are not without their flaws and inaccuracies. Recent studies have introduced various methods to mitigate these limitations. Temporal reasoning (TR), in particular, presents a significant challenge for LLMs due to its reliance on diverse temporal expressions and intricate temporal logic. In this paper, we propose TG-LLM, a novel framework towards language-based TR. Instead of reasoning over the original context, we adopt a latent representation, temporal graph (TG) that facilitates the TR learning. A synthetic dataset (TGQA), which is fully controllable and requires minimal supervision, is constructed for fine-tuning LLMs on this text-to-TG translation task. We confirmed in experiments that the capability of TG translation learned on our dataset can be transferred to other TR tasks and benchmarks. On top of that, we teach LLM to perform deliberate reasoning over the TGs via Chain of Thought (CoT) bootstrapping and graph data augmentation. We observed that those strategies, which maintain a balance between usefulness and diversity, bring more reliable CoTs and final results than the vanilla CoT distillation.

https://arxiv.org/abs/2401.06853
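
To make the temporal-graph idea more concrete, here's a toy sketch (not the paper's actual code) of reasoning over an already-extracted temporal graph. The `Event` class, the example events, and `happened_before` are all hypothetical, and the hard step that TG-LLM actually fine-tunes an LLM to do, translating raw text into such a graph, is assumed to have already happened.

```python
# Toy sketch: answer an ordering question by comparing timestamps in a small
# temporal graph, rather than asking a model to reason over raw text.
# All events and names below are invented for illustration.

from dataclasses import dataclass

@dataclass
class Event:
    name: str
    start: int  # year the event began
    end: int    # year the event ended

# Hypothetical output of the text-to-temporal-graph translation step.
temporal_graph = {
    "alice_studied_in_paris": Event("Alice studied in Paris", 2001, 2004),
    "alice_worked_at_acme":   Event("Alice worked at Acme", 2005, 2010),
}

def happened_before(graph: dict, a: str, b: str) -> bool:
    """True if event `a` ended no later than event `b` started."""
    return graph[a].end <= graph[b].start

if __name__ == "__main__":
    q = ("alice_studied_in_paris", "alice_worked_at_acme")
    print(f"Did '{q[0]}' happen before '{q[1]}'?",
          happened_before(temporal_graph, *q))
```

The appeal of this intermediate representation is that once events are pinned to explicit times, the ordering/duration logic becomes trivial; the open question the paper tackles is getting the model to produce the graph reliably in the first place.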