snap-stanford / stark

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases (https://stark.stanford.edu/)
https://stark.stanford.edu/
MIT License
270 stars 33 forks source link
graph information-retrieval knowledge-base llm multimodal nlp semi-structured-data

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

[![](https://img.shields.io/badge/website-STaRK-purple?style=plastic&logo=Google%20chrome)](https://stark.stanford.edu/) [![](https://img.shields.io/badge/Dataset-yellow?style=plastic&logo=Hugging%20face)](https://huggingface.co/datasets/snap-stanford/stark) [![](https://img.shields.io/badge/SKB_Explorer-online-yellow?style=plastic&logo=Hugging%20face)](https://stark.stanford.edu/skb_explorer.html) [![](https://img.shields.io/badge/Arxiv-paper-red?style=plastic&logo=arxiv)](https://arxiv.org/abs/2404.13207) [![](https://img.shields.io/badge/pip-stark--qa-brightgreen?style=plastic&logo=Python)](https://pypi.org/project/stark-qa/) [![](https://img.shields.io/badge/doc-online-blue?style=plastic&logo=Read%20the%20Docs)](https://stark.stanford.edu/docs/index.html) [![](https://img.shields.io/badge/-Linkedin-blue?style=plastic&logo=Linkedin)](https://www.linkedin.com/posts/leskovec_reduce-llm-hallucinations-with-rag-over-textual-activity-7190745116339302401-da4n?utm_source=share&utm_medium=member_desktop) [![](https://img.shields.io/badge/-Twitter-cyan?style=plastic&logo=X)](https://twitter.com/ShirleyYXWu/status/1784970920383402433) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

NEWS

What is STaRK?

STaRK is a large-scale Semi-structured Retrieval Benchmark on Textual and Relational Knowledge bases, covering applications in product search, academic paper search, and biomedicine inquiries.

Featuring diverse, natural-sounding, and practical queries that require context-specific reasoning, STaRK sets a new standard for assessing real-world retrieval systems driven by LLMs and presents significant challenges for future research.

🔥 Check out our website for more overview!

Access benchmark data

1) Env Setup

From pip (recommended)

With python >=3.8 and <3.12

pip install stark-qa

From source

Create a conda env with python >=3.8 and <3.12 and install required packages in requirements.txt.

conda create -n stark python=3.11
conda activate stark
pip install -r requirements.txt

2) Data loading

from stark_qa import load_qa, load_skb

dataset_name = 'amazon'

# Load the retrieval dataset
qa_dataset = load_qa(dataset_name)
idx_split = qa_dataset.get_idx_split()

# Load the semi-structured knowledge base
skb = load_skb(dataset_name, download_processed=True, root=None)

The root argument for load_skb specifies the location to store SKB data. With default value None, the data will be stored in huggingface cache.

Data of the Retrieval Task

Question answer pairs for the retrieval task will be automatically downloaded in data/{dataset}/stark_qa by default. We provided official split in data/{dataset}/split.

Data of the Knowledge Bases

There are two ways to load the knowledge base data:

3) Evaluation on benchmark

Reference

Please consider citing our paper if you use our benchmark or code in your work:

@article{wu24stark,
    title        = {STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases},
    author       = {
        Shirley Wu and Shiyu Zhao and 
        Michihiro Yasunaga and Kexin Huang and 
        Kaidi Cao and Qian Huang and 
        Vassilis N. Ioannidis and Karthik Subbian and 
        James Zou and Jure Leskovec
    },
    eprinttype   = {arXiv},
    eprint       = {2404.13207},
  year           = {2024}
}