potsawee / selfcheckgpt

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
MIT License

Can you provide the list of names used to create the dataset? #9

soap117 closed this issue 1 year ago

soap117 commented 1 year ago

Without the names that were used, it is hard to run the experiment on other models. By "names" I mean the {concept} values in the prompt "This is a Wikipedia passage about {concept}". I can't find them in the dataset.

potsawee commented 1 year ago

Hi @soap117,

Sorry for my late reply -- I only saw your message just now.

There are two options:

1) Since these concepts are just the names of people in the wiki_bio dataset, you can take the name directly from the text in my dataset, e.g. "John Russell Reynolds" in the first item.

2) Alternatively, use "wiki_bio_test_idx" to find the corresponding rows in the test set of the wiki_bio dataset (https://huggingface.co/datasets/wiki_bio). I extracted the names from the input's "context" field.
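For anyone following option 2, the lookup could be sketched roughly as below. This is only an illustration, not the code used to build the dataset. It assumes the name sits under input_text -> context in each wiki_bio row and that any stray whitespace there can simply be collapsed. The helper names, the Hub identifier and split names inside load_real_datasets, and the two tiny stand-in rows are all hypothetical, so check them against the dataset cards before relying on them.

```python
PROMPT_TEMPLATE = "This is a Wikipedia passage about {concept}"


def concept_for_row(selfcheck_row, wiki_bio_test):
    """Look up the concept (person name) for one row via wiki_bio_test_idx."""
    idx = selfcheck_row["wiki_bio_test_idx"]
    # Assumption: the name is stored under input_text -> context in wiki_bio.
    context = wiki_bio_test[idx]["input_text"]["context"]
    # Collapse whitespace in case the raw field carries a trailing newline.
    return " ".join(context.split())


def build_prompt(concept):
    """Fill the prompt template quoted in this thread."""
    return PROMPT_TEMPLATE.format(concept=concept)


def load_real_datasets():
    """Not called here: needs the `datasets` package and network access."""
    from datasets import load_dataset

    # Assumption: Hub identifiers and split names; verify on the dataset cards.
    selfcheck = load_dataset("potsawee/wiki_bio_gpt3_hallucination", split="evaluation")
    wiki_bio_test = load_dataset("wiki_bio", split="test")
    return selfcheck, wiki_bio_test


# Hypothetical stand-in rows so the sketch runs offline.
fake_wiki_bio_test = [
    {"input_text": {"context": "jane example\n"}},
    {"input_text": {"context": "john russell reynolds\n"}},
]
fake_selfcheck_rows = [{"wiki_bio_test_idx": 1}]

concepts = [concept_for_row(row, fake_wiki_bio_test) for row in fake_selfcheck_rows]
prompts = [build_prompt(concept) for concept in concepts]
print(prompts[0])
```

With the real data, the two values returned by load_real_datasets() would take the place of the stand-in lists. Names recovered this way may differ in casing or formatting from how they appear in the passage text, so some cleanup may still be needed.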

Best, Potsawee