potsawee / selfcheckgpt

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

How long would it usually take to run all three scores? #12

Closed: yihan-zhou closed this issue 1 year ago

yihan-zhou commented 1 year ago

Hi there, I tried to run the SelfCheckGPT usage example (BERTScore, QA, n-gram) from the README and it took 24 minutes. Is that expected?

potsawee commented 1 year ago

Hi @yihan-zhou,

The runtime of the SelfCheck QA and NLI methods depends on whether you run them on a GPU. I just timed the README example (measured with %%time in a Jupyter notebook); the results are below, followed by a sketch of how you could reproduce the timing outside a notebook:

  1. SelfCheck-QA: 18.7s (on GPU), or 3m (on CPU)
  2. SelfCheck-BERTScore: 12.6s (the implementation doesn't use GPU)
  3. SelfCheck-Ngram: 513ms (the implementation doesn't use GPU)
  4. SelfCheck-NLI: 225ms (on GPU), or 1.6s (on CPU)
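
If you want to time this outside Jupyter, a minimal sketch is below. It only times SelfCheck-NLI and assumes the predict(sentences=..., sampled_passages=...) interface shown in the README; the sentences and sampled passages here are short placeholders, not the README's actual texts.

import time
import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckNLI

# Use the GPU if available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
selfcheck_nli = SelfCheckNLI(device=device)

# Placeholder inputs: sentences to check and stochastic samples of the same passage.
sentences = ["Michael is a radio host.", "He was born in 1942."]
sampled_passages = [
    "Michael is an American radio host born in 1942.",
    "Michael is a radio personality.",
]

start = time.time()
sent_scores_nli = selfcheck_nli.predict(
    sentences=sentences,
    sampled_passages=sampled_passages,
)
print(f"SelfCheck-NLI: {time.time() - start:.2f}s, scores = {sent_scores_nli}")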

My system has one V100 GPU and 32 CPU cores (3.20 GHz). If it takes 24 minutes on your machine, you are most likely running on the CPU. In that case, either run on multiple cores (e.g. export OMP_NUM_THREADS=NUM_OF_CORES) or run on the GPU by setting:

import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckMQAG, SelfCheckNLI

device = torch.device("cuda")  # move both scorers to the GPU
selfcheck_mqag = SelfCheckMQAG(device=device)
selfcheck_nli = SelfCheckNLI(device=device)
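
A side note on the multi-core option: besides exporting OMP_NUM_THREADS in the shell before launching Python, you can also set PyTorch's intra-op thread count directly in code with torch.set_num_threads. The sketch below is illustrative; 8 is just a placeholder for your actual core count.

# In the shell, before launching Python / Jupyter:
#   export OMP_NUM_THREADS=8   # replace 8 with your number of physical cores

import torch

# Equivalent in-process setting (illustrative core count).
torch.set_num_threads(8)
print(torch.get_num_threads())  # verify the setting took effect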
yihan-zhou commented 1 year ago

Thank you for the quick follow-up and for sharing the info. I will give it a try!