potsawee / selfcheckgpt

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

How long would it usually take to run all three scores? #12

Closed: yihan-zhou closed this issue 1 year ago

yihan-zhou commented 1 year ago

Hi there, I tried to run the SelfCheckGPT usage example (BERTScore, QA, n-gram) from the README and it took 24 minutes. Is that expected?

potsawee commented 1 year ago

Hi @yihan-zhou,

The runtime of the SelfCheck QA and NLI methods depends on whether you run them on a GPU. I just timed the README example (measured with %%time in a Jupyter notebook); the results are below, followed by a sketch of how you could reproduce the timing outside a notebook:

  1. SelfCheck-QA: 18.7s (on GPU), or 3m (on CPU)
  2. SelfCheck-BERTScore: 12.6s (the implementation doesn't use GPU)
  3. SelfCheck-Ngram: 513ms (the implementation doesn't use GPU)
  4. SelfCheck-NLI: 225ms (on GPU), or 1.6s (on CPU)
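
If you want to time this outside Jupyter, a minimal sketch is below. It only times SelfCheck-NLI and assumes the predict(sentences=..., sampled_passages=...) interface shown in the README; the sentences and sampled passages here are short placeholders, not the README's actual texts.

import time
import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckNLI

# Use the GPU if available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
selfcheck_nli = SelfCheckNLI(device=device)

# Placeholder inputs: sentences to check and stochastic samples of the same passage.
sentences = ["Michael is a radio host.", "He was born in 1942."]
sampled_passages = [
    "Michael is an American radio host born in 1942.",
    "Michael is a radio personality.",
]

start = time.time()
sent_scores_nli = selfcheck_nli.predict(
    sentences=sentences,
    sampled_passages=sampled_passages,
)
print(f"SelfCheck-NLI: {time.time() - start:.2f}s, scores = {sent_scores_nli}")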

My system has one V100 GPU and 32 CPU cores (3.20 GHz). If it takes 24 minutes on your machine, you are most likely running on the CPU. In that case, either run on multiple cores (e.g. export OMP_NUM_THREADS=NUM_OF_CORES) or run on the GPU by setting:

import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckMQAG, SelfCheckNLI

device = torch.device("cuda")  # move both scorers to the GPU
selfcheck_mqag = SelfCheckMQAG(device=device)
selfcheck_nli = SelfCheckNLI(device=device)
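
A side note on the multi-core option: besides exporting OMP_NUM_THREADS in the shell before launching Python, you can also set PyTorch's intra-op thread count directly in code with torch.set_num_threads. The sketch below is illustrative; 8 is just a placeholder for your actual core count.

# In the shell, before launching Python / Jupyter:
#   export OMP_NUM_THREADS=8   # replace 8 with your number of physical cores

import torch

# Equivalent in-process setting (illustrative core count).
torch.set_num_threads(8)
print(torch.get_num_threads())  # verify the setting took effect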
yihan-zhou commented 1 year ago

Thank you for the quick follow-up and for sharing the info. I will give it a try!