Closed EngSalem closed 1 year ago
Hi @EngSalem,
Sorry for my late reply; I only saw this just now.
So, you could run selfcheck-BERTScore on your development dataset, and choose an optimal threshold.
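As a rough illustration of picking that threshold, here is a minimal sketch. It assumes you already have one score per passage on the development set plus a human label for each (1 = non-factual, 0 = factual), and that a higher score means "more likely non-factual". The function name pick_threshold and the numbers below are hypothetical, not part of the SelfCheckGPT code.

```python
from typing import List, Tuple


def pick_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the best cut-off on a dev set.

    Assumption: a passage is predicted non-factual (label 1) when its
    score is >= threshold. Ties keep the lowest threshold found first.
    """
    best_thr, best_acc = 0.0, -1.0
    # Every distinct observed score is a candidate threshold.
    for thr in sorted(set(scores)):
        preds = [1 if s >= thr else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr, best_acc


# Hypothetical dev-set scores and labels, for illustration only.
dev_scores = [0.05, 0.10, 0.12, 0.30, 0.35, 0.40]
dev_labels = [0, 0, 0, 1, 1, 1]
threshold, accuracy = pick_threshold(dev_scores, dev_labels)
print(threshold, accuracy)
```

Accuracy is used here only for simplicity; if your development set is imbalanced, you may prefer to optimise F1 or another metric instead.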
Note that someone pointed out to me that the current BERTScore is not rescaled against a baseline; therefore, the values are extremely high (or low in our case). See the explanation here: https://github.com/Tiiiger/bert_score/blob/master/journal/rescale_baseline.md
I will edit the code soon to add rescale_with_baseline as an option for BERTScore, but you are welcome to add it yourself.
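For intuition, baseline rescaling as described in the linked note is a linear map, (score - baseline) / (1 - baseline), which spreads the narrow raw range out without changing the ranking. A minimal sketch follows; the baseline value 0.85 and the raw score 0.92 are made-up numbers, and treating the selfcheck score as 1 - BERTScore is my assumption for this illustration.

```python
def rescale(score: float, baseline: float) -> float:
    """Linearly rescale a raw BERTScore so that `baseline` maps to 0 and 1 stays 1."""
    return (score - baseline) / (1.0 - baseline)


# Hypothetical numbers, for illustration only.
raw_bertscore = 0.92   # raw similarity between a sentence and a sample
baseline = 0.85        # assumed baseline for the model/layer in use

rescaled = rescale(raw_bertscore, baseline)

# Assumption: the selfcheck score is 1 - BERTScore, so a high raw
# BERTScore gives a low selfcheck score before rescaling.
selfcheck_raw = 1.0 - raw_bertscore
selfcheck_rescaled = 1.0 - rescaled

print(round(selfcheck_raw, 3), round(selfcheck_rescaled, 3))
```

With these made-up values the selfcheck score moves from a very small number to something near the middle of the range, which is why a threshold should be chosen again after turning rescaling on.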
Best, Potsawee
The rescale_with_baseline option has been added to SelfCheck-BERTScore.
Hello, I am trying to use your work to estimate the factuality of samples. I am getting relatively low scores from selfcheck_bertscore even when the samples are totally contradictory. I was wondering how you chose whether a passage is factual or nonfactual.
Thank you